Since CBS was not calling reset for its child qdisc, there are scenarios
where it could cause an underflow on its parent's qlen/backlog. When the
parent is QFQ, a null-ptr deref could occur.
Add a test case that reproduces the underflow followed by a null-ptr
deref scenario.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During a reset, CBS is not calling reset on its child qdisc, which
might cause qlen/backlog accounting issues. For example, if we have CBS
with a QFQ parent and a netem child with delay, we can create a scenario
where the parent's qlen underflows. QFQ, specifically, uses qlen to
check whether it should deference a pointer, so this scenario may cause
a null-ptr deref in QFQ:
[ 43.875639][ T319] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 43.876124][ T319] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 43.876417][ T319] CPU: 10 UID: 0 PID: 319 Comm: ping Not tainted 7.0.0-13039-ge728258debd5 #773 PREEMPT(full)
[ 43.876751][ T319] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 43.876949][ T319] RIP: 0010:qfq_dequeue+0x35c/0x1650
[ 43.877123][ T319] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b
[ 43.877648][ T319] RSP: 0018:ffff8881017ef4f0 EFLAGS: 00010216
[ 43.877845][ T319] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[ 43.878073][ T319] RDX: 0000000000000009 RSI: 0000000c40000000 RDI: ffff88810eef02b0
[ 43.878306][ T319] RBP: ffff88810eef0000 R08: ffff88810eef0280 R09: 1ffff1102120fd63
[ 43.878523][ T319] R10: 1ffff1102120fd66 R11: 1ffff1102120fd67 R12: 0000000c40000000
[ 43.878742][ T319] R13: ffff88810eef02b8 R14: 0000000000000048 R15: 0000000020000000
[ 43.878959][ T319] FS: 00007f9c51c47c40(0000) GS:ffff88817a0be000(0000) knlGS:0000000000000000
[ 43.879214][ T319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 43.879403][ T319] CR2: 000055e69a2230a8 CR3: 000000010c07a000 CR4: 0000000000750ef0
[ 43.879621][ T319] PKRU: 55555554
[ 43.879735][ T319] Call Trace:
[ 43.879844][ T319] <TASK>
[ 43.879924][ T319] __qdisc_run+0x169/0x1900
[ 43.880075][ T319] ? dev_qdisc_enqueue+0x8b/0x210
[ 43.880222][ T319] __dev_queue_xmit+0x2346/0x37a0
[ 43.880376][ T319] ? register_lock_class+0x3f/0x800
[ 43.880531][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.880684][ T319] ? __pfx___dev_queue_xmit+0x10/0x10
[ 43.880834][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.880977][ T319] ? __lock_acquire+0x819/0x1df0
[ 43.881124][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.881275][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.881418][ T319] ? __asan_memcpy+0x3c/0x60
[ 43.881563][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.881708][ T319] ? eth_header+0x165/0x1a0
[ 43.881853][ T319] ? lockdep_hardirqs_on_prepare+0xdb/0x1a0
[ 43.882031][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.882174][ T319] ? neigh_resolve_output+0x3cc/0x7e0
[ 43.882325][ T319] ? srso_alias_return_thunk+0x5/0xfbef5
[ 43.882471][ T319] ip_finish_output2+0x6b6/0x1e10
Fix this by calling qdisc_reset for CBS' child qdisc.
Sashiko caught an issue which could result in a null ptr deref if
qdisc_create_dflt() is invoked on an unitialised cbs qdisc which is exposed
by this patch. We add an early return if the qdisc is null to address this.
This is a similar approach used by two other fixes[1][2].
The proper fix for this specific issue elucidated by sashiko is to remove
the call to qdisc_reset when qdisc_create_dflt fails. Since the dflt qdisc
isn't attached anywhere yet at that point, calling the reset callback doesn't
make much sense (and as stated has been a source of two other bugs).
We plan on submitting this fix in a later patch.
[1] https://lore.kernel.org/netdev/20221018063201.306474-2-shaozhengchao@huawei.com/
[2] https://lore.kernel.org/netdev/20221018063201.306474-4-shaozhengchao@huawei.com/
Fixes: 585d763af0 ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
Reported-by: Junyoung Jang <graypanda.inzag@gmail.com>
Tested-by: Junyoung Jang <graypanda.inzag@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The reset actions of the DEFZA adapters are exceedingly slow, taking up
to 30 seconds to complete by the device spec and typically in the range
of 10 seconds in reality, as required for the device RTOS to boot, still
quite a lot. Therefore a state machine is used that's interrupt driven,
however a safety mechanism is required in case of adapter malfunction,
so that if no state change interrupt has arrived in time, then the
situation is taken care of.
The safety mechanism depends on the origin of the reset. For regular
adapter initialisation at the device probe time a sleep is requested.
However a reset is also required by the device spec when the adapter has
transitioned into the halted state, such as in response to a PC Trace
event in the course of ring fault recovery, possibly a common network
event. In that case no sleep is possible as a device halt is reported
at the hardirq level.
A timer is therefore set up to ensure progress in case no adapter state
change interrupt has arrived in time, but as from commit 168f6b6ffb
("timers: Use del_timer_sync() even on UP") a warning is issued as the
timer is deleted in the hardirq handler upon an expected state change:
defza: v.1.1.4 Oct 6 2018 Maciej W. Rozycki
tc2: DEC FDDIcontroller 700 or 700-C at 0x18000000, irq 4
tc2: resetting the board...
------------[ cut here ]------------
WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-dirty #2 VOLUNTARY
Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
ffffffff80897400 9800000002027b88 0000000000000000 7070617773203a6d
0000000000000000 9800000002027ba4 0000000000001000 6465746e69617420
0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
0000000000000000 0000000000000000 0000000000000000 0000000000000009
000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
...
Call Trace:
[<ffffffff800502b8>] show_stack+0x28/0xf0
[<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
[<ffffffff80068c98>] __warn+0xa0/0x128
[<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
[<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
[<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
[<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
[<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
[<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
[<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
[<ffffffff806e956c>] do_IRQ+0x1c/0x30
[<ffffffff8004ad4c>] handle_int+0x148/0x154
[<ffffffff800ab7c0>] do_idle+0x40/0x108
[<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
[<ffffffff806dfec8>] kernel_init+0x0/0x108
---[ end trace 0000000000000000 ]---
tc2: OK
tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-xx-xx-xx
tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
tc2: link unavailable
------------[ cut here ]------------
WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 7.0.0-dirty #2 VOLUNTARY
Tainted: [W]=WARN
Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
ffffffff80897400 9800000002027b88 0000000000000000 0000000000000000
0000000000000000 9800000002027ba4 0000000000001000 0000000000000000
0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
0000000000000000 0000000000000000 0000000000000000 0000000000000009
000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
...
Call Trace:
[<ffffffff800502b8>] show_stack+0x28/0xf0
[<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
[<ffffffff80068c98>] __warn+0xa0/0x128
[<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
[<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
[<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
[<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
[<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
[<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
[<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
[<ffffffff806e956c>] do_IRQ+0x1c/0x30
[<ffffffff8004ad4c>] handle_int+0x148/0x154
[<ffffffff806de8a4>] arch_local_irq_disable+0x4/0x28
[<ffffffff800ab7d0>] do_idle+0x50/0x108
[<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
[<ffffffff806dfec8>] kernel_init+0x0/0x108
---[ end trace 0000000000000000 ]---
tc2: registered as fddi0
The immediate origin of the new warning is the switch away from aliasing
del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete()
in terms of current function names) for UP configurations, which however
is the only choice for this driver anyway as no SMP hardware supports
the TURBOchannel bus this device interfaces to. Therefore there is a
very remote issue only this is a sign of.
Specifically if an adapter reset issued upon a transition to the halted
state times out and first triggers fza_reset_timer() for another reset
assertion, which then schedules fza_reset_timer() for reset deassertion
and then that second call is pre-empted after poking at the hardware,
but before the timer has been rearmed and owing to high system load
causing exceedingly high scheduling latency control is not handed back
before a transition to the uninitialised state has caused the timer to
be deleted even before it has been started, then fza_reset_timer() will
be called yet again and issue another reset even though by then the
adapter has already recovered.
Prevent this situation from happening by switching to timer_delete() for
the transition to the halted state and protect the code region affected
with a spinlock, also to make sure add_timer() has not been called twice
in a row due to an execution race between the interrupt handler and the
timer handler (though it could only happen on SMP, but let's keep the
driver clean). It's a very unlikely sequence of events to happen and
therefore there's no point in trying to be overly clever about it, such
as by placing printk() calls outside the protection. For the transition
to the uninitialised state switch to timer_delete_sync_try() instead, so
that a timer isn't deleted that's just been rearmed by the timer handler
and needs to watch for the device to come out of reset again (again, an
SMP scenario only).
Retain timer_delete_sync() invocations outside the hardirq context for a
stray timer not to fire once device structures have been released.
Fixes: 61414f5ec9 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When updating the driver to match latest datasheet to suspend access to
URAM when suspending DMA transfers a corner-case was missed, URAM access
will not be suspended if WoL is enabled. This lead to the error message
(correctly) being triggered as URAM access is not suspended even tho
it's requested as part of stopping DMA.
Avoid checking if URAM access is suspended and printing the error
message if WoL is enabled when we suspend the system, as we know it will
not be.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/
Fixes: 353d8e7989 ("net: ethernet: ravb: Suspend and resume the transmission flow")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ethnl_bitmap32_not_zero() should return true if some bit in [start, end)
is set:
- Fix inverted memchr_inv() sense: return true when the scan finds a
non-zero byte, not when the middle words are all zero.
- Return false for an empty interval (end <= start).
- When end is 32-bit aligned, indices in [start, end) do not include any
bits from map[end_word]; return false after earlier checks found no
non-zero data.
Fixes: 10b518d4e6 ("ethtool: netlink bitset handling")
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and
smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk:
__string(name, smc->conn.lnk->ibname)
conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on
these paths already handles this (e.g. !conn->lnk in
SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first
sendmsg()/recvmsg() on an SMC-D socket crashes:
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [...]
RIP: 0010:strlen+0x1e/0xa0
Call Trace:
trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44)
smc_rx_recvmsg (net/smc/smc_rx.c:515)
smc_recvmsg (net/smc/af_smc.c:2859)
__sys_recvfrom (net/socket.c:2315)
__x64_sys_recvfrom (net/socket.c:2326)
do_syscall_64
The faulting address 0x3e0 is offsetof(struct smc_link, ibname),
confirming the NULL ->lnk deref. Enabling the tracepoint requires
root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has
no capability check, and SMC-D negotiation needs no admin step on
s390 or on x86 with the loopback ISM device loaded.
Log an empty device name for SMC-D instead of dereferencing NULL.
Fixes: aff3083f10 ("net/smc: Introduce tracepoints for tx and rx msg")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A logic flaw in __smc_setsockopt() allows a local unprivileged user to
cause a Denial of Service (DoS) by holding the socket lock indefinitely.
The function __smc_setsockopt() calls copy_from_sockptr() while holding
lock_sock(sk). By passing a userfaultfd-monitored memory page (or
FUSE-backed memory on systems where unprivileged userfaultfd is disabled)
as the optval, an attacker can halt execution during the copy operation,
keeping the lock held.
Combined with asynchronous tear-down operations like shutdown(), this
exhausts the kernel wq (kworkers) and triggers the hung task watchdog.
[ 240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds.
[ 240.123489] Call Trace:
[ 240.123501] smc_shutdown+...
[ 240.123512] lock_sock_nested+...
This patch moves the user-space copy outside the lock_sock() critical
section to prevent the issue.
Fixes: a6a6fe27ba ("net/smc: Dynamic control handshake limitation by socket options")
Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
phydev->drv can become NULL while the phy_device is still attached to
its net_device, namely after the PHY driver is unbound via sysfs:
echo <mdio_id> > /sys/bus/mdio_bus/drivers/<phy_drv>/unbind
phy_remove() clears phydev->drv but doesn't call phy_detach(), so the
phy_device stays in the link topology xarray and ethnl_req_get_phydev()
still hands it back. ETHTOOL_MSG_PHY_GET then oopses on:
rep_data->drvname = kstrdup(phydev->drv->name, GFP_KERNEL);
drvname is already treated as optional by phy_reply_size(),
phy_fill_reply() and phy_cleanup_data(), so just skip the allocation
when there is no driver bound.
Fixes: 9dd2ad5e92 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Cc: stable@vger.kernel.org # 6.13.x
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The shutdown handler aq_pci_shutdown() unconditionally calls
pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when
wake-on-LAN has been configured. While aq_nic_shutdown() correctly
programs the NIC firmware via aq_nic_set_power() to listen for magic
packets, the PCI subsystem will not propagate the resulting PME wake
event from D3, so the system never wakes after poweroff.
WOL from suspend (S3) is unaffected because aq_suspend_common() does
not touch pci_wake_from_d3() and relies on the PM core's wake
configuration via device_may_wakeup().
This affects all atlantic-supported NICs (AQC107/108/111/112/113);
users have reported that WOL works if the atlantic driver is never
loaded, but breaks once it has run its shutdown path.
Pass the configured WOL state to pci_wake_from_d3() instead of a
literal false, so the PCI PME_En bit is preserved when the user has
armed WOL via ethtool.
Fixes: 90869ddfef ("net: aquantia: Implement pci shutdown callback")
Cc: stable@vger.kernel.org
Signed-off-by: Zoran Ilievski <goodboy@rexbytes.com>
Reviewed-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260511064002.1857-1-goodboy@rexbytes.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
net: shaper: fix various minor bugs
Fix various minor bugs in the net shaper API.
First 2 patches deal with ordering issues around inserting
and publishing new shapers. Shapers are inserted "tentatively"
and marked valid only after HW op succeeded, this used to
be slightly racy.
Only other patch of note is patch 8. We want to add a Netlink
policy check on the handle ID. This necessitates patch 7.
The rest are simple and self-explanatory.
v1: https://lore.kernel.org/20260506000628.1501691-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20260510192904.3987113-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net_shaper_parse_handle() does not enforce that the user provides
the handle ID. For NODE the ID defaults to UNSPEC for all other
cases it defaults to 0.
For NETDEV 0 is the only option. For QUEUE defaulting to 0 makes
less intuitive sense. Specifically because the behavior should
(IMHO) be the same for all cases where there may be more than
one ID (QUEUE and NODE).
We should either document this as intentional or reject.
I picked the latter with no strong conviction.
Fixes: 4b623f9f0f ("net-shapers: implement NL get operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-11-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The NETDEV scope represents a singleton root shaper in the per-device
hierarchy. All code assumes NETDEV shapers have id 0:
net_shaper_default_parent() hardcodes parent->id = 0 when returning
the NETDEV parent for QUEUE/NODE children, and the UAPI documentation
describes NETDEV scope as "the main shaper" (singular, not plural).
Make sure we reject non-0 IDs.
Fixes: 4b623f9f0f ("net-shapers: implement NL get operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-10-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net_shaper_parse_handle() reads the user-supplied handle ID via
nla_get_u32(), accepting the full u32 range. However, the xarray key
is built by net_shaper_handle_to_index() using
FIELD_PREP(NET_SHAPER_ID_MASK, handle->id), where NET_SHAPER_ID_MASK
is GENMASK(25, 0) - only 26 bits wide. FIELD_PREP silently masks off
the upper bits at runtime. A user-supplied NODE id like 0x04000123
becomes id 0x123.
Additionally, a user-supplied id equal to NET_SHAPER_ID_UNSPEC
(0x03FFFFFF, which is NET_SHAPER_ID_MASK itself) would collide with
the sentinel used internally by the group operation to signal
"allocate a new NODE id".
Reject user-supplied IDs >= NET_SHAPER_ID_MASK (i.e., >= 0x03FFFFFF)
in the policy.
Fixes: 4b623f9f0f ("net-shapers: implement NL get operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-9-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Using definitions in kernel policies is awkward right now.
On one hand we want defines for max values and such.
On the other we don't have a way of adding kernel-only defines.
Adding unnecessary defines to uAPI is a bad idea, we won't
be able to delete them. And when it comes to policy user
space should just query it via the policy dump, not use
hard coded defines.
Add a "scope" property to definitions, which will let us tell
the codegen that a definition is for kernel use only. Support
following values:
- uapi: render into the uAPI header (default, today's behavior)
- kernel: render to kernel header only
- user: same as kernel but for the user-side generated header
Definitions may have a header property (definition is "external",
provided by existing header). Extend the scope to headers, too.
If definition has both scope and header properties we will only
generate the includes in the right scope.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-8-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net_shaper_group_send_reply() writes both the NET_SHAPER_A_IFINDEX
attribute (via net_shaper_fill_binding()) and the nested
NET_SHAPER_A_HANDLE attribute (via net_shaper_fill_handle()), but
the reply skb at the call site in net_shaper_nl_group_doit() is
allocated using net_shaper_handle_size(), which only accounts for
the nested handle.
The allocation is therefore short by nla_total_size(sizeof(u32))
(8 bytes) for the IFINDEX attribute. In practice the slab allocator
rounds up the small allocation so the bug is latent, but the size
accounting is wrong and could bite if the reply grew further.
Introduce net_shaper_group_reply_size() that accounts for the full
reply payload and use it both at the genlmsg_new() call site and in
the defensive WARN_ONCE message.
Fixes: 5d5d4700e7 ("net-shapers: implement NL group operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-7-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net_shaper_nl_group_doit() does not deduplicate NET_SHAPER_A_LEAVES
entries. When userspace supplies the same leaf handle twice, the same
old-parent pointer lands twice in old_nodes[]. The cleanup loop double
frees the parent. Of course the same parent may still be in old_nodes[]
twice if we are moving multiple of its leaves.
Note that this patch also implicitly fixes the fact that the
i >= leaves_count path forgets to set ret.
Fixes: 5d5d4700e7 ("net-shapers: implement NL group operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The usual way of inserting entries which are not yet fully ready
into XArray is to have a VALID flag. The shaper code has a NOT_VALID
flag. Since XArray code does not let us create entries with marks
already set - the creation of entries is currently not atomic.
Flip the polarity of the VALID flag. This closes the tiny race
in net_shaper_pre_insert() of entries being created without
the NOT_VALID flag.
Fixes: 93954b40f6 ("net-shapers: implement NL set and delete operations")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The gmac_rx() NAPI poll function assembles packets in an
SKB from a ring buffer.
If the ring buffer gets completely emptied during a poll cycle,
we exit gmac_rx(), but the packet is not yet completely
assembled in the SKB, yet the fragment counter frag_nr is
reset to zero on the next invocation.
Solve this by making the RX fragment counter a part of the
port struct, and carry it over between invocations.
Reset the fragment counter only right after calling
napi_gro_frags(), on error (after calling napi_free_frags())
or if stopping the port.
Reset it in some place where not strictly necessary just to
emphasize what is going on.
This was found by Sashiko during normal patch review.
Fixes: 4d5ae32f5e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-3-6c5d20ddc35b@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The SKB used to assemble packets from fragments in gmac_rx()
is static local, but the Gemini has two ethernet ports, meaning
there can be races between the ports on a bad day if a device
is using both.
Make the RX SKB a per-port variable and carry it over between
invocations in the port struct instead.
Zero the pointer once we call napi_gro_frags(), on error (after
calling napi_free_frags()) or if the port is stopped.
Zero it in some place where not strictly necessary just to
emphasize what is going on.
This was found by Sashiko during normal patch review.
Fixes: 4d5ae32f5e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-2-6c5d20ddc35b@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella says:
====================
vsock/virtio: fix vsockmon tap skb construction
While reviewing the patch posted by Yiqi Sun [1] to fix an issue in
virtio_transport_build_skb(), I discovered another issue related to
the offset and length of the payload to be copied in the new skb.
This was introduced when we did the skb conversion, and fixed by
patch 1.
Patch 2 fixes the issue found by Yiqi Sun in a different way: using
iov_iter_kvec() to properly initialize all the iov_iter fields and
removing the linear vs non-linear split like we alredy do in
vhost-vsock.
It could have been a single patch, but since there were two affected
commits, I decided to keep the fixes separate.
[1] https://lore.kernel.org/netdev/20260430071110.380509-1-sunyiqixm@gmail.com/
====================
Link: https://patch.msgid.link/20260508164411.261440-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For non-linear skbs, virtio_transport_build_skb() goes through
virtio_transport_copy_nonlinear_skb() to copy the original payload
in the new skb to be delivered to the vsockmon tap device.
This manually initializes an iov_iter but does not set iov_iter.count.
Since the iov_iter is zero-initialized, the copy length is zero and no
payload is actually copied to the monitor interface, leaving data
un-initialized.
Fix this by removing the linear vs non-linear split and using
skb_copy_datagram_iter() with iov_iter_kvec() for all cases, as
vhost-vsock already does. This handles both linear and non-linear skbs,
properly initializes the iov_iter, and removes the now unused
virtio_transport_copy_nonlinear_skb().
While touching this code, let's also check the return value of
skb_copy_datagram_iter(), even though it's unlikely to fail.
Fixes: 4b0bf10eb0 ("vsock/virtio: non-linear skb handling for tap")
Reported-by: Yiqi Sun <sunyiqixm@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org>
Link: https://patch.msgid.link/20260508164411.261440-3-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
virtio_transport_build_skb() builds a new skb to be delivered to the
vsockmon tap device. To build the new skb, it uses the original skb
data length as payload length, but as the comment notes, the original
packet stored in the skb may have been split in multiple packets, so we
need to use the length in the header, which is correctly updated before
the packet is delivered to the tap, and the offset for the data.
This was also similar to what we did before commit 71dc9ec9ac
("virtio/vsock: replace virtio_vsock_pkt with sk_buff") where we probably
missed something during the skb conversion.
Also update the comment above, which was left stale by the skb
conversion and still mentioned a buffer pointer that no longer exists.
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org>
Link: https://patch.msgid.link/20260508164411.261440-2-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the HSR (High-availability Seamless Redundancy) protocol, node
information is maintained in the node_db. When a supervision frame is
received, node->addr_B_port is updated to track the receiving port type
(e.g., HSR_PT_SLAVE_B).
If the underlying physical interface associated with this slave port is
removed (e.g., via `ip link del`), hsr_del_port() frees the hsr_port
object. However, the stale node->addr_B_port reference is kept in the
node_db until the node ages out.
Subsequently, if userspace queries the node status via the Netlink
command HSR_C_GET_NODE_STATUS, the kernel calls hsr_get_node_data().
This function unconditionally dereferences the pointer returned by
hsr_port_get_hsr():
if (node->addr_B_port != HSR_PT_NONE) {
port = hsr_port_get_hsr(hsr, node->addr_B_port);
*addr_b_ifindex = port->dev->ifindex; // <-- NULL deref
}
If the slave port has been deleted, hsr_port_get_hsr() returns NULL,
resulting in a kernel panic.
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:hsr_get_node_data+0x7b6/0x9e0
Call Trace:
<TASK>
hsr_get_node_status+0x445/0xa40
Fix this by adding a proper NULL pointer check. If the port lookup fails
due to a stale port type, gracefully treat it as if no valid port exists
and assign -1 to the interface index.
Steps to reproduce:
1. Create an HSR interface with two slave devices.
2. Receive a supervision frame to populate node_db with
addr_B_port assigned to SLAVE_B.
3. Delete the underlying slave device B.
4. Send an HSR_C_GET_NODE_STATUS Netlink message.
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Link: https://patch.msgid.link/20260508124636.1462346-1-2022090917019@std.uestc.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In qed_init_wfq_param(), variable non_requested_count can become zero
when the number of vports with the configured flag set (including the
current vport being configured) equals total num_vports. This happens
when configuring the last unconfigured vport or when re-configuring
an already configured vport.
The function then calculates left_rate_per_vp = total_left_rate /
non_requested_count, which causes division by zero.
Fix this by skipping the division when non_requested_count is zero.
In that case, there is no remaining bandwidth to distribute, so just
record the configuration for the current vport and return success.
Fixes: bcd197c81f ("qed: Add vport WFQ configuration APIs")
Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
Link: https://patch.msgid.link/20260507145520.23106-1-evg28bur@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lockdep_sock_is_held() was added in tcp_ao_established_key()
by the cited commit.
It can be called from tcp_v[46]_timewait_ack() with twsk.
Since it does not have sk->sk_lock, the lockdep annotation
results in out-of-bound access.
$ pahole -C tcp_timewait_sock vmlinux | grep size
/* size: 288, cachelines: 5, members: 8 */
$ pahole -C sock vmlinux | grep sk_lock
socket_lock_t sk_lock; /* 440 192 */
Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT.
Fixes: 6b2d11e2d8 ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260508120853.4098365-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ena_phc_gettimex64() is setting the output parameter regardless
of whether ena_com_phc_get_timestamp() succeeded or failed.
When ena_com_phc_get_timestamp() returns an error, the timestamp
parameter may contain uninitialized stack memory (e.g., when PHC is
disabled or in blocked state) or invalid hardware values. Passing
these to userspace via the PTP ioctl is both a security issue
(information leak) and a correctness bug.
Fix by checking the return code after releasing the lock and only
setting the output timestamp on success.
Fixes: e0ea34158e ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When iov_iter_get_pages2() fails in rds_message_zcopy_from_user(),
the pinned pages are released with put_page(), and
rm->data.op_mmp_znotifier is cleared. But we fail to properly
clear rm->data.op_nents.
Later when rds_message_purge() is called from rds_sendmsg() the
cleanup loop iterates over the incorrectly non zero number of
op_nents and frees them again.
Fix this by properly resetting op_nents when it should be in
rds_message_zcopy_from_user().
Fixes: 0cebaccef3 ("rds: zerocopy Tx support.")
Signed-off-by: Allison Henderson <achender@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260505234336.2132721-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix integer overflow on buff_pos, by Lyes Bourennani
- fix invalid tp_meter access during teardown, by Jiexun Wang (2 patches)
- stop caching unowned originator pointers in BAT IV, by Jiexun Wang
- tp_meter: fix tp_num leak on kmalloc failure, by Sven Eckelmann
- fix BLA refcounting issues, by Sven Eckelmann (3 patches)
* tag 'batadv-net-pullrequest-20260508' of https://git.open-mesh.org/batadv:
batman-adv: bla: put backbone reference on failed claim hash insert
batman-adv: bla: only purge non-released claims
batman-adv: bla: prevent use-after-free when deleting claims
batman-adv: tp_meter: fix tp_num leak on kmalloc failure
batman-adv: stop caching unowned originator pointers in BAT IV
batman-adv: stop tp_meter sessions during mesh teardown
batman-adv: reject new tp_meter sessions during teardown
batman-adv: fix integer overflow on buff_pos
====================
Link: https://patch.msgid.link/20260508154314.12817-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the phc->active check and resp pointer assignment to after
acquiring the spinlock. Previously, phc->active was checked without
holding the lock, and resp was cached from ena_dev->phc.virt_addr
before the lock was acquired.
If ena_com_phc_destroy() runs between the lockless active check and
the lock acquisition, it sets active=false, releases the lock, frees
the DMA memory, and sets virt_addr=NULL. The get_timestamp path would
then read a NULL virt_addr and dereference it.
With both the active check and the pointer read under the lock,
destroy cannot free the memory while get_timestamp is using it.
Fixes: e0ea34158e ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260508062126.7273-1-akiyano@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ethtool.h includes linux/typelimits.h which is a relatively new header
not yet shipped in most distro kernel-header packages. Without the
explicit entry, the build silently falls through to -idirafter.
dev_energymodel.h is a new YNL family whose uapi header is not in
system paths at all and was missing a CFLAGS entry entirely.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260508204114.205896-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the device is removed all allocated resources should be freed.
In uhdlc_memclean the netdev transmit queue was already stopped. But at
this point we may have pending skb in the transmit queue which must be
freed. Therefore iterate over the tx_skbuff pointers and free all
pending skb. The issue was discovered by sashiko.
Tested on a ls1043a board running HDLC in bus mode on kernel 6.12.
https: //sashiko.dev/#/patchset/20260429114208.941011-1-holger.brunck%40hitachienergy.com
Fixes: c19b6d246a ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
Link: https://patch.msgid.link/20260507155332.3452319-1-holger.brunck@hitachienergy.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Allow initial x_tables table replacement without emitting an audit
log message. Delay the register message until after hooks are wired up
to avoid unnecessary unregister logs during error unwinding.
2) Fix a NULL dereference by allocating hook ops before adding the
table to the per-netns list. Use `synchronize_rcu()` during error
unwinding to ensure the table stops processing packets before
teardown. Defer audit log register message until all operations
succeed.
3) Refactor xtables to use a single `xt_unregister_table_pre_exit`
function. Eliminate code duplication by centralizing table
unregistration logic within the xtables core. ebtables cannot be
changed due to incompatibility.
4) Unregister xtables templates before module removal. This prevents
a race condition where userspace instantiates a new table after the
pernet unreg removed the current table.
5) Add `xtables_unregister_table_exit` to fully unregister netfilter
tables during module removal. Unlink the table from dying lists,
then free hook operations.
6) Implement a two-stage removal scheme for ebtables following the
x_tables pattern. Assign table->ops while holding the ebt mutex to
prevent exposing partially-filled structures.
7) Fix ebtables module initialization race. Register the template last
in table initialization functions. Prevent table instantiation before
pernet operations are available.
8) Fix a race condition in x_tables module initialization. Ensure
pernet ops are fully set up before exposing the table to userspace.
9) Fix a race condition in ebtables module initialization, similar to
previous patch.
10) Restore propagation of helper to expected connection, this is a
fix-for-recent-fix.
11) Validate that the expectation tuple and mask netlink attributes are
present when adding expectation via nfqueue, this fixes a possible
null-ptr-deref.
12) Fix possible rare memleak in the SIP helper in case helper has been
detached from conntrack entry, from Li Xiasong.
13) Fix refcount leak in nft_ct when creating custom expectation, also
from Li Xiason.
Patches 1-9 from Florian Westphal.
10) Restore propagation of helper to expected connection, this is a
fix-for-recent-fix.
11) Check that tuple and mask netlink attributes are set when creating an
expectation via nfqueue.
* tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: fix missing expect put in obj eval
netfilter: nf_conntrack_sip: get helper before allocating expectation
netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue
netfilter: nf_conntrack_expect: restore helper propagation via expectation
netfilter: bridge: eb_tables: close module init race
netfilter: x_tables: close dangling table module init race
netfilter: ebtables: close dangling table module init race
netfilter: ebtables: move to two-stage removal scheme
netfilter: x_tables: add and use xtables_unregister_table_exit
netfilter: x_tables: unregister the templates first
netfilter: x_tables: add and use xt_unregister_table_pre_exit
netfilter: x_tables: allocate hook ops while under mutex
netfilter: x_tables: allow initial table replace without emitting audit log message
====================
Link: https://patch.msgid.link/20260507234509.603182-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SCTP_SENDALL path in sctp_sendmsg() iterates ep->asocs with
list_for_each_entry_safe(), which caches the next entry in @tmp before
the loop body runs. The body calls sctp_sendmsg_to_asoc(), which may
drop the socket lock inside sctp_wait_for_sndbuf().
While the lock is dropped, another thread can SCTP_SOCKOPT_PEELOFF the
association cached in @tmp, migrating it to a new endpoint via
sctp_sock_migrate() (list_del_init() + list_add_tail() to
newep->asocs), and optionally close the new socket which frees the
association via kfree_rcu(). The cached @tmp can also be freed by a
network ABORT for that association, processed in softirq while the
lock is dropped.
sctp_wait_for_sndbuf() revalidates @asoc (the current entry) on re-lock
via the "sk != asoc->base.sk" and "asoc->base.dead" checks, but nothing
revalidates @tmp. After a successful return, the iterator advances to
the stale @tmp, yielding either a use-after-free (if the peeled socket
was closed) or a list-walk onto the new endpoint's list head (type
confusion of &newep->asocs as a struct sctp_association *).
Both are reachable from CapEff=0; the type-confusion path gives
controlled indirect call via the outqueue.sched->init_sid pointer.
Fix by re-deriving @tmp from @asoc after sctp_sendmsg_to_asoc()
returns. @asoc is known to still be on ep->asocs at that point: the
only callers that list_del an association from ep->asocs are
sctp_association_free() (which sets asoc->base.dead) and
sctp_assoc_migrate() (which changes asoc->base.sk), and
sctp_wait_for_sndbuf() checks both under the lock before any
successful return; a tripped check propagates as err < 0 and the loop
bails before the re-derive.
The SCTP_ABORT path in sctp_sendmsg_check_sflags() returns 0 and the
loop hits 'continue' before sctp_sendmsg_to_asoc() is ever called, so
the @tmp cached by list_for_each_entry_safe() still covers the
lock-held free that ba59fb0273 ("sctp: walk the list of asoc
safely") was added for.
Fixes: 4910280503 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Ben Morris <bmorris@anthropic.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20260508001455.3137-1-joycathacker@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lan966x_probe_port() stores the newly allocated net_device in the
port before calling register_netdev(). If register_netdev() fails,
the probe error path calls lan966x_cleanup_ports(), which sees
port->dev and calls unregister_netdev() for a device that was never
registered.
Destroy the phylink instance created for this port and clear port->dev
before returning the registration error. The common cleanup path now skips
ports without port->dev before reaching the registered netdev cleanup, so
it only handles ports that reached the registered-netdev lifetime.
This also avoids treating an uninitialized FDMA netdev and the failed port
as a NULL == NULL match in the common cleanup path.
Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Co-developed-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller says:
====================
Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)
Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.
Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.
Greg KH fixes a double free and use-after free in the idpf auxiliary device
error paths.
Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.
Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.
Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.
Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard to avoid duplicate macro definitions should the ice_dpll.h
header is included multiple times.
====================
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.
While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.
Fixes: ad1df4f2d5 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:
On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.
On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.
Reproducer:
sudo ethtool -X $PF1 equal 8
Output in dmesg:
Failed to configure RSS hash for VSI 6, error -5
Fixes: 352e9bf238 ("ice: enable symmetric-xor RSS for Toeplitz hash function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init. The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev. The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().
Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.
Commit 65637c3a18 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: stable@kernel.org
Fixes: be91128c57 ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Fixes: f4312e6bfa ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>