Commit Graph

1383398 Commits

Author SHA1 Message Date
Vincent Mailhol
17c8d79452 can: mcba_usb: populate ndo_change_mtu() to prevent buffer overflow
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.

Unfortunately, because the mcba_usb driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:

  $ ip link set can0 mtu 9999

After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:

	socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))

to inject a malicious CAN XL frames. For example:

	struct canxl_frame frame = {
		.flags = 0xff,
		.len = 2048,
	};

The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:

  1. the skb->protocol is set to ETH_P_CANXL which is valid (the
     function does not check the actual device capabilities).

  2. the length is a valid CAN XL length.

And so, mcba_usb_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.

This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on these lines:

	usb_msg.dlc = cf->len;

	memcpy(usb_msg.data, cf->data, usb_msg.dlc);

Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!

Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.

Fixes: 51f3baad7d ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-4-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 19:03:01 +02:00
Vincent Mailhol
61da0bd410 can: sun4i_can: populate ndo_change_mtu() to prevent buffer overflow
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.

Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:

  $ ip link set can0 mtu 9999

After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:

	socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))

to inject a malicious CAN XL frames. For example:

	struct canxl_frame frame = {
		.flags = 0xff,
		.len = 2048,
	};

The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:

  1. the skb->protocol is set to ETH_P_CANXL which is valid (the
     function does not check the actual device capabilities).

  2. the length is a valid CAN XL length.

And so, sun4ican_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.

This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on this line:

	dlc = cf->len;

Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs a
couple line below when doing:

	for (i = 0; i < dlc; i++)
		writel(cf->data[i], priv->base + (dreg + i * 4));

Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.

Fixes: 0738eff14d ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-3-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 19:03:01 +02:00
Vincent Mailhol
ac1c7656fa can: hi311x: populate ndo_change_mtu() to prevent buffer overflow
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.

Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:

  $ ip link set can0 mtu 9999

After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:

	socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))

to inject a malicious CAN XL frames. For example:

	struct canxl_frame frame = {
		.flags = 0xff,
		.len = 2048,
	};

The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:

  1. the skb->protocol is set to ETH_P_CANXL which is valid (the
     function does not check the actual device capabilities).

  2. the length is a valid CAN XL length.

And so, hi3110_hard_start_xmit() receives a CAN XL frame which it is
not able to correctly handle and will thus misinterpret it as a CAN
frame. The driver will consume frame->len as-is with no further
checks.

This can result in a buffer overflow later on in hi3110_hw_tx() on
this line:

	memcpy(buf + HI3110_FIFO_EXT_DATA_OFF,
	       frame->data, frame->len);

Here, frame->len corresponds to the flags field of the CAN XL frame.
In our previous example, we set canxl_frame->flags to 0xff. Because
the maximum expected length is 8, a buffer overflow of 247 bytes
occurs!

Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.

Fixes: 57e83fb9b7 ("can: hi311x: Add Holt HI-311x CAN driver")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-2-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 19:03:01 +02:00
Vincent Mailhol
38c0abad45 can: etas_es58x: populate ndo_change_mtu() to prevent buffer overflow
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.

Unfortunately, because the etas_es58x driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:

  $ ip link set can0 mtu 9999

After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:

	socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL));

to inject a malicious CAN XL frames. For example:

	struct canxl_frame frame = {
		.flags = 0xff,
		.len = 2048,
	};

The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:

  1. the skb->protocol is set to ETH_P_CANXL which is valid (the
     function does not check the actual device capabilities).

  2. the length is a valid CAN XL length.

And so, es58x_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN(FD)
frame.

This can result in a buffer overflow. For example, using the es581.4
variant, the frame will be dispatched to es581_4_tx_can_msg(), go
through the last check at the beginning of this function:

	if (can_is_canfd_skb(skb))
		return -EMSGSIZE;

and reach this line:

	memcpy(tx_can_msg->data, cf->data, cf->len);

Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!

Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU or
CANFD_MTU (depending on the device capabilities). By fixing the root
cause, this prevents the buffer overflow.

Fixes: 8537257874 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-1-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 19:03:01 +02:00
Duy Nguyen
5cff263606 can: rcar_canfd: Fix controller mode setting
Driver configures register to choose controller mode before
setting all channels to reset mode leading to failure.
The patch corrects operation of mode setting.

Signed-off-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
Signed-off-by: Tranh Ha <tranh.ha.xb@renesas.com>
Link: https://patch.msgid.link/TYWPR01MB87434739F83E27EDCD23DF44B416A@TYWPR01MB8743.jpnprd01.prod.outlook.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 18:59:15 +02:00
Chen Yufeng
6b69680847 can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled
This issue is similar to the vulnerability in the `mcp251x` driver,
which was fixed in commit 03c427147b ("can: mcp251x: fix resume from
sleep before interface was brought up").

In the `hi311x` driver, when the device resumes from sleep, the driver
schedules `priv->restart_work`. However, if the network interface was
not previously enabled, the `priv->wq` (workqueue) is not allocated and
initialized, leading to a null pointer dereference.

To fix this, we move the allocation and initialization of the workqueue
from the `hi3110_open` function to the `hi3110_can_probe` function.
This ensures that the workqueue is properly initialized before it is
used during device resume. And added logic to destroy the workqueue
in the error handling paths of `hi3110_can_probe` and in the
`hi3110_can_remove` function to prevent resource leaks.

Signed-off-by: Chen Yufeng <chenyufeng@iie.ac.cn>
Link: https://patch.msgid.link/20250911150820.250-1-chenyufeng@iie.ac.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-19 18:59:15 +02:00
Linus Torvalds
cbf658dd09 Merge tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless. No known regressions at this point.

  Current release - fix to a fix:

   - eth: Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

   - wifi: iwlwifi: pcie: fix byte count table for 7000/8000 devices

   - net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU

  Previous releases - regressions:

   - bonding: set random address only when slaves already exist

   - rxrpc: fix untrusted unsigned subtract

   - eth:
       - ice: fix Rx page leak on multi-buffer frames
       - mlx5: don't return mlx5_link_info table when speed is unknown

  Previous releases - always broken:

   - tls: make sure to abort the stream if headers are bogus

   - tcp: fix null-deref when using TCP-AO with TCP_REPAIR

   - dpll: fix skipping last entry in clock quality level reporting

   - eth: qed: don't collect too many protection override GRC elements,
     fix memory corruption"

* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
  cnic: Fix use-after-free bugs in cnic_delete_task
  devlink rate: Remove unnecessary 'static' from a couple places
  MAINTAINERS: update sundance entry
  net: liquidio: fix overflow in octeon_init_instr_queue()
  net: clear sk->sk_ino in sk_set_socket(sk, NULL)
  Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
  selftests: tls: test skb copy under mem pressure and OOB
  tls: make sure to abort the stream if headers are bogus
  selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
  tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
  octeon_ep: fix VF MAC address lifecycle handling
  selftests: bonding: add vlan over bond testing
  bonding: don't set oif to bond dev when getting NS target destination
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  net/mlx5e: Add a miss level for ipsec crypto offload
  net/mlx5e: Harden uplink netdev access against device unbind
  MAINTAINERS: make the DPLL entry cover drivers
  doc/netlink: Fix typos in operation attributes
  igc: don't fail igc_probe() on LED setup error
  ...
2025-09-18 10:22:02 -07:00
Linus Torvalds
86cc796e5e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "These are mostly Oliver's Arm changes: lock ordering fixes for the
  vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
  VMs.

  Arm:

   - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
     visiting from an MMU notifier

   - Fixes to the TLB match process and TLB invalidation range for
     managing the VCNR pseudo-TLB

   - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
     values in PMSCR_EL1

   - Fix save/restore of host MDCR_EL2 to account for eagerly
     programming at vcpu_load() on VHE systems

   - Correct lock ordering when dealing with VGIC LPIs, avoiding
     scenarios where an xarray's spinlock was nested with a *raw*
     spinlock

   - Permit stage-2 read permission aborts which are possible in the
     case of NV depending on the guest hypervisor's stage-2 translation

   - Call raw_spin_unlock() instead of the internal spinlock API

   - Fix parameter ordering when assigning VBAR_EL1

   - Reverted a couple of fixes for RCU stalls when destroying a stage-2
     page table.

     There appears to be some nasty refcounting / UAF issues lurking in
     those patches and the band-aid we tried to apply didn't hold.

  s390:

   - mm fixes, including userfaultfd bug fix

  x86:

   - Sync the vTPR from the local APIC to the VMCB even when AVIC is
     active.

     This fixes a bug where host updates to the vTPR, e.g. via
     KVM_SET_LAPIC or emulation of a guest access, are lost and result
     in interrupt delivery issues in the guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
  Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
  Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
  KVM: arm64: vgic: fix incorrect spinlock API usage
  KVM: arm64: Remove stage 2 read fault check
  KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
  KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
  KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
  KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  KVM: arm64: Spin off release helper from vgic_put_irq()
  KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  KVM: arm64: vgic: Drop stale comment on IRQ active state
  KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
  KVM: arm64: Initialize PMSCR_EL1 when in VHE
  KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
  KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
  KVM: s390: Fix incorrect usage of mmu_notifier_register()
  KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
  KVM: arm64: Mark freed S2 MMUs as invalid
2025-09-18 09:42:55 -07:00
Linus Torvalds
604530cd9a Merge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and new HW support:

   - amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

   - amd/pmf: Support new ACPI ID AMDI0108

   - asus-wmi: Re-add extra keys to ignore_key_wlan quirk

   - oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"

* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
  platform/x86/amd/pmf: Support new ACPI ID AMDI0108
  platform/x86: oxpec: Add support for AOKZOE A1X
  platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
  platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
2025-09-18 09:22:34 -07:00
Linus Torvalds
f03e578c8a Merge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Johannes Berg:
 "A few fixes for UML, which I'd meant to send earlier but then forgot.

  All of them are pretty long-standing issues that are either not really
  happening (the UAF), in rarely used code (the FD buffer issue), or an
  issue only for some host configurations (the executable stack):

   - mark stack not executable to work on more modern systems with
     selinux

   - fix use-after-free in a virtio error path

   - fix stack buffer overflow in external unix socket FD receive
     function"

* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Fix FD copy size in os_rcv_fd_msg()
  um: virtio_uml: Fix use-after-free after put_device in probe
  um: Don't mark stack executable
2025-09-18 09:18:27 -07:00
Duoming Zhou
f8b4687151 octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.

A typical race condition is illustrated below:

CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF

This is confirmed by a KASAN report:

BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0x55/0x70
 print_report+0xcf/0x610
 ? __run_timer_base.part.0+0x7d7/0x8c0
 kasan_report+0xb8/0xf0
 ? __run_timer_base.part.0+0x7d7/0x8c0
 __run_timer_base.part.0+0x7d7/0x8c0
 ? __pfx___run_timer_base.part.0+0x10/0x10
 ? __pfx_read_tsc+0x10/0x10
 ? ktime_get+0x60/0x140
 ? lapic_next_event+0x11/0x20
 ? clockevents_program_event+0x1d4/0x2a0
 run_timer_softirq+0xd1/0x190
 handle_softirqs+0x16a/0x550
 irq_exit_rcu+0xaf/0xe0
 sysvec_apic_timer_interrupt+0x70/0x80
 </IRQ>
...
Allocated by task 1:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7f/0x90
 otx2_ptp_init+0xb1/0x860
 otx2_probe+0x4eb/0xc30
 local_pci_probe+0xdc/0x190
 pci_device_probe+0x2fe/0x470
 really_probe+0x1ca/0x5c0
 __driver_probe_device+0x248/0x310
 driver_probe_device+0x44/0x120
 __driver_attach+0xd2/0x310
 bus_for_each_dev+0xed/0x170
 bus_add_driver+0x208/0x500
 driver_register+0x132/0x460
 do_one_initcall+0x89/0x300
 kernel_init_freeable+0x40d/0x720
 kernel_init+0x1a/0x150
 ret_from_fork+0x10c/0x1a0
 ret_from_fork_asm+0x1a/0x30

Freed by task 136:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3a/0x60
 __kasan_slab_free+0x3f/0x50
 kfree+0x137/0x370
 otx2_ptp_destroy+0x38/0x80
 otx2_remove+0x10d/0x4c0
 pci_device_remove+0xa6/0x1d0
 device_release_driver_internal+0xf8/0x210
 pci_stop_bus_device+0x105/0x150
 pci_stop_and_remove_bus_device_locked+0x15/0x30
 remove_store+0xcc/0xe0
 kernfs_fop_write_iter+0x2c3/0x440
 vfs_write+0x871/0xd70
 ksys_write+0xee/0x1c0
 do_syscall_64+0xac/0x280
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.

This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.

Fixes: 2958d17a89 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:18 -07:00
Duoming Zhou
cfa7d9b1e3 cnic: Fix use-after-free bugs in cnic_delete_task
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().

A typical race condition is illustrated below:

CPU 0 (cleanup)              | CPU 1 (delayed work callback)
cnic_netdev_event()          |
  cnic_stop_hw()             | cnic_delete_task()
    cnic_cm_stop_bnx2x_hw()  | ...
      cancel_delayed_work()  | /* the queue_delayed_work()
      flush_workqueue()      |    executes after flush_workqueue()*/
                             | queue_delayed_work()
  cnic_free_dev(dev)//free   | cnic_delete_task() //new instance
                             |   dev = cp->dev; //use

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.

This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.

Fixes: fdf24086f4 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:18 -07:00
Cosmin Ratiu
3191df0a48 devlink rate: Remove unnecessary 'static' from a couple places
devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.

This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.

Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.

Fixes: a8ecb93ef0 ("devlink: Introduce rate nodes")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:18 -07:00
Denis Kirjanov
7736aff470 MAINTAINERS: update sundance entry
Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:17 -07:00
Alexey Nepomnyashih
cca7b1cfd7 net: liquidio: fix overflow in octeon_init_instr_queue()
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.

Fixes: f21fb3ed36 ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:17 -07:00
Eric Dumazet
87ebb628a5 net: clear sk->sk_ino in sk_set_socket(sk, NULL)
Andrei Vagin reported that blamed commit broke CRIU.

Indeed, while we want to keep sk_uid unchanged when a socket
is cloned, we want to clear sk->sk_ino.

Otherwise, sock_diag might report multiple sockets sharing
the same inode number.

Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL),
called both from sock_orphan() and sk_clone_lock().

Fixes: 5d6b58c932 ("net: lockless sock_i_ino()")
Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/
Closes: https://github.com/checkpoint-restore/criu/issues/2744
Reported-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20250917135337.1736101-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:17 -07:00
Tariq Toukan
3fbfe251cc Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
This reverts commit d24341740f.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).

Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20250910170011.70528106@kernel.org
Fixes: d24341740f ("net/mlx5e: Update and set Xon/Xoff upon port speed set")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:17 -07:00
Jakub Kicinski
4c05c7ed88 selftests: tls: test skb copy under mem pressure and OOB
Add a test which triggers mem pressure via OOB writes.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:43:54 +02:00
Jakub Kicinski
0aeb54ac4c tls: make sure to abort the stream if headers are bogus
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.

Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.

Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.

Reported-by: Lee Jones <lee@kernel.org>
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:43:54 +02:00
Linus Torvalds
8b789f2b76 Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "15 hotfixes. 11 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 13 of these
  fixes are for MM.

  The usual shower of singletons, plus

   - fixes from Hugh to address various misbehaviors in get_user_pages()

   - patches from SeongJae to address a quite severe issue in DAMON

   - another series also from SeongJae which completes some fixes for a
     DAMON startup issue"

* tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  zram: fix slot write race condition
  nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
  samples/damon/mtier: avoid starting DAMON before initialization
  samples/damon/prcl: avoid starting DAMON before initialization
  samples/damon/wsse: avoid starting DAMON before initialization
  MAINTAINERS: add Lance Yang as a THP reviewer
  MAINTAINERS: add Jann Horn as rmap reviewer
  mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control
  mm/damon/core: introduce damon_call_control->dealloc_on_cancel
  mm: folio_may_be_lru_cached() unless folio_test_large()
  mm: revert "mm: vmscan.c: fix OOM on swap stress test"
  mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
  mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
  mm/gup: check ref_count instead of lru before migration
2025-09-17 21:34:26 -07:00
Linus Torvalds
592a93fea1 Merge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - Two fixes for remaining_data_length and offset checks in receive path

 - Don't go over max SGEs which caused smbdirect send to fail (and
   trigger disconnect)

* tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
  ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
  smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
2025-09-17 18:23:01 -07:00
Linus Torvalds
992d4e481e Merge tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe fix from Masami Hiramatsu:

 - kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal(),
   by handling NULL return of kmemdup() correctly

* tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()
2025-09-17 16:52:26 -07:00
Jakub Kicinski
0984710897 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-09-16 (ice, i40e, ixgbe, igc)

For ice:
Jake resolves leaking pages with multi-buffer frames when a 0-sized
descriptor is encountered.

For i40e:
Maciej removes a redundant, and incorrect, memory barrier.

For ixgbe:
Jedrzej adjusts lifespan of ACI lock to ensure uses are while it is
valid.

For igc:
Kohei Enju does not fail probe on LED setup failure which resolves a
kernel panic in the cleanup path, if we were to fail.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: don't fail igc_probe() on LED setup error
  ixgbe: destroy aci.lock later within ixgbe_remove path
  ixgbe: initialize aci.lock before it's used
  i40e: remove redundant memory barrier when cleaning Tx descs
  ice: fix Rx page leak on multi-buffer frames
====================

Link: https://patch.msgid.link/20250916212801.2818440-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:14:25 -07:00
Jakub Kicinski
934da21f99 Merge tag 'wireless-2025-09-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
Just two fixes:
 - fix crash in rfkill due to uninitialized type_name
 - fix aggregation in iwlwifi 7000/8000 devices

* tag 'wireless-2025-09-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  wifi: iwlwifi: pcie: fix byte count table for some devices
====================

Link: https://patch.msgid.link/20250917105159.161583-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:12:46 -07:00
Jakub Kicinski
d7995c2b91 Merge branch 'tcp-clear-tcp_sk-sk-fastopen_rsk-in-tcp_disconnect'
Kuniyuki Iwashima says:

====================
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().

syzbot reported a warning in tcp_retransmit_timer() for TCP Fast
Open socket.

Patch 1 fixes the issue and Patch 2 adds a test for the scenario.
====================

Link: https://patch.msgid.link/20250915175800.118793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:01:54 -07:00
Kuniyuki Iwashima
1fd0362262 selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
The test reproduces the scenario explained in the previous patch.

Without the patch, the test triggers the warning and cannot see the last
retransmitted packet.

  # ./ksft_runner.sh tcp_fastopen_server_reset-after-disconnect.pkt
  TAP version 13
  1..2
  [   29.229250] ------------[ cut here ]------------
  [   29.231414] WARNING: CPU: 26 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer+0x32/0x9f0
  ...
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 1 ipv4
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 2 ipv6
  # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:01:52 -07:00
Kuniyuki Iwashima
45c8a6cc2b tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
syzbot reported the splat below where a socket had tcp_sk(sk)->fastopen_rsk
in the TCP_ESTABLISHED state. [0]

syzbot reused the server-side TCP Fast Open socket as a new client before
the TFO socket completes 3WHS:

  1. accept()
  2. connect(AF_UNSPEC)
  3. connect() to another destination

As of accept(), sk->sk_state is TCP_SYN_RECV, and tcp_disconnect() changes
it to TCP_CLOSE and makes connect() possible, which restarts timers.

Since tcp_disconnect() forgot to clear tcp_sk(sk)->fastopen_rsk, the
retransmit timer triggered the warning and the intended packet was not
retransmitted.

Let's call reqsk_fastopen_remove() in tcp_disconnect().

[0]:
WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Modules linked in:
CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.17.0-rc5-g201825fb4278 #62 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Code: 41 55 41 54 55 53 48 8b af b8 08 00 00 48 89 fb 48 85 ed 0f 84 55 01 00 00 0f b6 47 12 3c 03 74 0c 0f b6 47 12 3c 04 74 04 90 <0f> 0b 90 48 8b 85 c0 00 00 00 48 89 ef 48 8b 40 30 e8 6a 4f 06 3e
RSP: 0018:ffffc900002f8d40 EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffff888106911400 RCX: 0000000000000017
RDX: 0000000002517619 RSI: ffffffff83764080 RDI: ffff888106911400
RBP: ffff888106d5c000 R08: 0000000000000001 R09: ffffc900002f8de8
R10: 00000000000000c2 R11: ffffc900002f8ff8 R12: ffff888106911540
R13: ffff888106911480 R14: ffff888106911840 R15: ffffc900002f8de0
FS:  0000000000000000(0000) GS:ffff88907b768000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8044d69d90 CR3: 0000000002c30003 CR4: 0000000000370ef0
Call Trace:
 <IRQ>
 tcp_write_timer (net/ipv4/tcp_timer.c:738)
 call_timer_fn (kernel/time/timer.c:1747)
 __run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372)
 timer_expire_remote (kernel/time/timer.c:2385 kernel/time/timer.c:2376 kernel/time/timer.c:2135)
 tmigr_handle_remote_up (kernel/time/timer_migration.c:944 kernel/time/timer_migration.c:1035)
 __walk_groups.isra.0 (kernel/time/timer_migration.c:533 (discriminator 1))
 tmigr_handle_remote (kernel/time/timer_migration.c:1096)
 handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
 irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680 kernel/softirq.c:696)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 (discriminator 35) arch/x86/kernel/apic/apic.c:1050 (discriminator 35))
 </IRQ>

Fixes: 8336886f78 ("tcp: TCP Fast Open Server - support TFO listeners")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:01:52 -07:00
Sathesh B Edara
a72175c985 octeon_ep: fix VF MAC address lifecycle handling
Currently, VF MAC address info is not updated when the MAC address is
configured from VF, and it is not cleared when the VF is removed. This
leads to stale or missing MAC information in the PF, which may cause
incorrect state tracking or inconsistencies when VFs are hot-plugged
or reassigned.

Fix this by:
 - storing the VF MAC address in the PF when it is set from VF
 - clearing the stored VF MAC address when the VF is removed

This ensures that the PF always has correct VF MAC state.

Fixes: cde29af9e6 ("octeon_ep: add PF-VF mailbox communication")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 15:58:32 -07:00
Wang Liang
dc3382fffd tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()
A crash was observed with the following output:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 2899 Comm: syz.2.399 Not tainted 6.17.0-rc5+ #5 PREEMPT(none)
RIP: 0010:trace_kprobe_create_internal+0x3fc/0x1440 kernel/trace/trace_kprobe.c:911
Call Trace:
 <TASK>
 trace_kprobe_create_cb+0xa2/0xf0 kernel/trace/trace_kprobe.c:1089
 trace_probe_create+0xf1/0x110 kernel/trace/trace_probe.c:2246
 dyn_event_create+0x45/0x70 kernel/trace/trace_dynevent.c:128
 create_or_delete_trace_kprobe+0x5e/0xc0 kernel/trace/trace_kprobe.c:1107
 trace_parse_run_command+0x1a5/0x330 kernel/trace/trace.c:10785
 vfs_write+0x2b6/0xd00 fs/read_write.c:684
 ksys_write+0x129/0x240 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x5d/0x2d0 arch/x86/entry/syscall_64.c:94
 </TASK>

Function kmemdup() may return NULL in trace_kprobe_create_internal(), add
check for it's return value.

Link: https://lore.kernel.org/all/20250916075816.3181175-1-wangliang74@huawei.com/

Fixes: 33b4e38baa ("tracing: kprobe-event: Allocate string buffers from heap")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-09-18 07:36:41 +09:00
Hangbin Liu
dc5f94b1ec selftests: bonding: add vlan over bond testing
Add a vlan over bond testing to make sure arp/ns target works.
Also change all the configs to mudules.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 15:13:51 -07:00
Hangbin Liu
a8ba87f04c bonding: don't set oif to bond dev when getting NS target destination
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.

Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.

Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 15:13:51 -07:00
Linus Torvalds
37889ceadd Merge tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED

 - Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
   which was causing issues with per-CPU task scheduling and reenqueuing
   behavior

* tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
  Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
2025-09-17 13:27:31 -07:00
Linus Torvalds
05950213a9 Merge tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "This contains two cgroup changes. Both are pretty low risk.

  - Fix deadlock in cgroup destruction when repeatedly
    mounting/unmounting perf_event and net_prio controllers.

    The issue occurs because cgroup_destroy_wq has max_active=1, causing
    root destruction to wait for CSS offline operations that are queued
    behind it.

    The fix splits cgroup_destroy_wq into three separate workqueues to
    eliminate the blocking.

  - Set of->priv to NULL upon file release to make potential bugs to
    manifest as NULL pointer dereferences rather than use-after-free
    errors"

* tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/psi: Set of->priv to NULL upon file release
  cgroup: split cgroup_destroy_wq into 3 workqueues
2025-09-17 13:22:08 -07:00
Paolo Bonzini
ecd42dd170 Merge tag 'kvm-s390-master-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- KVM mm fixes
- Postcopy fix
2025-09-17 19:45:21 +02:00
Paolo Bonzini
f6f43a5338 Merge tag 'kvm-x86-fixes-6.17-rcN' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fix for 6.17-rcN

Sync the vTPR from the local APIC to the VMCB even when AVIC is active, to fix
a bug where host updates to the vTPR, e.g. via KVM_SET_LAPIC or emulation of a
guest access, effectively get lost and result in interrupt delivery issues in
the guest.
2025-09-17 19:45:02 +02:00
Paolo Bonzini
7d862707d6 Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3

 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

 - Fixes to the TLB match process and TLB invalidation range for
   managing the VCNR pseudo-TLB

 - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

 - Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

 - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

 - Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

 - Call raw_spin_unlock() instead of the internal spinlock API

 - Fix parameter ordering when assigning VBAR_EL1
2025-09-17 19:44:40 +02:00
Linus Torvalds
d4b779985a Merge tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:

 - fix integer overflow in dm-stripe

 - limit tag size in dm-integrity to 255 bytes

 - fix 'alignment inconsistency' warning in dm-raid

* tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-raid: don't set io_min and io_opt for raid1
  dm-integrity: limit MAX_TAG_SIZE to 255
  dm-stripe: fix a possible integer overflow
2025-09-17 08:07:02 -07:00
Linus Torvalds
b6f456a76f Merge tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - in zoned mode, turn assertion to proper code when reserving space in
   relocation block group

 - fix search key of extended ref (hardlink) when replaying log

 - fix initialization of file extent tree on filesystems without
   no-holes feature

 - add harmless data race annotation to block group comparator

* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: annotate block group access with data_race() when sorting for reclaim
  btrfs: initialize inode::file_extent_tree after i_mode has been set
  btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
  btrfs: fix invalid extref key setup when replaying dentry
2025-09-17 07:55:45 -07:00
Mikulas Patocka
a865562646 dm-raid: don't set io_min and io_opt for raid1
These commands
 modprobe brd rd_size=1048576
 vgcreate vg /dev/ram*
 lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency

The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.

If there's chunk-less raid, such as raid1, io_min shouldn't be set to zero
because it would be raised to 512 and it would trigger the warning.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
2025-09-17 16:04:29 +02:00
Hans de Goede
b6f56a44e4 net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
Since commit 7d5e9737ef ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.

On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:

        rfkill->type = (unsigned)id->driver_data;

and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.

rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.

Note likely sofar this has not been caught because:

1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored

Fixes: 7d5e9737ef ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-17 12:37:05 +02:00
Johannes Berg
e882985b09 Merge tag 'iwlwifi-fixes-2025-09-15' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi fix
====================

The fix is for byte count tables in 7000/8000 family devices.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-17 12:36:08 +02:00
Tejun Heo
a1eab4d813 sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
While collecting SCX related fields in struct task_group into struct
scx_task_group, 6e6558a6bc ("sched_ext, sched/core: Factor out struct
scx_task_group") forgot update tg->scx_weight usage in tg_weight(), which
leads to build failure when CONFIG_FAIR_GROUP_SCHED is disabled but
CONFIG_EXT_GROUP_SCHED is enabled. Fix it.

Fixes: 6e6558a6bc ("sched_ext, sched/core: Factor out struct scx_task_group")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509170230.MwZsJSWa-lkp@intel.com/
Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-16 23:07:27 -10:00
Jakub Kicinski
8c47485399 Merge branch 'mlx5e-misc-fixes-2025-09-15'
Tariq Toukan says:

====================
mlx5e misc fixes 2025-09-15

This patchset provides misc bug fixes from the team to the mlx5 Eth
driver.
====================

Link: https://patch.msgid.link/1757939074-617281-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 17:19:15 -07:00
Lama Kayal
7601a0a462 net/mlx5e: Add a miss level for ipsec crypto offload
The cited commit adds a miss table for switchdev mode. But it
uses the same level as policy table. Will hit the following error
when running command:

 # ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto	\
	esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))'	\
	0x3a189a7f9374955d3817886c8587f1da3df387ff 128		\
	mode tunnel offload dev enp8s0f0 dir in
 Error: mlx5_core: Device failed to offload this state.

The dmesg error is:

 mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22

Fix it by adding a new miss level to avoid the error.

Fixes: 7d9e292ecd ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 17:19:14 -07:00
Jianbo Liu
6b4be64fd9 net/mlx5e: Harden uplink netdev access against device unbind
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.

 BUG: unable to handle page fault for address: 0000000000001300
 at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
 Call Trace:
  <TASK>
  mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
  esw_offloads_enable+0x593/0x910 [mlx5_core]
  mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
  mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
  devlink_nl_eswitch_set_doit+0x60/0xd0
  genl_family_rcv_msg_doit+0xe0/0x130
  genl_rcv_msg+0x183/0x290
  netlink_rcv_skb+0x4b/0xf0
  genl_rcv+0x24/0x40
  netlink_unicast+0x255/0x380
  netlink_sendmsg+0x1f3/0x420
  __sock_sendmsg+0x38/0x60
  __sys_sendto+0x119/0x180
  do_syscall_64+0x53/0x1d0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, and
preventing the netdevice from being freed while it is in use.

Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 17:19:11 -07:00
Jakub Kicinski
94ff1ed303 MAINTAINERS: make the DPLL entry cover drivers
DPLL maintainers should probably be CCed on driver patches, too.
Remove the *, which makes the pattern only match files directly
under drivers/dpll but not its sub-directories.

Acked-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250915234255.1306612-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 16:55:59 -07:00
Remy D. Farley
109f8b5154 doc/netlink: Fix typos in operation attributes
I'm trying to generate Rust bindings for netlink using the yaml spec.

It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresonding C define is
CTA_COUNTERS_ORIG.

Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct
with the same name, which is where family value is actually taken from.

> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
>                struct sk_buff *skb,
>                const struct nlmsghdr *nlh,
>                const struct nlattr * const cda[],
>                struct netlink_ext_ack *extack)
> {
>   int err;
>   struct nfgenmsg *nfmsg = nlmsg_data(nlh);
>   u_int8_t u3 = nfmsg->nfgen_family;
                         ^^^^^^^^^^^^

Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a5 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 16:30:06 -07:00
Linus Torvalds
5aca7966d2 Merge tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
 "A small set of fixes for crashes in different commands and conditions"

* tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf maps: Ensure kmap is set up for all inserts
  perf lock: Provide a host_env for session new
  perf subcmd: avoid crash in exclude_cmds when excludes is empty
2025-09-16 15:15:54 -07:00
Kohei Enju
528eb4e19e igc: don't fail igc_probe() on LED setup error
When igc_led_setup() fails, igc_probe() fails and triggers kernel panic
in free_netdev() since unregister_netdev() is not called. [1]
This behavior can be tested using fault-injection framework, especially
the failslab feature. [2]

Since LED support is not mandatory, treat LED setup failures as
non-fatal and continue probe with a warning message, consequently
avoiding the kernel panic.

[1]
 kernel BUG at net/core/dev.c:12047!
 Oops: invalid opcode: 0000 [#1] SMP NOPTI
 CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:free_netdev+0x278/0x2b0
 [...]
 Call Trace:
  <TASK>
  igc_probe+0x370/0x910
  local_pci_probe+0x3a/0x80
  pci_device_probe+0xd1/0x200
 [...]

[2]
 #!/bin/bash -ex

 FAILSLAB_PATH=/sys/kernel/debug/failslab/
 DEVICE=0000:00:05.0
 START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
         | awk '{printf("0x%s", $1)}')
 END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))

 echo $START_ADDR > $FAILSLAB_PATH/require-start
 echo $END_ADDR > $FAILSLAB_PATH/require-end
 echo 1 > $FAILSLAB_PATH/times
 echo 100 > $FAILSLAB_PATH/probability
 echo N > $FAILSLAB_PATH/ignore-gfp-wait

 echo $DEVICE > /sys/bus/pci/drivers/igc/bind

Fixes: ea578703b0 ("igc: Add support for LEDs on i225/i226")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16 14:01:53 -07:00
Jedrzej Jagielski
316ba68175 ixgbe: destroy aci.lock later within ixgbe_remove path
There's another issue with aci.lock and previous patch uncovers it.
aci.lock is being destroyed during removing ixgbe while some of the
ixgbe closing routines are still ongoing. These routines use Admin
Command Interface which require taking aci.lock which has been already
destroyed what leads to call trace.

[  +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70
[  +0.000002] Call Trace:
[  +0.000003]  <TASK>
[  +0.000006]  ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe]
[  +0.000049]  ? try_to_wake_up+0x29d/0x5d0
[  +0.000009]  ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe]
[  +0.000032]  ixgbe_disable_rx+0x3d/0x200 [ixgbe]
[  +0.000027]  ixgbe_down+0x102/0x3b0 [ixgbe]
[  +0.000031]  ixgbe_close_suspend+0x28/0x90 [ixgbe]
[  +0.000028]  ixgbe_close+0xfb/0x100 [ixgbe]
[  +0.000025]  __dev_close_many+0xae/0x220
[  +0.000005]  dev_close_many+0xc2/0x1a0
[  +0.000004]  ? kernfs_should_drain_open_files+0x2a/0x40
[  +0.000005]  unregister_netdevice_many_notify+0x204/0xb00
[  +0.000006]  ? __kernfs_remove.part.0+0x109/0x210
[  +0.000006]  ? kobj_kset_leave+0x4b/0x70
[  +0.000008]  unregister_netdevice_queue+0xf6/0x130
[  +0.000006]  unregister_netdev+0x1c/0x40
[  +0.000005]  ixgbe_remove+0x216/0x290 [ixgbe]
[  +0.000021]  pci_device_remove+0x42/0xb0
[  +0.000007]  device_release_driver_internal+0x19c/0x200
[  +0.000008]  driver_detach+0x48/0x90
[  +0.000003]  bus_remove_driver+0x6d/0xf0
[  +0.000006]  pci_unregister_driver+0x2e/0xb0
[  +0.000005]  ixgbe_exit_module+0x1c/0xc80 [ixgbe]

Same as for the previous commit, the issue has been highlighted by the
commit 337369f8ce ("locking/mutex: Add MUTEX_WARN_ON() into fast path").

Move destroying aci.lock to the end of ixgbe_remove(), as this
simply fixes the issue.

Fixes: 4600cdf9f5 ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-16 14:01:53 -07:00