linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-16 19:36:58 -04:00

Author	SHA1	Message	Date
Andrey Vatoropin	7d277a7a58	be2net: pass wrb_params in case of OS2BMC be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL pointer when processing a workaround for specific packet, as commit `bc0c3405ab` ("be2net: fix a Tx stall bug caused by a specific ipv6 packet") states. The correct way would be to pass the wrb_params from be_xmit(). Fixes: `760c295e0e` ("be2net: Support for OS2BMC.") Cc: stable@vger.kernel.org Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 07:39:54 -08:00
Paolo Abeni	dc9e7e652f	Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== wireless-2025-11-20 A single fix for scanning on some rtw89 devices. * tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: rtw89: hw_scan: Don't let the operating channel be last ==================== Link: https://patch.msgid.link/20251120085433.8601-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 13:03:43 +01:00
Oleksij Rempel	3ceb6ac211	net: dsa: microchip: lan937x: Fix RGMII delay tuning Correct RGMII delay application logic in lan937x_set_tune_adj(). The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the new delay value. This caused the new value to be bitwise-OR'd with the existing PORT_TUNE_ADJ field instead of replacing it. For example, when setting the RGMII 2 TX delay on port 4, the intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was incorrectly OR'd with the default 0x1B (from register value 0xDA3), leaving the delay at the wrong setting. This patch adds the missing mask to clear the field, ensuring the correct delay value is written. Physical measurements on the RGMII TX lines confirm the fix, showing the delay changing from ~1ns (before change) to ~2ns. While testing on i.MX 8MP showed this was within the platform's timing tolerance, it did not match the intended hardware-characterized value. Fixes: `b19ac41faa` ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 11:26:14 +01:00
Jakub Kicinski	f170b1dc26	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-11-18 (idpf, ice) This series contains updates to idpf and ice drivers. Emil adds a check for NULL vport_config during removal to avoid NULL pointer dereference in idpf. Grzegorz fixes PTP teardown paths to account for some missed cleanups for ice driver. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: fix PTP cleanup on driver removal in error path idpf: fix possible vport_config NULL pointer deref in remove ==================== Link: https://patch.msgid.link/20251118235207.2165495-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:10:53 -08:00
Bitterblue Smith	e837b9091b	wifi: rtw89: hw_scan: Don't let the operating channel be last Scanning can be offloaded to the firmware. To that end, the driver prepares a list of channels to scan, including periodic visits back to the operating channel, and sends the list to the firmware. When the channel list is too long to fit in a single H2C message, the driver splits the list, sends the first part, and tells the firmware to scan. When the scan is complete, the driver sends the next part of the list and tells the firmware to scan. When the last channel that fit in the H2C message is the operating channel something seems to go wrong in the firmware. It will acknowledge receiving the list of channels but apparently it will not do anything more. The AP can't be pinged anymore. The driver still receives beacons, though. One way to avoid this is to split the list of channels before the operating channel. Affected devices: * RTL8851BU with firmware 0.29.41.3 * RTL8832BU with firmware 0.29.29.8 * RTL8852BE with firmware 0.29.29.8 The commit `57a5fbe39a` ("wifi: rtw89: refactor flow that hw scan handles channel list") is found by git blame, but it is actually to refine the scan flow, but not a culprit, so skip Fixes tag. Reported-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Closes: https://lore.kernel.org/linux-wireless/0abbda91-c5c2-4007-84c8-215679e652e1@gmail.com/ Cc: stable@vger.kernel.org # 6.16+ Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/c1e61744-8db4-4646-867f-241b47d30386@gmail.com	2025-11-20 11:36:01 +08:00
Wei Fang	e31a11be41	net: phylink: add missing supported link modes for the fixed-link Pause, Asym_Pause and Autoneg bits are not set when pl->supported is initialized, so these link modes will not work for the fixed-link. This leads to a TCP performance degradation issue observed on the i.MX943 platform. The switch CPU port of i.MX943 is connected to an ENETC MAC, this link is a fixed link and the link speed is 2.5Gbps. And one of the switch user ports is the RGMII interface, and its link speed is 1Gbps. If the flow-control of the fixed link is not enabled, we can easily observe the iperf performance of TCP packets is very low. Because the inbound rate on the CPU port is greater than the outbound rate on the user port, the switch is prone to congestion, leading to the loss of some TCP packets and requiring multiple retransmissions. Solving this problem should be as simple as setting the Asym_Pause and Pause bits. The reason why the Autoneg bit needs to be set, Russell has gave a very good explanation in the thread [1], see below. "As the advertising and lp_advertising bitmasks have to be non-empty, and the swphy reports aneg capable, aneg complete, and AN enabled, then for consistency with that state, Autoneg should be set. This is how it was prior to the blamed commit." Fixes: `de7d3f87be` ("net: phylink: Use phy_caps_lookup for fixed-link configuration") Link: https://lore.kernel.org/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk # [1] Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251117102943.1862680-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 08:31:11 -08:00
Pradyumn Rahar	d47515af6c	net/mlx5: Clean up only new IRQ glue on request_irq() failure The mlx5_irq_alloc() function can inadvertently free the entire rmap and end up in a crash[1] when the other threads tries to access this, when request_irq() fails due to exhausted IRQ vectors. This commit modifies the cleanup to remove only the specific IRQ mapping that was just added. This prevents removal of other valid mappings and ensures precise cleanup of the failed IRQ allocation's associated glue object. Note: This error is observed when both fwctl and rds configs are enabled. [1] mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 general protection fault, probably for non-canonical address 0xe277a58fde16f291: 0000 [#1] SMP NOPTI RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Call Trace: <TASK> ? show_trace_log_lvl+0x1d6/0x2f9 ? show_trace_log_lvl+0x1d6/0x2f9 ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] ? __die_body.cold+0x8/0xa ? die_addr+0x39/0x53 ? exc_general_protection+0x1c4/0x3e9 ? dev_vprintk_emit+0x5f/0x90 ? asm_exc_general_protection+0x22/0x27 ? free_irq_cpu_rmap+0x23/0x7d mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] irq_pool_request_vector+0x7d/0x90 [mlx5_core] mlx5_irq_request+0x2e/0xe0 [mlx5_core] mlx5_irq_request_vector+0xad/0xf7 [mlx5_core] comp_irq_request_pci+0x64/0xf0 [mlx5_core] create_comp_eq+0x71/0x385 [mlx5_core] ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core] mlx5_comp_eqn_get+0x72/0x90 [mlx5_core] ? xas_load+0x8/0x91 mlx5_comp_irqn_get+0x40/0x90 [mlx5_core] mlx5e_open_channel+0x7d/0x3c7 [mlx5_core] mlx5e_open_channels+0xad/0x250 [mlx5_core] mlx5e_open_locked+0x3e/0x110 [mlx5_core] mlx5e_open+0x23/0x70 [mlx5_core] __dev_open+0xf1/0x1a5 __dev_change_flags+0x1e1/0x249 dev_change_flags+0x21/0x5c do_setlink+0x28b/0xcc4 ? __nla_parse+0x22/0x3d ? inet6_validate_link_af+0x6b/0x108 ? cpumask_next+0x1f/0x35 ? __snmp6_fill_stats64.constprop.0+0x66/0x107 ? __nla_validate_parse+0x48/0x1e6 __rtnl_newlink+0x5ff/0xa57 ? kmem_cache_alloc_trace+0x164/0x2ce rtnl_newlink+0x44/0x6e rtnetlink_rcv_msg+0x2bb/0x362 ? __netlink_sendskb+0x4c/0x6c ? netlink_unicast+0x28f/0x2ce ? rtnl_calcit.isra.0+0x150/0x146 netlink_rcv_skb+0x5f/0x112 netlink_unicast+0x213/0x2ce netlink_sendmsg+0x24f/0x4d9 __sock_sendmsg+0x65/0x6a ____sys_sendmsg+0x28f/0x2c9 ? import_iovec+0x17/0x2b ___sys_sendmsg+0x97/0xe0 __sys_sendmsg+0x81/0xd8 do_syscall_64+0x35/0x87 entry_SYSCALL_64_after_hwframe+0x6e/0x0 RIP: 0033:0x7fc328603727 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727 RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc </TASK> ---[ end trace f43ce73c3c2b13a2 ]--- RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31 f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff RSP: 0018:ff384881640eaca0 EFLAGS: 00010282 RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400 RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88 R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8 FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) kvm-guest: disable async PF for cpu 0 Fixes: `3354822cde` ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Tested-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 18:47:09 -08:00
Grzegorz Nitka	23a5b9b12d	ice: fix PTP cleanup on driver removal in error path Improve the cleanup on releasing PTP resources in error path. The error case might happen either at the driver probe and PTP feature initialization or on PTP restart (errors in reset handling, NVM update etc). In both cases, calls to PF PTP cleanup (ice_ptp_cleanup_pf function) and 'ps_lock' mutex deinitialization were missed. Additionally, ptp clock was not unregistered in the latter case. Keep PTP state as 'uninitialized' on init to distinguish between error scenarios and to avoid resource release duplication at driver removal. The consequence of missing ice_ptp_cleanup_pf call is the following call trace dumped when ice_adapter object is freed (port list is not empty, as it is required at this stage): [ T93022] ------------[ cut here ]------------ [ T93022] WARNING: CPU: 10 PID: 93022 at ice/ice_adapter.c:67 ice_adapter_put+0xef/0x100 [ice] ... [ T93022] RIP: 0010:ice_adapter_put+0xef/0x100 [ice] ... [ T93022] Call Trace: [ T93022] <TASK> [ T93022] ? ice_adapter_put+0xef/0x100 [ice 33d2647ad4f6d866d41eefff1806df37c68aef0c] [ T93022] ? __warn.cold+0xb0/0x10e [ T93022] ? ice_adapter_put+0xef/0x100 [ice 33d2647ad4f6d866d41eefff1806df37c68aef0c] [ T93022] ? report_bug+0xd8/0x150 [ T93022] ? handle_bug+0xe9/0x110 [ T93022] ? exc_invalid_op+0x17/0x70 [ T93022] ? asm_exc_invalid_op+0x1a/0x20 [ T93022] ? ice_adapter_put+0xef/0x100 [ice 33d2647ad4f6d866d41eefff1806df37c68aef0c] [ T93022] pci_device_remove+0x42/0xb0 [ T93022] device_release_driver_internal+0x19f/0x200 [ T93022] driver_detach+0x48/0x90 [ T93022] bus_remove_driver+0x70/0xf0 [ T93022] pci_unregister_driver+0x42/0xb0 [ T93022] ice_module_exit+0x10/0xdb0 [ice 33d2647ad4f6d866d41eefff1806df37c68aef0c] ... [ T93022] ---[ end trace 0000000000000000 ]--- [ T93022] ice: module unloaded Fixes: `e800654e85` ("ice: Use ice_adapter for PTP shared data instead of auxdev") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-11-18 14:30:11 -08:00
Emil Tantilov	118082368c	idpf: fix possible vport_config NULL pointer deref in remove Attempting to remove the driver will cause a crash in cases where the vport failed to initialize. Following trace is from an instance where the driver failed during an attempt to create a VF: [ 1661.543624] idpf 0000:84:00.7: Device HW Reset initiated [ 1722.923726] idpf 0000:84:00.7: Transaction timed-out (op:1 cookie:2900 vc_op:1 salt:29 timeout:60000ms) [ 1723.353263] BUG: kernel NULL pointer dereference, address: 0000000000000028 ... [ 1723.358472] RIP: 0010:idpf_remove+0x11c/0x200 [idpf] ... [ 1723.364973] Call Trace: [ 1723.365475] <TASK> [ 1723.365972] pci_device_remove+0x42/0xb0 [ 1723.366481] device_release_driver_internal+0x1a9/0x210 [ 1723.366987] pci_stop_bus_device+0x6d/0x90 [ 1723.367488] pci_stop_and_remove_bus_device+0x12/0x20 [ 1723.367971] pci_iov_remove_virtfn+0xbd/0x120 [ 1723.368309] sriov_disable+0x34/0xe0 [ 1723.368643] idpf_sriov_configure+0x58/0x140 [idpf] [ 1723.368982] sriov_numvfs_store+0xda/0x1c0 Avoid the NULL pointer dereference by adding NULL pointer check for vport_config[i], before freeing user_config.q_coalesce. Fixes: `e1e3fec3e3` ("idpf: preserve coalescing settings across resets") Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Chittim Madhu <madhu.chittim@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-11-18 13:22:11 -08:00
Florian Fuchs	0f08f0b0fb	net: ps3_gelic_net: handle skb allocation failures Handle skb allocation failures in RX path, to avoid NULL pointer dereference and RX stalls under memory pressure. If the refill fails with -ENOMEM, complete napi polling and wake up later to retry via timer. Also explicitly re-enable RX DMA after oom, so the dmac doesn't remain stopped in this situation. Previously, memory pressure could lead to skb allocation failures and subsequent Oops like: Oops: Kernel access of bad area, sig: 11 [#2] Hardware name: SonyPS3 Cell Broadband Engine 0x701000 PS3 NIP [c0003d0000065900] gelic_net_poll+0x6c/0x2d0 [ps3_gelic] (unreliable) LR [c0003d00000659c4] gelic_net_poll+0x130/0x2d0 [ps3_gelic] Call Trace: gelic_net_poll+0x130/0x2d0 [ps3_gelic] (unreliable) __napi_poll+0x44/0x168 net_rx_action+0x178/0x290 Steps to reproduce the issue: 1. Start a continuous network traffic, like scp of a 20GB file 2. Inject failslab errors using the kernel fault injection: echo -1 > /sys/kernel/debug/failslab/times echo 30 > /sys/kernel/debug/failslab/interval echo 100 > /sys/kernel/debug/failslab/probability 3. After some time, traces start to appear, kernel Oopses and the system stops Step 2 is not always necessary, as it is usually already triggered by the transfer of a big enough file. Fixes: `02c1889166` ("ps3: gigabit ethernet driver for PS3, take3") Signed-off-by: Florian Fuchs <fuchsfl@gmail.com> Link: https://patch.msgid.link/20251113181000.3914980-1-fuchsfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-18 12:31:09 +01:00
Pavel Zhigulin	896f1a2493	net: qlogic/qede: fix potential out-of-bounds read in qede_tpa_cont() and qede_tpa_end() The loops in 'qede_tpa_cont()' and 'qede_tpa_end()', iterate over 'cqe->len_list[]' using only a zero-length terminator as the stopping condition. If the terminator was missing or malformed, the loop could run past the end of the fixed-size array. Add an explicit bound check using ARRAY_SIZE() in both loops to prevent a potential out-of-bounds access. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `55482edc25` ("qede: Add slowpath/fastpath support and enable hardware GRO") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Link: https://patch.msgid.link/20251113112757.4166625-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-18 11:09:58 +01:00
Lorenzo Bianconi	8e0a754b08	net: airoha: Do not loopback traffic to GDM2 if it is available on the device Airoha_eth driver forwards offloaded uplink traffic (packets received on GDM1 and forwarded to GDM{3,4}) to GDM2 in order to apply hw QoS. This is correct if the device does not support a dedicated GDM2 port. In this case, in order to enable hw offloading for uplink traffic, the packets should be sent to GDM{3,4} directly. Fixes: `9cd451d414` ("net: airoha: Add loopback support for GDM2") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251113-airoha-hw-offload-gdm2-fix-v1-1-7e4ca300872f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 20:00:07 -08:00
Jesper Dangaard Brouer	5442a9da69	veth: more robust handing of race to avoid txq getting stuck Commit `dc82a33297` ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") introduced a race condition that can lead to a permanently stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra Max). The race occurs in veth_xmit(). The producer observes a full ptr_ring and stops the queue (netif_tx_stop_queue()). The subsequent conditional logic, intended to re-wake the queue if the consumer had just emptied it (if (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and traffic halts. This failure is caused by an incorrect use of the __ptr_ring_empty() API from the producer side. As noted in kernel comments, this check is not guaranteed to be correct if a consumer is operating on another CPU. The empty test is based on ptr_ring->consumer_head, making it reliable only for the consumer. Using this check from the producer side is fundamentally racy. This patch fixes the race by adopting the more robust logic from an earlier version V4 of the patchset, which always flushed the peer: (1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier are removed. Instead, after stopping the queue, we unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled, making it solely responsible for re-waking the TXQ. This handles the race where veth_poll() consumes all packets and completes NAPI before veth_xmit() on the producer side has called netif_tx_stop_queue. The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule NAPI. (2) On the consumer side, the logic for waking the peer TXQ is moved out of veth_xdp_rcv() and placed at the end of the veth_poll() function. This placement is part of fixing the race, as the netif_tx_queue_stopped() check must occur after rx_notify_masked is potentially set to false during NAPI completion. This handles the race where veth_poll() consumes all packets, but haven't finished (rx_notify_masked is still true). The producer veth_xmit() stops the TXQ and __veth_xdp_flush(rq) will observe rx_notify_masked is true, meaning not starting NAPI. Then veth_poll() change rx_notify_masked to false and stops NAPI. Before exiting veth_poll() will observe TXQ is stopped and wake it up. Fixes: `dc82a33297` ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/176295323282.307447.14790015927673763094.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:16:53 -08:00
Pavel Zhigulin	b0c959fec1	net: mlxsw: linecards: fix missing error check in mlxsw_linecard_devlink_info_get() The call to devlink_info_version_fixed_put() in mlxsw_linecard_devlink_info_get() did not check for errors, although it is checked everywhere in the code. Add missed 'err' check to the mlxsw_linecard_devlink_info_get() Fixes: `3fc0c51905` ("mlxsw: core_linecards: Expose device PSID over device info") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251113161922.813828-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:01:32 -08:00
Pavel Zhigulin	e6751b0b19	net: dsa: hellcreek: fix missing error handling in LED registration The LED setup routine registered both led_sync_good and led_is_gm devices without checking the return values of led_classdev_register(). If either registration failed, the function continued silently, leaving the driver in a partially-initialized state and leaking a registered LED classdev. Add proper error handling Fixes: `7d9ee2e8ff` ("net: dsa: hellcreek: Add PTP status LEDs") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://patch.msgid.link/20251113135745.92375-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 17:46:32 -08:00
Zilin Guan	407a06507c	mlxsw: spectrum: Fix memory leak in mlxsw_sp_flower_stats() The function mlxsw_sp_flower_stats() calls mlxsw_sp_acl_ruleset_get() to obtain a ruleset reference. If the subsequent call to mlxsw_sp_acl_rule_lookup() fails to find a rule, the function returns an error without releasing the ruleset reference, causing a memory leak. Fix this by using a goto to the existing error handling label, which calls mlxsw_sp_acl_ruleset_put() to properly release the reference. Fixes: `7c1b8eb175` ("mlxsw: spectrum: Add support for TC flower offload statistics") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251112052114.1591695-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 17:25:33 -08:00
Xuan Zhuo	0eff2eaa53	virtio-net: fix incorrect flags recording in big mode The purpose of commit `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") is to record the flags in advance, as their value may be overwritten in the XDP case. However, the flags recorded under big mode are incorrect, because in big mode, the passed buf does not point to the rx buffer, but rather to the page of the submitted buffer. This commit fixes this issue. For the small mode, the commit `c11a49d58a` ("virtio_net: Fix mismatched buf address when unmapping for small packets") fixed it. Tested-by: Alyssa Ross <hi@alyssa.is> Fixes: `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 13:16:30 +01:00
Jakub Kicinski	fe82c4f8a2	Merge tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple more fixes: - mwl8k: work around FW expecting a DSSS element in beacons - ath11k: report correct TX status - iwlwifi: avoid toggling links due to wrong element use - iwlwifi: fix beacon template rate on older devices - iwlwifi: fix loop iterator being used after loop - mac80211: disallow address changes while using the address - mac80211: avoid bad rate warning in monitor/sniffer mode - hwsim: fix potential NULL deref (on monitor injection) * tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing wifi: mac80211_hwsim: Fix possible NULL dereference wifi: mac80211: skip rate verification for not captured PSDUs wifi: mac80211: reject address change while connecting wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp() ==================== Link: https://patch.msgid.link/20251112114621.15716-5-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-12 09:33:09 -08:00
Miri Korenblit	1a222625b4	wifi: iwlwifi: mld: always take beacon ies in link grading One of the factors of a link's grade is the channel load, which is calculated from the AP's bss load element. The current code takes this element from the beacon for an active link, and from bss->ies for an inactive link. bss->ies is set to either the beacon's ies or to the probe response ones, with preference to the probe response (meaning that if there was even one probe response, the ies of it will be stored in bss->ies and won't be overiden by the beacon ies). The probe response can be very old, i.e. from the connection time, where a beacon is updated before each link selection (which is triggered only after a passive scan). In such case, the bss load element in the probe response will not include the channel load caused by the STA, where the beacon will. This will cause the inactive link to always have a lower channel load, and therefore an higher grade than the active link's one. This causes repeated link switches, causing the throughput to drop. Fix this by always taking the ies from the beacon, as those are for sure new. Fixes: `d1e879ec60` ("wifi: iwlwifi: add iwlmld sub-driver") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110145652.b493dbb1853a.I058ba7309c84159f640cc9682d1bda56dd56a536@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Johannes Berg	3592c0083f	wifi: iwlwifi: mvm: fix beacon template/fixed rate During the development of the rate changes, I evidently made some changes that shouldn't have been there; beacon templates with rate_n_flags are only in old versions, so no changes to them should have been necessary, and evidently broke on some devices. This also would have broken fixed (injection) rates, it would seem. Restore the old handling of this. Fixes: `dabc88cb3b` ("wifi: iwlwifi: handle v3 rates") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220558 Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20251008112044.3bb8ea849d8d.I90f4d2b2c1f62eaedaf304a61d2ab9e50c491c2d@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Junjie Cao	f4c737d449	wifi: iwlwifi: fix aux ROC time event iterator usage The list_for_each_entry() iterator must not be used outside the loop. Even though we break and check for NULL, doing so still violates kernel iteration rules and triggers Coccinelle's use_after_iter.cocci warning. Cache the matched entry in aux_roc_te and use it consistently after the loop. This follows iterator best practices, resolves the warning, and makes the code more maintainable. Signed-off-by: Junjie Cao <junjie.cao@intel.com> Link: https://patch.msgid.link/20251016014919.383565-1-junjie.cao@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Akiva Goldberger	e5eba42f01	mlx5: Fix default values in create CQ Currently, CQs without a completion function are assigned the mlx5_add_cq_to_tasklet function by default. This is problematic since only user CQs created through the mlx5_ib driver are intended to use this function. Additionally, all CQs that will use doorbells instead of polling for completions must call mlx5_cq_arm. However, the default CQ creation flow leaves a valid value in the CQ's arm_db field, allowing FW to send interrupts to polling-only CQs in certain corner cases. These two factors would allow a polling-only kernel CQ to be triggered by an EQ interrupt and call a completion function intended only for user CQs, causing a null pointer exception. Some areas in the driver have prevented this issue with one-off fixes but did not address the root cause. This patch fixes the described issue by adding defaults to the create CQ flow. It adds a default dummy completion function to protect against null pointer exceptions, and it sets an invalid command sequence number by default in kernel CQs to prevent the FW from sending an interrupt to the CQ until it is armed. User CQs are responsible for their own initialization values. Callers of mlx5_core_create_cq are responsible for changing the completion function and arming the CQ per their needs. Fixes: `cdd04f4d4d` ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:12:18 +01:00
Gal Pressman	9fcc2b6c10	net/mlx5e: Fix potentially misleading debug message Change the debug message to print the correct units instead of always assuming Gbps, as the value can be in either 100 Mbps or 1 Gbps units. Fixes: `5da8bc3eff` ("net/mlx5e: DCBNL, Add debug messages log") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:05:44 +01:00
Gal Pressman	43b27d1bd8	net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps Add validation to reject rates exceeding 255 Gbps that would overflow the 8 bits max bandwidth field. Fixes: `d8880795da` ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:05:44 +01:00
Gal Pressman	a7bf4d5063	net/mlx5e: Fix maxrate wraparound in threshold between units The previous calculation used roundup() which caused an overflow for rates between 25.5Gbps and 26Gbps. For example, a rate of 25.6Gbps would result in using 100Mbps units with value of 256, which would overflow the 8 bits field. Simplify the upper_limit_mbps calculation by removing the unnecessary roundup, and adjust the comparison to use <= to correctly handle the boundary condition. Fixes: `d8880795da` ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:05:44 +01:00
Cosmin Ratiu	2dc768c052	net/mlx5e: Trim the length of the num_doorbell error When trying to set num_doorbells to a value greater than the max number of channels, the error message was going over the netlink limit of 80 chars, truncating the most important part of the message, the number of channels. Fix that by trimming the length a bit. Fixes: `11bbcfb766` ("net/mlx5e: Use the 'num_doorbells' devlink param") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:05:44 +01:00
Carolina Jubran	0bcd5b3b50	net/mlx5e: Fix missing error assignment in mlx5e_xfrm_add_state() Assign the return value of mlx5_eswitch_block_mode() to 'err' before checking it to avoid returning an uninitialized error code. Fixes: `22239eb258` ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202510271649.uwsIxD6O-lkp@intel.com/ Closes: http://lore.kernel.org/linux-rdma/aPIEK4rLB586FdDt@stanley.mountain/ Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 15:05:44 +01:00
Pawel Dembicki	c4e1ac09ee	wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing Some Marvell AP firmware used with mwl8k misbehaves when beacon frames do not contain a WLAN_EID_DS_PARAMS element with the current channel. It was reported on OpenWrt Github issues [0]. When hostapd/mac80211 omits DSSS Parameter Set from the beacon (which is valid on some bands), the firmware stops transmitting sane frames and RX status starts reporting bogus channel information. This makes AP mode unusable. Newer Marvell drivers (mwlwifi [1]) hard-code DSSS Parameter Set into AP beacons for all chips, which suggests this is a firmware requirement rather than a mwl8k-specific quirk. Mirror that behaviour in mwl8k: when setting the beacon, check if WLAN_EID_DS_PARAMS is present, and if not, extend the beacon and inject a DSSS Parameter Set element, using the current channel from hw->conf.chandef.chan. Tested on Linksys EA4500 (88W8366). [0] https://github.com/openwrt/openwrt/issues/19088 [1] `db97edf20f/hif/fwcmd.c (L675)` Fixes: `b64fe619e3` ("mwl8k: basic AP interface support") Tested-by: Antony Kolitsos <zeusomighty@hotmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://patch.msgid.link/20251111100733.2825970-3-paweldembicki@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-11 11:38:57 +01:00
Ilan Peer	eaa7ce66c3	wifi: mac80211_hwsim: Fix possible NULL dereference The 'vif' pointer in the Tx information might be NULL, e.g., in case of injected frames etc. and is not checked in all paths. Fix it. While at it, also directly use the local 'vif' pointer. Fixes: `a37a6f5443` ("wifi: mac80211_hwsim: Add simulation support for NAN device") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-wireless/aNJUlyIiSTW9zZdr@stanley.mountain Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110140128.ec00ae795a32.I9c65659b52434189d8b2ba06710d482669a3887a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-11 09:25:18 +01:00
Johannes Berg	2027e3bcbf	Merge tag 'ath-current-20251110' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git update for v6.18-rc6 Fix an ath11k transmit status reporting issue. This issue has always been present, but not reported until recently. Bringing this through the current release since there is now a userspace entity that wants to leverage this. ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-11 09:25:11 +01:00
Buday Csaba	e6ca8f533e	net: mdio: fix resource leak in mdiobus_register_device() Fix a possible leak in mdiobus_register_device() when both a reset-gpio and a reset-controller are present. Clean up the already claimed reset-gpio, when the registration of the reset-controller fails, so when an error code is returned, the device retains its state before the registration attempt. Link: https://lore.kernel.org/all/20251106144603.39053c81@kernel.org/ Fixes: `71dd6c0dff` ("net: phy: add support for reset-controller") Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Link: https://patch.msgid.link/4b419377f8dd7d2f63f919d0f74a336c734f8fff.1762584481.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 18:15:13 -08:00
Aksh Garg	d4b00d132d	net: ethernet: ti: am65-cpsw-qos: fix IET verify retry mechanism The am65_cpsw_iet_verify_wait() function attempts verification 20 times, toggling the AM65_CPSW_PN_IET_MAC_LINKFAIL bit in each iteration. When the LINKFAIL bit transitions from 1 to 0, the MAC merge layer initiates the verification process and waits for the timeout configured in MAC_VERIFY_CNT before automatically retransmitting. The MAC_VERIFY_CNT register is configured according to the user-defined verify/response timeout in am65_cpsw_iet_set_verify_timeout_count(). As per IEEE 802.3 Clause 99, the hardware performs this automatic retry up to 3 times. Current implementation toggles LINKFAIL after the user-configured verify/response timeout in each iteration, forcing the hardware to restart verification instead of respecting the MAC_VERIFY_CNT timeout. This bypasses the hardware's automatic retry mechanism. Fix this by moving the LINKFAIL bit toggle outside the retry loop and reducing the retry count from 20 to 3. The software now only monitors the status register while the hardware autonomously handles the 3 verification attempts at proper MAC_VERIFY_CNT intervals. Fixes: `49a2eb9068` ("net: ethernet: ti: am65-cpsw-qos: Add Frame Preemption MAC Merge support") Signed-off-by: Aksh Garg <a-garg7@ti.com> Link: https://patch.msgid.link/20251106092305.1437347-3-a-garg7@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 18:00:40 -08:00
Aksh Garg	49b3916465	net: ethernet: ti: am65-cpsw-qos: fix IET verify/response timeout The CPSW module uses the MAC_VERIFY_CNT bit field in the CPSW_PN_IET_VERIFY_REG_k register to set the verify/response timeout count. This register specifies the number of clock cycles to wait before resending a verify packet if the verification fails. The verify/response timeout count, as being set by the function am65_cpsw_iet_set_verify_timeout_count() is hardcoded for 125MHz clock frequency, which varies based on PHY mode and link speed. The respective clock frequencies are as follows: - RGMII mode: * 1000 Mbps: 125 MHz * 100 Mbps: 25 MHz * 10 Mbps: 2.5 MHz - QSGMII/SGMII mode: 125 MHz (all speeds) Fix this by adding logic to calculate the correct timeout counts based on the actual PHY interface mode and link speed. Fixes: `49a2eb9068` ("net: ethernet: ti: am65-cpsw-qos: Add Frame Preemption MAC Merge support") Signed-off-by: Aksh Garg <a-garg7@ti.com> Link: https://patch.msgid.link/20251106092305.1437347-2-a-garg7@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 18:00:40 -08:00
Nicolas Dichtel	2554559aba	bonding: fix mii_status when slave is down netif_carrier_ok() doesn't check if the slave is up. Before the below commit, netif_running() was also checked. Fixes: `23a6037ce7` ("bonding: Remove support for use_carrier") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20251106180252.3974772-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:51:30 -08:00
Horatiu Vultur	96a9178a29	net: phy: micrel: lan8814 fix reset of the QSGMII interface The lan8814 is a quad-phy and it is using QSGMII towards the MAC. The problem is that everytime when one of the ports is configured then the PCS is reseted for all the PHYs. Meaning that the other ports can loose traffic until the link is establish again. To fix this, do the reset one time for the entire PHY package. Fixes: `ece1950283` ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Divya Koppera <Divya.Koppera@microchip.com > Link: https://patch.msgid.link/20251106090637.2030625-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 19:00:38 -08:00
Wei Fang	ad17e7e92a	net: fec: correct rx_bytes statistic for the case SHIFT16 is set Two additional bytes in front of each frame received into the RX FIFO if SHIFT16 is set, so we need to subtract the extra two bytes from pkt_len to correct the statistic of rx_bytes. Fixes: `3ac72b7b63` ("net: fec: align IP header in hardware") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20251106021421.2096585-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:55:27 -08:00
Horatiu Vultur	0216721ce7	lan966x: Fix sleeping in atomic context The following warning was seen when we try to connect using ssh to the device. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 104, name: dropbear preempt_count: 1, expected: 0 INFO: lockdep is turned off. CPU: 0 UID: 0 PID: 104 Comm: dropbear Tainted: G W 6.18.0-rc2-00399-g6f1ab1b109b9-dirty #530 NONE Tainted: [W]=WARN Hardware name: Generic DT based system Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x7c/0xac dump_stack_lvl from __might_resched+0x16c/0x2b0 __might_resched from __mutex_lock+0x64/0xd34 __mutex_lock from mutex_lock_nested+0x1c/0x24 mutex_lock_nested from lan966x_stats_get+0x5c/0x558 lan966x_stats_get from dev_get_stats+0x40/0x43c dev_get_stats from dev_seq_printf_stats+0x3c/0x184 dev_seq_printf_stats from dev_seq_show+0x10/0x30 dev_seq_show from seq_read_iter+0x350/0x4ec seq_read_iter from seq_read+0xfc/0x194 seq_read from proc_reg_read+0xac/0x100 proc_reg_read from vfs_read+0xb0/0x2b0 vfs_read from ksys_read+0x6c/0xec ksys_read from ret_fast_syscall+0x0/0x1c Exception stack(0xf0b11fa8 to 0xf0b11ff0) 1fa0: 00000001 00001000 00000008 be9048d8 00001000 00000001 1fc0: 00000001 00001000 00000008 00000003 be905920 0000001e 00000000 00000001 1fe0: 0005404c be9048c0 00018684 b6ec2cd8 It seems that we are using a mutex in a atomic context which is wrong. Change the mutex with a spinlock. Fixes: `12c2d0a5b8` ("net: lan966x: add ethtool configuration and statistics") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251105074955.1766792-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 07:31:34 -08:00
Nicolas Escande	9065b96875	wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp() When reporting tx completion using ieee80211_tx_status_xxx() family of functions, the status part of the struct ieee80211_tx_info nested in the skb is used to report things like transmit rates & retry count to mac80211 On the TX data path, this is correctly memset to 0 before calling ieee80211_tx_status_ext(), but on the tx mgmt path this was not done. This leads to mac80211 treating garbage values as valid transmit counters (like tx retries for example) and accounting them as real statistics that makes their way to userland via station dump. The same issue was resolved in ath12k by commit `9903c0986f` ("wifi: ath12k: Add memset and update default rate value in wmi tx completion") Tested-on: QCN9074 PCI WLAN.HK.2.9.0.1-01977-QCAHKSWPL_SILICONZ-1 Fixes: `d5c65159f2` ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Signed-off-by: Nicolas Escande <nico.escande@gmail.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251104083957.717825-1-nico.escande@gmail.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>	2025-11-06 07:26:21 -08:00
Hangbin Liu	067bf016e9	bonding: fix NULL pointer dereference in actor_port_prio setting Liang reported an issue where setting a slave’s actor_port_prio to predefined values such as 0, 255, or 65535 would cause a system crash. The problem occurs because in bond_opt_parse(), when the provided value matches a predefined table entry, the function returns that table entry, which does not contain slave information. Later, in bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads to a NULL pointer dereference. Since actor_port_prio is defined as a u16 and initialized to the default value of 255 in ad_initialize_port(), there is no need for the bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient. Fixes: `6b6dc81ee7` ("bonding: add support for per-port LACP actor priority") Reported-by: Liang Li <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251105072620.164841-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 07:16:37 -08:00
Tristram Ha	96baf482ca	net: dsa: microchip: Fix reserved multicast address table programming KSZ9477/KSZ9897 and LAN937X families of switches use a reserved multicast address table for some specific forwarding with some multicast addresses, like the one used in STP. The hardware assumes the host port is the last port in KSZ9897 family and port 5 in LAN937X family. Most of the time this assumption is correct but not in other cases like KSZ9477. Originally the function just setups the first entry, but the others still need update, especially for one common multicast address that is used by PTP operation. LAN937x also uses different register bits when accessing the reserved table. Fixes: `457c182af5` ("net: dsa: microchip: generic access to ksz9477 static and reserved table") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Tested-by: Łukasz Majewski <lukma@nabladev.com> Link: https://patch.msgid.link/20251105033741.6455-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 07:11:36 -08:00
Jakub Kicinski	7d1988a943	Merge tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just two small fixes: - ath12k: revert a change that caused performance regressions - hwsim: don't ignore netns on netlink socket matching * tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup Revert "wifi: ath12k: Fix missing station power save configuration" ==================== Link: https://patch.msgid.link/20251105152827.53254-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 18:04:55 -08:00
Haotian Zhang	4d6ec3a793	net: wan: framer: pef2256: Switch to devm_mfd_add_devices() The driver calls mfd_add_devices() but fails to call mfd_remove_devices() in error paths after successful MFD device registration and in the remove function. This leads to resource leaks where MFD child devices are not properly unregistered. Replace mfd_add_devices with devm_mfd_add_devices to automatically manage the device resources. Fixes: `c96e976d9a` ("net: wan: framer: Add support for the Lantiq PEF2256 framer") Suggested-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Acked-by: Herve Codina <herve.codina@bootlin.com> Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 18:02:34 -08:00
Jiawen Wu	a04ea57aae	net: libwx: fix device bus LAN ID The device bus LAN ID was obtained from PCI_FUNC(), but when a PF port is passthrough to a virtual machine, the function number may not match the actual port index on the device. This could cause the driver to perform operations such as LAN reset on the wrong port. Fix this by reading the LAN ID from port status register. Fixes: `a34b3e6ed8` ("net: txgbe: Store PCI info") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/B60A670C1F52CB8E+20251104062321.40059-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:52:13 -08:00
Dragos Tatulea	d8a7ed9586	net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in several places under the assumption that there will always be more headers per WQE than headers per page. However, this assumption doesn't hold for 64K page sizes and higher MTUs (> 4K). This can be first observed during header page allocation: ksm_entries will become 0 during alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. This patch introduces 2 additional members to the mlx5e_shampo_hd struct which are meant to be used instead of the macrose mentioned above. When the number of headers per WQE goes below MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per page and expand the header size accordingly so that the headers for one WQE cover a full page. All the formulas are adapted to use these two new members. Fixes: `945ca432bf` ("net/mlx5e: SHAMPO, Drop info array") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:48:37 -08:00
Dragos Tatulea	bacd8d8018	net/mlx5e: SHAMPO, Fix skb size check for 64K pages mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is enough space in the skb frags to store more data. This formula is incorrect for 64K page sizes and it triggers early GRO session termination because the first fragment will blow up beyond GRO_LEGACY_MAX_SIZE. This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE (64K) which uses the skb->len instead. Within this context, the check is safe from fragment overflow because the hardware will continuously fill the data up to the reservation size of 64K and the driver will coalesce all data from the same page to the same fragment. This means that the data will span one fragment or at most two for such a large page size. It is expected that the if statement will be optimized out as the check is done with constants. Fixes: `92552d3abd` ("net/mlx5e: HW_GRO cqe handler implementation") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:48:36 -08:00
Dragos Tatulea	665a7e13c2	net/mlx5e: SHAMPO, Fix header mapping for 64K pages HW-GRO is broken on mlx5 for 64K page sizes. The patch in the fixes tag didn't take into account larger page sizes when doing an align down of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip mapping header pages via WQE UMR. This breaks header-data split and will result in the following syndrome: mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x4c9, ci 0x0, qn 0x1133, opcode 0xe, syndrome 0x4, vendor syndrome 0x32 00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32 00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a 00000030: 00 00 3b c7 93 01 32 04 00 00 00 00 00 00 bf e0 mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1133 Furthermore, the function that fills in WQE UMRs for the headers (mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that fit in a single UMR WQE. This patch goes back to the old non-aligned max_ksm_entries value and it changes mlx5e_build_shampo_hd_umr() to support mapping a large page over multiple UMR WQEs. This means that mlx5e_build_shampo_hd_umr() can now leave a page only partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that there are enough UMR WQEs to cover complete pages by working on ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. Fixes: `8a0ee54027` ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:48:36 -08:00
Meghana Malladi	ae4789affd	net: ti: icssg-prueth: Fix fdb hash size configuration The ICSSG driver does the initial FDB configuration which includes setting the control registers. Other run time management like learning is managed by the PRU's. The default FDB hash size used by the firmware is 512 slots, which is currently missing in the current driver. Update the driver FDB config to include FDB hash size as well. Please refer trm [1] 6.4.14.12.17 section on how the FDB config register gets configured. From the table 6-1404, there is a reset field for FDB_HAS_SIZE which is 4, meaning 1024 slots. Currently the driver is not updating this reset value from 4(1024 slots) to 3(512 slots). This patch fixes this by updating the reset value to 512 slots. [1]: https://www.ti.com/lit/pdf/spruim2 Fixes: `abd5576b9c` ("net: ti: icssg-prueth: Add support for ICSSG switch firmware") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:43:08 -08:00
Gal Pressman	d1c94bc5b9	net/mlx5e: Fix return value in case of module EEPROM read error mlx5e_get_module_eeprom_by_page() has weird error handling. First, it is treating -EINVAL as a special case, but it is unclear why. Second, it tries to fail "gracefully" by returning the number of bytes read even in case of an error. This results in wrongly returning success (0 return value) if the error occurs before any bytes were read. Simplify the error handling by returning an error when such occurs. This also aligns with the error handling we have in mlx5e_get_module_eeprom() for the old API. This fixes the following case where the query fails, but userspace ethtool wrongly treats it as success and dumps an output: # ethtool -m eth2 netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 Offset Values ------ ------ 0x0000: 00 00 00 00 05 00 04 00 00 00 00 00 05 00 05 00 0x0010: 00 00 00 00 05 00 06 00 50 00 00 00 67 65 20 66 0x0020: 61 69 6c 65 64 2c 20 72 65 61 64 20 30 20 62 79 0x0030: 74 65 73 2c 20 65 72 72 20 2d 35 00 14 00 03 00 0x0040: 08 00 01 00 03 00 00 00 08 00 02 00 1a 00 00 00 0x0050: 14 00 04 00 08 00 01 00 04 00 00 00 08 00 02 00 0x0060: 0e 00 00 00 14 00 05 00 08 00 01 00 05 00 00 00 0x0070: 08 00 02 00 1a 00 00 00 14 00 06 00 08 00 01 00 Fixes: `e109d2b204` ("net/mlx5: Implement get_module_eeprom_by_page()") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Alex Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762265736-1028868-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:42:37 -08:00
Michal Swiatkowski	b1d16f7c00	libie: depend on DEBUG_FS when building LIBIE_FWLOG LIBIE_FWLOG is unusable without DEBUG_FS. Mark it in Kconfig. Fix build error on ixgbe when DEBUG_FS is not set. To not add another layer of #if IS_ENABLED(LIBIE_FWLOG) in ixgbe fwlog code define debugfs dentry even when DEBUG_FS isn't enabled. In this case the dummy functions of LIBIE_FWLOG will be used, so not initialized dentry isn't a problem. Fixes: `641585bc97` ("ixgbe: fwlog support for e610") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/lkml/f594c621-f9e1-49f2-af31-23fbcb176058@roeck-us.net/ Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20251104172333.752445-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-05 17:38:03 -08:00
Johannes Berg	4c740c4d8b	Merge tag 'ath-current-20251103' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git update for v6.18-rc5 Revert an ath12k change which resulted in a significance performance impact on WCN7850. ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-05 16:18:48 +01:00

1 2 3 4 5 ...

135744 Commits