linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 04:21:09 -04:00

Author	SHA1	Message	Date
Niklas Söderlund	f5b2772d14	net: ethernet: ravb: Do not check URAM suspension when WoL is active When updating the driver to match latest datasheet to suspend access to URAM when suspending DMA transfers a corner-case was missed, URAM access will not be suspended if WoL is enabled. This lead to the error message (correctly) being triggered as URAM access is not suspended even tho it's requested as part of stopping DMA. Avoid checking if URAM access is suspended and printing the error message if WoL is enabled when we suspend the system, as we know it will not be. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/ Fixes: `353d8e7989` ("net: ethernet: ravb: Suspend and resume the transmission flow") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:46:19 -07:00
Zoran Ilievski	2c308cf342	net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled The shutdown handler aq_pci_shutdown() unconditionally calls pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when wake-on-LAN has been configured. While aq_nic_shutdown() correctly programs the NIC firmware via aq_nic_set_power() to listen for magic packets, the PCI subsystem will not propagate the resulting PME wake event from D3, so the system never wakes after poweroff. WOL from suspend (S3) is unaffected because aq_suspend_common() does not touch pci_wake_from_d3() and relies on the PM core's wake configuration via device_may_wakeup(). This affects all atlantic-supported NICs (AQC107/108/111/112/113); users have reported that WOL works if the atlantic driver is never loaded, but breaks once it has run its shutdown path. Pass the configured WOL state to pci_wake_from_d3() instead of a literal false, so the PCI PME_En bit is preserved when the user has armed WOL via ethtool. Fixes: `90869ddfef` ("net: aquantia: Implement pci shutdown callback") Cc: stable@vger.kernel.org Signed-off-by: Zoran Ilievski <goodboy@rexbytes.com> Reviewed-by: Sukhdeep Singh <sukhdeeps@marvell.com> Link: https://patch.msgid.link/20260511064002.1857-1-goodboy@rexbytes.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 16:41:41 -07:00
Ethan Nelson-Moore	36a8d04a82	net: ethernet: cs89x0: remove stale CONFIG_MACH_MX31ADS reference The legacy ARM board file for MACH_MX31ADS was removed in commit `c93197b004` ("ARM: imx: Remove i.MX31 board files"), but a reference to it remained in the cs89x0 driver. Drop this unused code. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Fixes: `c93197b004` ("ARM: imx: Remove i.MX31 board files") Link: https://patch.msgid.link/20260509023732.42256-1-enelsonmoore@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 15:28:22 +02:00
Linus Walleij	ebd8ec2b30	net: ethernet: cortina: Carry over frag counter The gmac_rx() NAPI poll function assembles packets in an SKB from a ring buffer. If the ring buffer gets completely emptied during a poll cycle, we exit gmac_rx(), but the packet is not yet completely assembled in the SKB, yet the fragment counter frag_nr is reset to zero on the next invocation. Solve this by making the RX fragment counter a part of the port struct, and carry it over between invocations. Reset the fragment counter only right after calling napi_gro_frags(), on error (after calling napi_free_frags()) or if stopping the port. Reset it in some place where not strictly necessary just to emphasize what is going on. This was found by Sashiko during normal patch review. Fixes: `4d5ae32f5e` ("net: ethernet: Add a driver for Gemini gigabit ethernet") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-3-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 15:20:16 +02:00
Linus Walleij	06937db21e	net: ethernet: cortina: Make RX SKB per-port The SKB used to assemble packets from fragments in gmac_rx() is static local, but the Gemini has two ethernet ports, meaning there can be races between the ports on a bad day if a device is using both. Make the RX SKB a per-port variable and carry it over between invocations in the port struct instead. Zero the pointer once we call napi_gro_frags(), on error (after calling napi_free_frags()) or if the port is stopped. Zero it in some place where not strictly necessary just to emphasize what is going on. This was found by Sashiko during normal patch review. Fixes: `4d5ae32f5e` ("net: ethernet: Add a driver for Gemini gigabit ethernet") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-2-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 15:20:16 +02:00
Linus Walleij	2cb1562130	net: ethernet: cortina: No mapping is a dropped rx Increase stats.rx_dropped++ even if this is the first fragment (skb == NULL) so we are doing proper accounting. Fixes: `b266bacba7` ("net: ethernet: cortina: Drop half-assembled SKB") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-1-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 15:20:16 +02:00
Evgenii Burenchev	be48e5fe51	qed: fix division by zero in qed_init_wfq_param when all vports are configured In qed_init_wfq_param(), variable non_requested_count can become zero when the number of vports with the configured flag set (including the current vport being configured) equals total num_vports. This happens when configuring the last unconfigured vport or when re-configuring an already configured vport. The function then calculates left_rate_per_vp = total_left_rate / non_requested_count, which causes division by zero. Fix this by skipping the division when non_requested_count is zero. In that case, there is no remaining bandwidth to distribute, so just record the configuration for the current vport and return success. Fixes: `bcd197c81f` ("qed: Add vport WFQ configuration APIs") Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru> Link: https://patch.msgid.link/20260507145520.23106-1-evg28bur@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 18:11:06 -07:00
Arthur Kiyanovski	24a08d7d62	net: ena: PHC: Check return code before setting timestamp output ena_phc_gettimex64() is setting the output parameter regardless of whether ena_com_phc_get_timestamp() succeeded or failed. When ena_com_phc_get_timestamp() returns an error, the timestamp parameter may contain uninitialized stack memory (e.g., when PHC is disabled or in blocked state) or invalid hardware values. Passing these to userspace via the PTP ioctl is both a security issue (information leak) and a correctness bug. Fix by checking the return code after releasing the lock and only setting the output timestamp on success. Fixes: `e0ea34158e` ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 17:22:48 -07:00
Shitalkumar Gandhi	a450063ef8	net: xgene: fix mdio_np leak in xgene_mdiobus_register() The for_each_child_of_node() loop captures mdio_np via break, holding the refcount. of_mdiobus_register() does not consume the reference, so it leaks on success. Put it after registration. Fixes: `e6ad767305` ("drivers: net: Add APM X-Gene SoC ethernet driver support.") Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Link: https://patch.msgid.link/20260507142024.811543-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-10 10:16:58 -07:00
Arthur Kiyanovski	e42c755582	net: ena: PHC: Fix potential use-after-free in get_timestamp Move the phc->active check and resp pointer assignment to after acquiring the spinlock. Previously, phc->active was checked without holding the lock, and resp was cached from ena_dev->phc.virt_addr before the lock was acquired. If ena_com_phc_destroy() runs between the lockless active check and the lock acquisition, it sets active=false, releases the lock, frees the DMA memory, and sets virt_addr=NULL. The get_timestamp path would then read a NULL virt_addr and dereference it. With both the active check and the pointer read under the lock, destroy cannot free the memory while get_timestamp is using it. Fixes: `e0ea34158e` ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260508062126.7273-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-10 10:05:50 -07:00
Shitalkumar Gandhi	6635fa8440	net: ti: icssm-prueth: fix eth_ports_node leak in probe The error path on of_property_read_u32() failure inside icssm_prueth_probe() returns without putting eth_ports_node, which was acquired before the for_each_child_of_node() loop. Drop it before returning. Fixes: `511f6c1ae0` ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver") Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Link: https://patch.msgid.link/20260506195813.641610-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 18:10:30 -07:00
Myeonghun Pak	c4f3d6eb1f	net: lan966x: avoid unregistering netdev on register failure lan966x_probe_port() stores the newly allocated net_device in the port before calling register_netdev(). If register_netdev() fails, the probe error path calls lan966x_cleanup_ports(), which sees port->dev and calls unregister_netdev() for a device that was never registered. Destroy the phylink instance created for this port and clear port->dev before returning the registration error. The common cleanup path now skips ports without port->dev before reaching the registered netdev cleanup, so it only handles ports that reached the registered-netdev lifetime. This also avoids treating an uninitialized FDMA netdev and the failed port as a NULL == NULL match in the common cleanup path. Fixes: `d28d6d2e37` ("net: lan966x: add port module support") Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:30:45 -07:00
Ivan Vecera	30f1658fc5	ice: dpll: fix misplaced header macros The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field masks) were placed after the #endif of the _ICE_DPLL_H_ include guard, leaving them unprotected. Move them inside the guard. Fixes: `ad1df4f2d5` ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-8-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Ivan Vecera	cce709d8df	ice: dpll: fix rclk pin state get for E810 The refactoring of ice_dpll_rclk_state_on_pin_get() to use ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks E810 devices where base_rclk_idx is non-zero, causing the wrong hardware index to be used for pin state lookup and incorrect recovered clock state to be reported via the DPLL subsystem. E825C is unaffected as its base_rclk_idx is 0. While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}() to prevent out-of-bounds access on the pin state array. Fixes: `ad1df4f2d5` ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Bart Van Assche	0ded1f36ba	ice: fix locking in ice_dcb_rebuild() Move the mutex_lock() call up to prevent that DCB settings change after the first ice_query_port_ets() call. The second ice_query_port_ets() call in ice_dcb_rebuild() is already protected by pf->tc_mutex. This also fixes a bug in an error path, as before taking the first "goto dcb_error" in the function jumped over mutex_lock() to mutex_unlock(). This bug has been detected by the clang thread-safety analyzer. Cc: intel-wired-lan@lists.osuosl.org Fixes: `242b5e068b` ("ice: Fix DCB rebuild after reset") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-6-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Marcin Szycik	b3cda96feb	ice: fix setting RSS VSI hash for E830 ice_set_rss_hfunc() performs a VSI update, in which it sets hashing function, leaving other VSI options unchanged. However, ::q_opt_flags is mistakenly set to the value of another field, instead of its original value, probably due to a typo. What happens next is hardware-dependent: On E810, only the first bit is meaningful (see ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different state than before VSI update. On E830, some of the remaining bits are not reserved. Setting them to some unrelated values can cause the firmware to reject the update because of invalid settings, or worse - succeed. Reproducer: sudo ethtool -X $PF1 equal 8 Output in dmesg: Failed to configure RSS hash for VSI 6, error -5 Fixes: `352e9bf238` ("ice: enable symmetric-xor RSS for Toeplitz hash function") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Greg Kroah-Hartman	6c77b95108	idpf: fix double free and use-after-free in aux device error paths When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or idpf_plug_core_aux_dev(), the err_aux_dev_add label calls auxiliary_device_uninit() and falls through to err_aux_dev_init. The uninit call will trigger put_device(), which invokes the release callback (idpf_vport_adev_release / idpf_core_adev_release) that frees iadev. The fall-through then reads adev->id from the freed iadev for ida_free() and double-frees iadev with kfree(). Free the IDA slot and clear the back-pointer before uninit, while adev is still valid, then return immediately. Commit `65637c3a18` ("idpf: fix UAF in RDMA core aux dev deinitialization") fixed the same use-after-free in the matching unplug path in this file but missed both probe error paths. Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: stable@kernel.org Fixes: `be91128c57` ("idpf: implement RDMA vport auxiliary dev create, init, and destroy") Fixes: `f4312e6bfa` ("idpf: implement core RDMA auxiliary dev create, init, and destroy") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Emil Tantilov	da4f76b6a8	idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init() In idpf_ptp_init(), read_dev_clk_lock is initialized after ptp_schedule_worker() had already been called (and after idpf_ptp_settime64() could reach the lock). The PTP aux worker fires immediately upon scheduling and can call into idpf_ptp_read_src_clk_reg_direct(), which takes spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering the lockdep "non-static key" warning: [12973.796587] idpf 0000:83:00.0: Device HW Reset initiated [12974.094507] INFO: trying to register non-static key. ... [12974.097208] Call Trace: [12974.097213] <TASK> [12974.097218] dump_stack_lvl+0x93/0xe0 [12974.097234] register_lock_class+0x4c4/0x4e0 [12974.097249] ? __lock_acquire+0x427/0x2290 [12974.097259] __lock_acquire+0x98/0x2290 [12974.097272] lock_acquire+0xc6/0x310 [12974.097281] ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf] [12974.097311] ? lockdep_hardirqs_on_prepare+0xde/0x190 [12974.097318] ? finish_task_switch.isra.0+0xd2/0x350 [12974.097330] ? __pfx_ptp_aux_kworker+0x10/0x10 [ptp] [12974.097343] _raw_spin_lock+0x30/0x40 [12974.097353] ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf] [12974.097373] idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf] [12974.097391] ? kthread_worker_fn+0x88/0x3d0 [12974.097404] ? kthread_worker_fn+0x4e/0x3d0 [12974.097411] idpf_ptp_update_cached_phctime+0x26/0x120 [idpf] [12974.097428] ? _raw_spin_unlock_irq+0x28/0x50 [12974.097436] idpf_ptp_do_aux_work+0x15/0x20 [idpf] [12974.097454] ptp_aux_kworker+0x20/0x40 [ptp] [12974.097464] kthread_worker_fn+0xd5/0x3d0 [12974.097474] ? __pfx_kthread_worker_fn+0x10/0x10 [12974.097482] kthread+0xf4/0x130 [12974.097489] ? __pfx_kthread+0x10/0x10 [12974.097498] ret_from_fork+0x32c/0x410 [12974.097512] ? __pfx_kthread+0x10/0x10 [12974.097519] ret_from_fork_asm+0x1a/0x30 [12974.097540] </TASK> Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock is not touched before it's been initialized. Fixes: `5cb8805d23` ("idpf: negotiate PTP capabilities and get PTP clock") Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:09 -07:00
Matt Vollrath	678b713ece	i40e: Cleanup PTP pins on probe failure PTP pin structs are allocated early in probe, but never cleaned up. Fix this by calling i40e_ptp_free_pins in the error path. To support this, i40e_ptp_free_pins is added to the header and pin_config is correctly nullified after being freed. This has been an issue since i40e_ptp_alloc_pins was introduced. Fixes: `1050713026` ("i40e: add support for PTP external synchronization clock") Reported-by: Kohei Enju <kohei@enjuk.jp> Cc: stable@vger.kernel.org Signed-off-by: Matt Vollrath <tactii@gmail.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Kohei Enju <kohei@enjuk.jp> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-2-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:08 -07:00
Matt Vollrath	1619553b0a	i40e: Cleanup PTP registration on probe failure Fix two conditions which would leak PTP registration on probe failure: 1. i40e_setup_pf_switch can encounter an error in i40e_setup_pf_filter_control, call i40e_ptp_init, then return non-zero, sending i40e_probe to err_vsis. 2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to err_vsis. Both of these conditions have been present since PTP was introduced in this driver. Found with coccinelle. Fixes: `beb0dff125` ("i40e: enable PTP") Signed-off-by: Matt Vollrath <tactii@gmail.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-1-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 16:01:08 -07:00
Linus Torvalds	fcee7d82f2	Merge tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter, IPsec, Bluetooth and WiFi. Current release - fix to a fix: - ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock in all relevant places Current release - new code bugs: - fixes for the recently added resizable hash tables - ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6 itself is built in - drv: octeontx2-af: fixes for parser/CAM fixes Previous releases - regressions: - phy: micrel: fix LAN8814 QSGMII soft reset - wifi: - cw1200: revert "Fix locking in error paths" - ath12k: fix crash on WCN7850, due to adding the same queue buffer to a list multiple times Previous releases - always broken: - number of info leak fixes - ipv6: implement limits on extension header parsing - wifi: number of fixes for missing bound checks in the drivers - Bluetooth: fixes for races and locking issues - af_unix: - fix an issue between garbage collection and PEEK - fix yet another issue with OOB data - xfrm: esp: avoid in-place decrypt on shared skb frags - netfilter: replace skb_try_make_writable() by skb_ensure_writable() - openvswitch: vport: fix race between tunnel creation and linking leading to invalid memory accesses (type confusion) - drv: amd-xgbe: fix PTP addend overflow causing frozen clock Misc: - sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN (for relevant IPVS change)" * tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits) net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init() net: sparx5: fix wrong chip ids for TSN SKUs net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel() tcp: Fix dst leak in tcp_v6_connect(). ipmr: Call ipmr_fib_lookup() under RCU. net: phy: broadcom: Save PHY counters during suspend net/smc: fix missing sk_err when TCP handshake fails af_unix: Reject SIOCATMARK on non-stream sockets veth: fix OOB txq access in veth_poll() with asymmetric queue counts eth: fbnic: fix double-free of PCS on phylink creation failure net: ethernet: cortina: Drop half-assembled SKB selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl selftests: mptcp: check output: catch cmd errors mptcp: pm: prio: skip closed subflows mptcp: pm: ADD_ADDR rtx: return early if no retrans mptcp: pm: ADD_ADDR rtx: skip inactive subflows mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker mptcp: pm: ADD_ADDR rtx: free sk if last mptcp: pm: ADD_ADDR rtx: always decrease sk refcount mptcp: pm: ADD_ADDR rtx: fix potential data-race ...	2026-05-07 10:32:03 -07:00
Daniel Machon	41ae14071c	net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init() sparx5_port_init() only invokes sparx5_serdes_set() and the associated shadow-device enable and low-speed device switch for SGMII and QSGMII. On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G) configured for 1000BASE-X the serdes is therefore left uninitialized, the DEV2G5 shadow is never enabled, and the port stays pointed at its high-speed device rather than the DEV2G5. The PCS1G block looks healthy in isolation, but no frames reach the link partner. Add 1000BASE-X to the check so the same three steps run. Note: the same issue might apply to 2500BASE-X, but that will, eventually, be addressed in a separate commit. Reported-by: Andrew Lunn <andrew@lunn.ch> Fixes: `946e7fd505` ("net: sparx5: add port module support") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-4-fb236aa96908@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 09:08:47 -07:00
Daniel Machon	b131dc93f7	net: sparx5: fix wrong chip ids for TSN SKUs The TSN SKUs in enum spx5_target_chiptype have incorrect IDs: SPX5_TARGET_CT_7546TSN = 0x47546, SPX5_TARGET_CT_7549TSN = 0x47549, SPX5_TARGET_CT_7552TSN = 0x47552, SPX5_TARGET_CT_7556TSN = 0x47556, SPX5_TARGET_CT_7558TSN = 0x47558, The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match these IDs, so probing a TSN part fails with a "Target not supported" error. Fix the enum to use the actual 16-bit part IDs returned by the hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558. Reported-by: Andrew Lunn <andrew@lunn.ch> Fixes: `3cfa11bac9` ("net: sparx5: add the basic sparx5 driver") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-3-fb236aa96908@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 09:08:46 -07:00
Joey Lu	dedf6c9038	net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel() priv->dev was never initialized after devm_kzalloc() allocates the private data structure. When nvt_set_phy_intf_sel() is later invoked via the phylink interface_select callback, it calls nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer. Fix this by assigning priv->dev = dev immediately after allocation. Fixes: `4d7c557f58` ("net: stmmac: dwmac-nuvoton: Add dwmac glue for Nuvoton MA35 family") Signed-off-by: Joey Lu <a0987203069@gmail.com> Link: https://patch.msgid.link/20260506084614.192894-2-a0987203069@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 08:41:30 -07:00
Bobby Eshleman	593dfd40a9	eth: fbnic: fix double-free of PCS on phylink creation failure fbnic_phylink_create() stores the newly allocated PCS in fbn->pcs and then calls phylink_create(). When phylink_create() fails, the error path correctly destroys the PCS via xpcs_destroy_pcs(), but the caller, fbnic_netdev_alloc(), responds by invoking fbnic_netdev_free() which calls fbnic_phylink_destroy(). That function finds fbn->pcs non-NULL and calls xpcs_destroy_pcs() a second time on the already-freed object, triggering a refcount underflow use-after-free: [ 1.934973] fbnic 0000:01:00.0: Failed to create Phylink interface, err: -22 [ 1.935103] ------------[ cut here ]------------ [ 1.935179] refcount_t: underflow; use-after-free. [ 1.935252] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x59/0x90, CPU#0: swapper/0/1 [ 1.935389] Modules linked in: [ 1.935484] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-virtme-04244-g1f5ffc672165-dirty #1 PREEMPT(lazy) [ 1.935661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 1.935826] RIP: 0010:refcount_warn_saturate+0x59/0x90 [ 1.935931] Code: 44 48 8d 3d 49 f9 a7 01 67 48 0f b9 3a e9 bf 1e 96 00 48 8d 3d 48 f9 a7 01 67 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 47 f9 a7 01 <67> 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 46 f9 a7 01 67 48 0f b9 3a [ 1.936274] RSP: 0000:ffffd0d440013c58 EFLAGS: 00010246 [ 1.936376] RAX: 0000000000000000 RBX: ffff8f39c188c278 RCX: 000000000000002b [ 1.936524] RDX: ffff8f39c004f000 RSI: 0000000000000003 RDI: ffffffff96abab00 [ 1.936692] RBP: ffff8f39c188c240 R08: ffffffff96988e88 R09: 00000000ffffdfff [ 1.936835] R10: ffffffff96878ea0 R11: 0000000000000187 R12: 0000000000000000 [ 1.936970] R13: ffff8f39c0cef0c8 R14: ffff8f39c1ac01c0 R15: 0000000000000000 [ 1.937114] FS: 0000000000000000(0000) GS:ffff8f3ba08b4000(0000) knlGS:0000000000000000 [ 1.937273] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.937382] CR2: ffff8f3b3ffff000 CR3: 0000000172642001 CR4: 0000000000372ef0 [ 1.937540] Call Trace: [ 1.937619] <TASK> [ 1.937698] xpcs_destroy_pcs+0x25/0x40 [ 1.937783] fbnic_netdev_alloc+0x1e5/0x200 [ 1.937859] fbnic_probe+0x230/0x370 [ 1.937939] local_pci_probe+0x3e/0x90 [ 1.938013] pci_device_probe+0xbb/0x1e0 [ 1.938091] ? sysfs_do_create_link_sd+0x6d/0xe0 [ 1.938188] really_probe+0xc1/0x2b0 [ 1.938282] __driver_probe_device+0x73/0x120 [ 1.938371] driver_probe_device+0x1e/0xe0 [ 1.938466] __driver_attach+0x8d/0x190 [ 1.938560] ? __pfx___driver_attach+0x10/0x10 [ 1.938663] bus_for_each_dev+0x7b/0xd0 [ 1.938758] bus_add_driver+0xe8/0x210 [ 1.938854] driver_register+0x60/0x120 [ 1.938929] ? __pfx_fbnic_init_module+0x10/0x10 [ 1.939026] fbnic_init_module+0x25/0x60 [ 1.939109] do_one_initcall+0x49/0x220 [ 1.939202] ? rdinit_setup+0x20/0x40 [ 1.939304] kernel_init_freeable+0x1b0/0x310 [ 1.939449] ? __pfx_kernel_init+0x10/0x10 [ 1.939560] kernel_init+0x1a/0x1c0 [ 1.939640] ret_from_fork+0x1ed/0x240 [ 1.939730] ? __pfx_kernel_init+0x10/0x10 [ 1.939805] ret_from_fork_asm+0x1a/0x30 [ 1.939886] </TASK> [ 1.939927] ---[ end trace 0000000000000000 ]--- [ 1.940184] fbnic 0000:01:00.0: Netdev allocation failed Instead of calling fbnic_phylink_destroy(), the prior initialization of netdev should just be unrolled with free_netdev() and clearing fbd->netdev. Clearing fbd->netdev to NULL avoids UAF in init_failure_mode where callers guard by checking !fbd->netdev, such as fbnic_mdio_read_pmd(). These callers remain active even after a failed probe, so fdb->netdev still needs to be cleared. Fixes: `d0fe7104c7` ("fbnic: Replace use of internal PCS w/ Designware XPCS") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260504-fbnic-pcs-fix-v2-1-de45192821d9@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-07 12:34:42 +02:00
Andreas Haarmann-Thiemann	b266bacba7	net: ethernet: cortina: Drop half-assembled SKB In gmac_rx() (drivers/net/ethernet/cortina/gemini.c), when gmac_get_queue_page() returns NULL for the second page of a multi-page fragment, the driver logs an error and continues — but does not free the partially assembled skb that was being assembled via napi_build_skb() / napi_get_frags(). Free the in-progress partially assembled skb via napi_free_frags() and increase the number of dropped frames appropriately and assign the skb pointer NULL to make sure it is not lingering around, matching the pattern already used elsewhere in the driver. Fixes: `4d5ae32f5e` ("net: ethernet: Add a driver for Gemini gigabit ethernet") Signed-off-by: Andreas Haarmann-Thiemann <eitschman@nebelreich.de> Signed-off-by: Linus Walleij <linusw@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260505-gemini-ethernet-fix-v2-1-997c31d06079@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:43:41 -07:00
Shitalkumar Gandhi	701ea57fea	net: rtsn: fix mdio_node leak in rtsn_mdio_alloc() of_get_child_by_name() takes a reference. The rtsn_reset() and rtsn_change_mode() failure paths jump to out_free_bus and leak mdio_node. Add out_put_node to drop it before falling through. Fixes: `b0d3969d2b` ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN") Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20260505123236.406000-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 17:42:50 -07:00
Shay Drory	d466ddda55	net/mlx5e: SD, Fix race condition in secondary device probe/remove When utilizing Socket-Direct single netdev functionality the driver resolves the actual auxiliary device using mlx5_sd_get_adev(). However, the current implementation returns the primary ETH auxiliary device without holding the device lock, leading to a potential race condition where the ETH device could be unbound or removed concurrently during probe, suspend, resume, or remove operations.[1] Fix this by introducing mlx5_sd_put_adev() and updating mlx5_sd_get_adev() so that secondaries devices would get a ref and acquire the device lock of the returned auxiliary device. After the lock is acquired, a second devcom check is needed[2]. In addition, update The callers to pair the get operation with the new put operation, ensuring the lock is held while the auxiliary device is being operated on and released afterwards. The "primary" designation is determined once in sd_register(). It's set before devcom is marked ready, and it never changes after that. In Addition, The primary path never locks a secondary: When the primary device invoke mlx5_sd_get_adev(), it sees dev == primary and returns. no additional lock is taken. Therefore lock ordering is always: secondary_lock -> primary_lock. The reverse never happens, so ABBA deadlock is impossible. [1] for example: BUG: kernel NULL pointer dereference, address: 0000000000000370 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core] Call Trace: <TASK> mlx5e_remove+0x82/0x12a [mlx5_core] device_release_driver_internal+0x194/0x1f0 bus_remove_device+0xc6/0x140 device_del+0x159/0x3c0 ? devl_param_driverinit_value_get+0x29/0x80 mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core] mlx5_unregister_device+0x34/0x50 [mlx5_core] mlx5_uninit_one+0x43/0xb0 [mlx5_core] remove_one+0x4e/0xc0 [mlx5_core] pci_device_remove+0x39/0xa0 device_release_driver_internal+0x194/0x1f0 unbind_store+0x99/0xa0 kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x55/0xe90 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] CPU0 (primary) CPU1 (secondary) ========================================================================== mlx5e_remove() (device_lock held) mlx5e_remove() (2nd device_lock held) mlx5_sd_get_adev() mlx5_devcom_comp_is_ready() => true device_lock(primary) mlx5_sd_get_adev() ==> ret adev _mlx5e_remove() mlx5_sd_cleanup() // mlx5e_remove finished // releasing device_lock //need another check here... mlx5_devcom_comp_is_ready() => false Fixes: `381978d283` ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:13:09 -07:00
Shay Drory	3564222cfd	net/mlx5e: SD, Fix missing cleanup on probe error When _mlx5e_probe() fails, the preceding successful mlx5_sd_init() is not undone. Auxiliary bus probe failure skips binding, so mlx5e_remove() is never called for that adev and the matching mlx5_sd_cleanup() never runs - leaking the per-dev SD struct. Call mlx5_sd_cleanup() on the probe error path to balance mlx5_sd_init(). A similar gap exists on the resume path: mlx5_sd_init() and mlx5_sd_cleanup() are currently bundled with both probe/remove and suspend/resume, even though only the FW alias state actually needs to follow the suspend/resume lifecycle - the sd struct allocation and devcom membership are software state that should track the full bound lifetime. As a result, a failed resume can leave a still-bound device with sd == NULL, which mlx5_sd_get_adev() can't distinguish from a non-SD device. Fixing this requires sd_suspend/resume APIs which will only destroy FW resources and is left for a follow-up series. Fixes: `381978d283` ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:13:09 -07:00
Shay Drory	05217e4ffb	net/mlx5: SD, Keep multi-pf debugfs entries on primary mlx5_sd_init() creates the "multi-pf" debugfs directory under the primary device debugfs root, but stored the dentry in the calling device's sd struct. When sd_cleanup() run on a different PF, this leads to using the wrong sd->dfs for removing entries, which results in memory leak and an error in when re-creating the SD.[1] Fix it by explicitly storing the debugfs dentry in the primary device sd struct and use it for all per-group files. [1] debugfs: 'multi-pf' already exists in '0000:08:00.1' Fixes: `4375130bf5` ("net/mlx5: SD, Add debugfs") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:13:09 -07:00
Shay Drory	3abcedfdfd	net/mlx5: SD: Serialize init/cleanup mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same Socket-Direct group. This can cause the SD bring-up/tear-down sequence to be executed more than once or interleaved across PFs. Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD group state on the primary device. Skip init if the primary is already UP, and skip cleanup unless the primary is UP. The state check on cleanup is needed because sd_register() drops the devcom comp lock between marking the comp ready and assigning primary_dev on each peer. A concurrent cleanup that acquires the lock in this window would observe devcom_is_ready==true while primary_dev is still NULL (causing mlx5_sd_get_primary() to return NULL) or while the FW alias setup performed by mlx5_sd_init()'s body has not yet run (causing sd_cmd_unset_primary() to dereference a NULL tx_ft). Gate the cleanup body on primary_sd->state == MLX5_SD_STATE_UP, which is set only at the very end of mlx5_sd_init() under the same comp lock - so observing UP guarantees primary_dev, secondaries[], tx_ft, and dfs are all populated. Also bail explicitly if mlx5_sd_get_primary() returns NULL, in case state is checked on a peer whose primary_dev hasn't been assigned yet. In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister() into the cleanup's locked section, including the !primary and state != UP early-exit paths, so the device cannot unregister and free its struct mlx5_sd while devcom is still marked ready. A concurrent init acquiring the devcom lock will now observe devcom is no longer ready and bail out immediately. Fixes: `381978d283` ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:13:09 -07:00
Cosmin Ratiu	c4a5c46199	net/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable devlink reload while PSP connections are active does: mlx5_unload_one_devl_locked() -> mlx5_detach_device() -> _mlx5e_suspend() -> mlx5e_detach_netdev() -> profile->cleanup_rx -> profile->cleanup_tx -> mlx5e_destroy_mdev_resources() -> mlx5_core_dealloc_pd() fails: ... mlx5_core 0000:08:00.0: mlx5_cmd_out_err:821:(pid 19722): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xef0c8a), err(-22) ... The reason for failure is the existence of TX keys, which are removed by the PSP dev unregistration happening in: profile->cleanup() -> mlx5e_psp_unregister() -> mlx5e_psp_cleanup() -> psp_dev_unregister() ...but this isn't invoked in the devlink reload flow, only when changing the NIC profile (e.g. when transitioning to switchdev mode) or on dev teardown. Move PSP device registration into mlx5e_nic_enable(), and unregistration into the corresponding mlx5e_nic_disable(). These functions are called during netdev attach/detach after RX & TX are set up. This ensures that the keys will be gone by the time the PD is destroyed. Fixes: `89ee2d92f6` ("net/mlx5e: Support PSP offload functionality") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:09:04 -07:00
Cosmin Ratiu	50690733db	net/mlx5e: psp: Expose only a fully initialized priv->psp Currently, during PSP init, priv->psp is initialized to an incompletely built psp struct. Additionally, on fs init failure priv->psp is reset to NULL. Change this so that only a fully initialized priv->psp is set, which makes the code easier to reason about in failure scenarios. Fixes: `af2196f494` ("net/mlx5e: Implement PSP operations .assoc_add and .assoc_del") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:09:04 -07:00
Cosmin Ratiu	ae9582cd0b	net/mlx5e: psp: Fix invalid access on PSP dev registration fail priv->psp->psp is initialized with the PSP device as returned by psp_dev_create(). This could also return an error, in which case a future psp_dev_unregister() will result in unpleasantness. Avoid that by using a local variable and only saving the PSP device when registration succeeds. In case psp_dev_create() fails, priv->psp and steering structs are left in place, but they will be inert. The unchecked access of priv->psp in mlx5e_psp_offload_handle_rx_skb() won't happen because without a PSP device, there can be no SAs added and therefore no packets will be successfully decrypted and be handed off to the SW handler. Fixes: `89ee2d92f6` ("net/mlx5e: Support PSP offload functionality") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:09:04 -07:00
Pavan Chebbi	bd279e104e	bnxt_en: Use absolute target ns from ptp_clock_request There is no need to calculate the target PHC cycles required to make phase adjustment on the PPS OUT signal. This is because the application supplies absolute n_sec value in the future and is already the actual desired target value. Remove the unnecessary code. Fixes: `9e518f2580` ("bnxt_en: 1PPS functions to configure TSIO pins") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260504083611.1383776-5-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 17:36:15 -07:00
Kalesh AP	16517bc98a	bnxt_en: Check return value of bnxt_hwrm_vnic_cfg When the bnxt RDMA driver is loaded, it calls bnxt_register_dev(). As part of this, driver sends HWRM_VNIC_CFG firmware command to configure the VNIC to operate in dual VNIC mode. Currently the driver ignores the result of this firmware command. The RDMA driver must know the result since it affects its functioning. Check return value of call to bnxt_hwrm_vnic_cfg() in bnxt_register_dev() and return failure on error. Fixes: `a588e4580a` ("bnxt_en: Add interface to support RDMA driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-4-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 17:36:14 -07:00
Michael Chan	54c28fab2f	bnxt_en: Set bp->max_tpa according to what the FW supports Fix the logic to set bp->max_tpa no higher than what the FW supports. On P5 chips, some older FW sets max_tpa very low so we override it to prevent performance regressions with the older FW. Fixes: `79632e9ba3` ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com> Reviewed-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-3-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 17:36:14 -07:00
Michael Chan	07f4443335	bnxt_en: Delay for 5 seconds after AER DPC for all chips The FW on all chips is requiring a 5-second delay after Downstream Port Containment (DPC) AER. The previously added 900 msec delay was not long enough in all cases because the chip's CRS (Configuration Request Retry Status) mechanism is not always reliable. Fixes: `d5ab32e9b0` ("bnxt_en: Add delay to handle Downstream Port Containment (DPC) AER") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-2-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 17:36:14 -07:00
Linus Torvalds	9207d47f96	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: - Several error unwind misses on system calls in mlx5, mana, ocrdma, vmw_pvrdma, mlx4, and hns - More rxe bugs processing network packets - User triggerable races in mlx5 when destroying and creating the same same object when the FW returns the same object ID - Incorrect passing of an IPv6 address through netlink RDMA_NL_LS_OP_IP_RESOLVE - Add memory ordering for mlx5's lock avoidance pattenr - Protect mana from kernel memory overflow - Use safe patterns for xarray/radix_tree look up in mlx5 and hns * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (24 commits) RDMA/hns: Fix unlocked call to hns_roce_qp_remove() RDMA/hns: Fix xarray race in hns_roce_create_qp_common() RDMA/hns: Fix xarray race in hns_roce_create_srq() RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event() RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq() RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp() RDMA/ocrdma: Clarify the mm_head searching RDMA/mana: Fix error unwind in mana_ib_create_qp_rss() RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss() RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss() RDMA/mana: Validate rx_hash_key_len RDMA/mlx5: Add missing store/release for lock elision pattern RDMA/mlx5: Restore zero-init to mlx5_ib_modify_qp() ucmd RDMA/ionic: Fix typo in format string RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation RDMA/core: Fix rereg_mr use-after-free race IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() RDMA/mlx5: Fix UAF in DCT destroy due to race with create RDMA/mlx5: Fix UAF in SRQ destroy due to race with create ...	2026-05-05 09:11:52 -07:00
Dipayaan Roy	95084f1883	net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR During Function Level Reset recovery, the MANA driver reads hardware BAR0 registers that may temporarily contain garbage values. The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used to compute gc->shm_base, which is later dereferenced via readl() in mana_smc_poll_register(). If the hardware returns an unaligned or out-of-range value, the driver must not blindly use it, as this would propagate the hardware error into a kernel crash. The following crash was observed on an arm64 Hyper-V guest running kernel 6.17.0-3013-azure during VF reset recovery triggered by HWC timeout. [13291.785274] Unable to handle kernel paging request at virtual address ffff8000a200001b [13291.785311] Mem abort info: [13291.785332] ESR = 0x0000000096000021 [13291.785343] EC = 0x25: DABT (current EL), IL = 32 bits [13291.785355] SET = 0, FnV = 0 [13291.785363] EA = 0, S1PTW = 0 [13291.785372] FSC = 0x21: alignment fault [13291.785382] Data abort info: [13291.785391] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 [13291.785404] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [13291.785412] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [13291.785421] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000014df3a1000 [13291.785432] [ffff8000a200001b] pgd=1000000100438403, p4d=1000000100438403, pud=1000000100439403, pmd=0068000fc2000711 [13291.785703] Internal error: Oops: 0000000096000021 [#1] SMP [13291.830975] Modules linked in: tls qrtr mana_ib ib_uverbs ib_core xt_owner xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cfg80211 8021q garp mrp stp llc binfmt_misc joydev serio_raw nls_iso8859_1 hid_generic aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher hid_hyperv sm4 sm3_ce sha3_ce hv_netvsc hid vmgenid hyperv_keyboard hyperv_drm sch_fq_codel nvme_fabrics efi_pstore dm_multipath nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vmw_vmci vsock dmi_sysfs ip_tables x_tables autofs4 [13291.862630] CPU: 122 UID: 0 PID: 61796 Comm: kworker/122:2 Tainted: G W 6.17.0-3013-azure #13-Ubuntu VOLUNTARY [13291.869902] Tainted: [W]=WARN [13291.871901] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 01/08/2026 [13291.878086] Workqueue: events mana_serv_func [13291.880718] pstate: 62400005 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) [13291.884835] pc : mana_smc_poll_register+0x48/0xb0 [13291.887902] lr : mana_smc_setup_hwc+0x70/0x1c0 [13291.890493] sp : ffff8000ab79bbb0 [13291.892364] x29: ffff8000ab79bbb0 x28: ffff00410c8b5900 x27: ffff00410d630680 [13291.896252] x26: ffff004171f9fd80 x25: 000000016ed55000 x24: 000000017f37e000 [13291.899990] x23: 0000000000000000 x22: 000000016ed55000 x21: 0000000000000000 [13291.904497] x20: ffff8000a200001b x19: 0000000000004e20 x18: ffff8000a6183050 [13291.908308] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000000a [13291.912542] x14: 0000000000000004 x13: 0000000000000000 x12: 0000000000000000 [13291.916298] x11: 0000000000000000 x10: 0000000000000001 x9 : ffffc45006af1bd8 [13291.920945] x8 : ffff000151129000 x7 : 0000000000000000 x6 : 0000000000000000 [13291.925293] x5 : 000000015f214000 x4 : 000000017217a000 x3 : 000000016ed50000 [13291.930436] x2 : 000000016ed55000 x1 : 0000000000000000 x0 : ffff8000a1ffffff [13291.934342] Call trace: [13291.935736] mana_smc_poll_register+0x48/0xb0 (P) [13291.938611] mana_smc_setup_hwc+0x70/0x1c0 [13291.941113] mana_hwc_create_channel+0x1a0/0x3a0 [13291.944283] mana_gd_setup+0x16c/0x398 [13291.946584] mana_gd_resume+0x24/0x70 [13291.948917] mana_do_service+0x13c/0x1d0 [13291.951583] mana_serv_func+0x34/0x68 [13291.953732] process_one_work+0x168/0x3d0 [13291.956745] worker_thread+0x2ac/0x480 [13291.959104] kthread+0xf8/0x110 [13291.961026] ret_from_fork+0x10/0x20 [13291.963560] Code: d2807d00 9417c551 71000673 54000220 (b9400281) [13291.967299] ---[ end trace 0000000000000000 ]--- Disassembly of mana_smc_poll_register() around the crash site: Disassembly of section .text: 00000000000047c8 <mana_smc_poll_register>: 47c8: d503201f nop 47cc: d503201f nop 47d0: d503233f paciasp 47d4: f800865e str x30, [x18], #8 47d8: a9bd7bfd stp x29, x30, [sp, #-48]! 47dc: 910003fd mov x29, sp 47e0: a90153f3 stp x19, x20, [sp, #16] 47e4: 91007014 add x20, x0, #0x1c 47e8: 5289c413 mov w19, #0x4e20 47ec: f90013f5 str x21, [sp, #32] 47f0: 12001c35 and w21, w1, #0xff 47f4: 14000008 b 4814 <mana_smc_poll_register+0x4c> 47f8: 36f801e1 tbz w1, #31, 4834 <mana_smc_poll_register+0x6c> 47fc: 52800042 mov w2, #0x2 4800: d280fa01 mov x1, #0x7d0 4804: d2807d00 mov x0, #0x3e8 4808: 94000000 bl 0 <usleep_range_state> 480c: 71000673 subs w19, w19, #0x1 4810: 54000200 b.eq 4850 <mana_smc_poll_register+0x88> 4814: b9400281 ldr w1, [x20] <-- ** CRASHED HERE *** 4818: d50331bf dmb oshld 481c: 2a0103e2 mov w2, w1 ... From the crash signature x20 = ffff8000a200001b, this address ends in 0x1b which is not 4-byte aligned, so the 'ldr w1, [x20]' instruction (readl) triggers the arm64 alignment fault (FSC = 0x21). The root cause is in mana_gd_init_vf_regs(), which computes: gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET); The offset is used without any validation. The same problem exists in mana_gd_init_pf_regs() for sriov_base_off and sriov_shm_off. Fix this by validating all offsets before use: - VF: check shm_off is within BAR0, properly aligned to 4 bytes (readl requirement), and leaves room for the full 256-bit (32-byte) SMC aperture. - PF: check sriov_base_off is within BAR0, aligned to 8 bytes (readq requirement), and leaves room to safely read the sriov_shm_off register at sriov_base_off + GDMA_PF_REG_SHM_OFF. Then check sriov_shm_off leaves room for the full SMC aperture. All arithmetic uses subtraction rather than addition to avoid integer overflow on garbage values. Define SMC_APERTURE_SIZE (32 bytes, derived from the 256-bit aperture width) Return -EPROTO on invalid values. The existing recovery path in mana_serv_reset() already handles -EPROTO by falling through to PCI device rescan, giving the hardware another chance to present valid register values after reset. Fixes: `9bf66036d6` ("net: mana: Handle hardware recovery events when probing the device") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/afQUMClyjmBVfD+u@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-05 15:43:08 +02:00
Dipayaan Roy	3985c9a56d	net: mana: remove double CQ cleanup in mana_create_rxq error path In mana_create_rxq(), the error cleanup path calls mana_destroy_rxq() followed by mana_deinit_cq(). This is incorrect for two reasons: 1. mana_destroy_rxq() already calls mana_deinit_cq() internally, so the CQ's GDMA queue is destroyed twice. 2. mana_destroy_rxq() frees the rxq via kfree(rxq) before returning. The subsequent mana_deinit_cq(apc, cq) then operates on freed memory since cq points to &rxq->rx_cq, which is embedded in the already-freed rxq structure — a use-after-free. Remove the redundant mana_deinit_cq() call from the error path since mana_destroy_rxq() already handles CQ cleanup. mana_deinit_cq() is itself safe for an uninitialized CQ as it checks for a NULL gdma_cq before proceeding. Fixes: `ca9c54d2d6` ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Aditya Garg <gargaditya@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-4-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-05 12:16:23 +02:00
Dipayaan Roy	2a1c691182	net: mana: Skip WQ object destruction for uninitialized RXQ In mana_destroy_rxq(), mana_destroy_wq_obj() is called unconditionally even when the WQ object was never created (rxobj is still INVALID_MANA_HANDLE). When mana_create_rxq() fails before mana_create_wq_obj() succeeds, the error path calls mana_destroy_rxq() which sends a bogus destroy command to the hardware: mana 7870:00:00.0: HWC: Failed hw_channel req: 0x1d mana 7870:00:00.0: Failed to send mana message: -71, 0x1d mana 7870:00:00.0 eth7: Failed to destroy WQ object: -71 Guard mana_destroy_wq_obj() with an INVALID_MANA_HANDLE check so that mana_destroy_rxq() is safe to call at any stage of RXQ initialization. Fixes: `ca9c54d2d6` ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-3-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-05 12:16:23 +02:00
Dipayaan Roy	e9e334f806	net: mana: check xdp_rxq registration before unreg in mana_destroy_rxq() When mana_create_rxq() fails at mana_create_wq_obj() or any step before xdp_rxq_info_reg() is called, the error path jumps to `out:` which calls mana_destroy_rxq(). mana_destroy_rxq() unconditionally calls xdp_rxq_info_unreg() on xilinx xdp_rxq that was never registered, triggering a WARN_ON in net/core/xdp.c: mana 7870:00:00.0: HWC: Failed hw_channel req: 0xc000009a mana 7870:00:00.0 eth7: Failed to create RXQ: err = -71 Driver BUG WARNING: CPU: 442 PID: 491615 at ../net/core/xdp.c:150 xdp_rxq_info_unreg+0x44/0x70 Modules linked in: tcp_bbr xsk_diag udp_diag raw_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink tcp_diag inet_diag binfmt_misc rpcsec_gss_krb5 nfsv3 nfs_acl auth_rpcgss nfsv4 dns_resolver nfs lockd ext4 grace crc16 iscsi_tcp mbcache fscache libiscsi_tcp jbd2 netfs rpcrdma af_packet sunrpc rdma_ucm ib_iser rdma_cm iw_cm iscsi_ibft ib_cm iscsi_boot_sysfs libiscsi rfkill scsi_transport_iscsi mana_ib ib_uverbs ib_core mana hyperv_drm(X) drm_shmem_helper intel_rapl_msr drm_kms_helper intel_rapl_common syscopyarea nls_iso8859_1 sysfillrect intel_uncore_frequency_common nls_cp437 vfat fat nfit sysimgblt libnvdimm hv_netvsc(X) hv_utils(X) fb_sys_fops hv_balloon(X) joydev fuse drm dm_mod configfs ip_tables x_tables xfs libcrc32c sd_mod nvme nvme_core nvme_common t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 hid_generic serio_raw pci_hyperv(X) hv_storvsc(X) scsi_transport_fc hyperv_keyboard(X) hid_hyperv(X) pci_hyperv_intf(X) crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd hv_vmbus(X) softdog sg scsi_mod efivarfs Supported: Yes, External CPU: 442 PID: 491615 Comm: ethtool Kdump: loaded Tainted: G X 5.14.21-150500.55.136-default #1 SLE15-SP5 a627be1b53abbfd64ad16b2685e4308c52847f42 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/25/2025 RIP: 0010:xdp_rxq_info_unreg+0x44/0x70 Code: e8 91 fe ff ff c7 43 0c 02 00 00 00 48 c7 03 00 00 00 00 5b c3 cc cc cc cc e9 58 3a 1c 00 48 c7 c7 f6 5f 19 97 e8 5c a4 7e ff <0f> 0b 83 7b 0c 01 74 ca 48 c7 c7 d9 5f 19 97 e8 48 a4 7e ff 0f 0b RSP: 0018:ff3df6c8f7207818 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ff30d89f94808a80 RCX: 0000000000000027 RDX: 0000000000000000 RSI: 0000000000000002 RDI: ff30d94bdcca2908 RBP: 0000000000080000 R08: ffffffff98ed11a0 R09: ff3df6c8f72077a0 R10: dead000000000100 R11: 000000000000000a R12: 0000000000000000 R13: 0000000000002000 R14: 0000000000040000 R15: ff30d89f94800000 FS: 00007fe6d8432b80(0000) GS:ff30d94bdcc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6d81a89b1 CR3: 00000b3b6d578001 CR4: 0000000000371ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> mana_destroy_rxq+0x5b/0x2f0 [mana 267acf7006bcb696095bba4d810643d1db3b9e94] mana_create_rxq.isra.55+0x3db/0x720 [mana 267acf7006bcb696095bba4d810643d1db3b9e94] ? simple_lookup+0x36/0x50 ? current_time+0x42/0x80 ? __d_free_external+0x30/0x30 mana_alloc_queues+0x32a/0x470 [mana 267acf7006bcb696095bba4d810643d1db3b9e94] ? _raw_spin_unlock+0xa/0x30 ? d_instantiate.part.29+0x2e/0x40 ? _raw_spin_unlock+0xa/0x30 ? debugfs_create_dir+0xe4/0x140 mana_attach+0x5c/0xf0 [mana 267acf7006bcb696095bba4d810643d1db3b9e94] mana_set_ringparam+0xd5/0x1a0 [mana 267acf7006bcb696095bba4d810643d1db3b9e94] ethnl_set_rings+0x292/0x320 genl_family_rcv_msg_doit.isra.15+0x11b/0x150 genl_rcv_msg+0xe3/0x1e0 ? rings_prepare_data+0x80/0x80 ? genl_family_rcv_msg_doit.isra.15+0x150/0x150 netlink_rcv_skb+0x50/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1b6/0x280 netlink_sendmsg+0x365/0x4d0 sock_sendmsg+0x5f/0x70 __sys_sendto+0x112/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x80 ? handle_mm_fault+0xd7/0x290 ? do_user_addr_fault+0x2d8/0x740 ? exc_page_fault+0x67/0x150 entry_SYSCALL_64_after_hwframe+0x6b/0xd5 RIP: 0033:0x7fe6d8122f06 Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 f3 c3 41 57 41 56 4d 89 c7 41 55 41 54 41 RSP: 002b:00007fff2b66b068 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055771123d2a0 RCX: 00007fe6d8122f06 RDX: 0000000000000034 RSI: 000055771123d3b0 RDI: 0000000000000003 RBP: 00007fff2b66b100 R08: 00007fe6d8203360 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 000055771123d350 R13: 000055771123d340 R14: 0000000000000000 R15: 00007fff2b66b2b0 </TASK> Guard the xdp_rxq_info_unreg() call with xdp_rxq_info_is_reg() so that mana_destroy_rxq() is safe to call regardless of how far initialization progressed. Fixes: `ed5356b53f` ("net: mana: Add XDP support") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-2-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-05 12:16:23 +02:00
Jason Gunthorpe	c9341307ea	RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event() Sashiko points out the radix_tree itself is RCU safe, but nothing ever frees the mlx4_srq struct with RCU, and it isn't even accessed within the RCU critical section. It also will crash if an event is delivered before the srq object is finished initializing. Use the spinlock since it isn't easy to make RCU work, use refcount_inc_not_zero() to protect against partially initialized objects, and order the refcount_set() to be after the srq is fully initialized. Cc: stable@vger.kernel.org Fixes: `30353bfc43` ("net/mlx4_core: Use RCU to perform radix tree lookup for SRQ") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=5 Link: https://patch.msgid.link/r/12-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-02 15:30:48 -03:00
Gregory Fuchedgi	383d0fb894	amd-xgbe: fix PTP addend overflow causing frozen clock XGBE_PTP_ACT_CLK_FREQ and XGBE_V2_PTP_ACT_CLK_FREQ were 10x too large (500MHz/1GHz instead of 50MHz/100MHz), causing the computed addend to overflow the 32-bit tstamp_addend. In the general case this would result in the clock advancing at the wrong rate. For v2 (PCI), ptpclk_rate is hardcoded to 125MHz, so the addend formula (ACT_CLK_FREQ << 32) / ptpclk_rate yields exactly 8 * 2^32, and when stored to the 32-bit tstamp_addend the value is zero. With addend = 0 the hardware accumulator never overflows and the PTP clock is fully stopped. For v1 (platform), ptpclk_rate is read from ACPI/DT so the exact overflow behavior depends on the firmware-reported frequency. Define the constants as NSEC_PER_SEC / SSINC so the relationship is explicit and cannot drift out of sync. Fixes: `fbd47be098` ("amd-xgbe: add hardware PTP timestamping support") Tested-by: Gregory Fuchedgi <gfuchedgi@gmail.com> Signed-off-by: Gregory Fuchedgi <gfuchedgi@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260429-fix-xgbe-ptp-addend-v1-1-fca5b0ca5e62@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-02 10:16:27 -07:00
Ratheesh Kannoth	bc968f61bf	octeontx2-af: npc: cn20k: Reject missing default-rule MCAM indices When cn20k default L2 rules are not installed, npc_cn20k_dft_rules_idx_get() leaves broadcast, multicast, promiscuous, and unicast slots at USHRT_MAX. npc_get_nixlf_mcam_index() previously returned that sentinel as a valid MCAM index, so callers could program hardware with an invalid index. Return -EINVAL from the cn20k branches of npc_get_nixlf_mcam_index() when the requested slot is still USHRT_MAX. Harden cn20k NPC MCAM entry helpers to reject out-of-range indices before touching hardware. Drop the early bounds check in npc_enable_mcam_entry() for cn20k so invalid indices are validated inside npc_cn20k_enable_mcam_entry() instead of being silently ignored. In rvu_npc_update_flowkey_alg_idx(), treat negative MCAM indices like out-of-range values, and only update RSS actions for promiscuous and all-multi paths when the resolved index is non-negative. Cc: Suman Ghosh <sumang@marvell.com> Fixes: `6d1e70282f` ("octeontx2-af: npc: cn20k: Use common APIs") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-11-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-30 18:50:17 -07:00
Ratheesh Kannoth	013717353c	octeontx2-af: npc: cn20k: Tear down default MCAM rules explicitly on free npc_cn20k_dft_rules_free() used the NPC MCAM mbox "free all" path, which does not match how cn20k tracks default-rule MCAM slots indexes. Resolve the default-rule indices, then for each valid slot clear the bitmap entry, drop the PF/VF map, disable the MCAM line, clear the target function, and npc_cn20k_idx_free(). Remove any matching software mcam_rules nodes. On hard failure from idx_free, WARN and stop so the box stays up for analysis. In npc_mcam_free_all_entries(), prefetch the same default-rule indices and, on cn20k, skip bitmap clear and idx_free when the scanned entry is one of those reserved defaults (they are released by npc_cn20k_dft_rules_free). Fixes: `09d3b7a140` ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-10-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-30 18:50:17 -07:00
Ratheesh Kannoth	afb474bd4f	octeontx2-af: npc: cn20k: Initialize default-rule index outputs up front npc_cn20k_dft_rules_idx_get() wrote USHRT_MAX into individual outputs only on some error paths (lbk promisc lookup, VF ucast lookup, and the PF rule walk), which could leave other caller slots stale across retries. Set every non-NULL bcast/mcast/promisc/ucast pointer to USHRT_MAX once at entry, then drop the duplicate assignments on failure. Successful lookups still overwrite the relevant slot before returning. Fixes: `09d3b7a140` ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-9-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-30 18:50:17 -07:00
Ratheesh Kannoth	f6803eb070	octeontx2-af: npc: cn20k: Fix MCAM actions read npc_cn20k_read_mcam_entry() always reloaded action and vtag_action from bank 0 after programming the CAM words. Use the bank returned by npc_get_bank() for the ACTION reads as well, and read those registers once up front so both X2 and X4 paths share the same metadata. Return directly from the X2 keyword path now that the action fields are already populated. Cc: Suman Ghosh <sumang@marvell.com> Fixes: `6d1e70282f` ("octeontx2-af: npc: cn20k: Use common APIs") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-8-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-30 18:50:16 -07:00
Ratheesh Kannoth	2b6d6bb728	octeontx2-af: npc: cn20k: Fix bank value For X4 keys its loop reused the bank parameter as the loop counter, so bank no longer reflected the caller's bank after the loop and the control flow was hard to follow. Program NPC_AF_CN20K_MCAMEX_BANKX_CFG_EXT directly in npc_cn20k_config_mcam_entry(): one CFG write for X2 using the computed bank, and one CFG write per bank inside the X4 action loop. Enable the entry at the end with npc_cn20k_enable_mcam_entry(..., true) instead of embedding the enable bit in bank_cfg via the removed helper. Cc: Suman Ghosh <sumang@marvell.com> Fixes: `4e527f1e5c` ("octeontx2-af: npc: cn20k: Add new mailboxes for CN20K silicon") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-7-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-30 18:50:16 -07:00

1 2 3 4 5 ...

56369 Commits