linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-17 20:06:58 -04:00

Author	SHA1	Message	Date
Linus Torvalds	7c6c4ed80b	Merge tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "The kernfs rbtree is keyed by (hash, ns, name) where the hash is seeded with the raw namespace pointer via init_name_hash(ns). The resulting hash values are exposed to userspace through readdir seek positions, and the pointer-based ordering in kernfs_name_compare() is observable through entry order. Switch from raw pointers to ns_common::ns_id for both hashing and comparison. A preparatory commit first replaces all const void * namespace parameters with const struct ns_common * throughout kernfs, sysfs, and kobject so the code can access ns->ns_id. Also compare the ns_id when hashes match in the rbtree to handle crafted collisions. Also fix eventpoll RCU grace period issue and a cachefiles refcount problem" * tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: kernfs: make directory seek namespace-aware kernfs: use namespace id instead of pointer for hashing and comparison kernfs: pass struct ns_common instead of const void * for namespace tags eventpoll: defer struct eventpoll free to RCU grace period cachefiles: fix incorrect dentry refcount in cachefiles_cull()	2026-04-10 08:40:49 -07:00
David Carlier	59c3d55a94	net: lan966x: fix use-after-free and leak in lan966x_fdma_reload() When lan966x_fdma_reload() fails to allocate new RX buffers, the restore path restarts DMA using old descriptors whose pages were already freed via lan966x_fdma_rx_free_pages(). Since page_pool_put_full_page() can release pages back to the buddy allocator, the hardware may DMA into memory now owned by other kernel subsystems. Additionally, on the restore path, the newly created page pool (if allocation partially succeeded) is overwritten without being destroyed, leaking it. Fix both issues by deferring the release of old pages until after the new allocation succeeds. Save the old page array before the allocation so old pages can be freed on the success path. On the failure path, the old descriptors, pages and page pool are all still valid, making the restore safe. Also ensure the restore path re-enables NAPI and wakes the netdev, matching the success path. Fixes: `89ba464fcf` ("net: lan966x: refactor buffer reload function") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-4-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
David Carlier	076344a6ad	net: lan966x: fix page pool leak in error paths lan966x_fdma_rx_alloc() creates a page pool but does not destroy it if the subsequent fdma_alloc_coherent() call fails, leaking the pool. Similarly, lan966x_fdma_init() frees the coherent DMA memory when lan966x_fdma_tx_alloc() fails but does not destroy the page pool that was successfully created by lan966x_fdma_rx_alloc(), leaking it. Add the missing page_pool_destroy() calls in both error paths. Fixes: `11871aba19` ("net: lan96x: Use page_pool API") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-3-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
David Carlier	3fd0da4fd8	net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool() page_pool_create() can return an ERR_PTR on failure. The return value is used unconditionally in the loop that follows, passing the error pointer through xdp_rxq_info_reg_mem_model() into page_pool_use_xdp_mem(), which dereferences it, causing a kernel oops. Add an IS_ERR check after page_pool_create() to return early on failure. Fixes: `11871aba19` ("net: lan96x: Use page_pool API") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-2-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
Christian Brauner	e3b2cf6e5d	kernfs: pass struct ns_common instead of const void * for namespace tags kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void . Replace all const void namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
Alexander Koskovich	56007972c0	net: ipa: fix event ring index not programmed for IPA v5.0+ For IPA v5.0+, the event ring index field moved from CH_C_CNTXT_0 to CH_C_CNTXT_1. The v5.0 register definition intended to define this field in the CH_C_CNTXT_1 fmask array but used the old identifier of ERINDEX instead of CH_ERINDEX. Without a valid event ring, GSI channels could never signal transfer completions. This caused gsi_channel_trans_quiesce() to block forever in wait_for_completion(). At least for IPA v5.2 this resolves an issue seen where runtime suspend, system suspend, and remoteproc stop all hanged forever. It also meant the IPA data path was completely non functional. Fixes: `faf0678ec8` ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alexander Koskovich <akoskovich@pm.me> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403-milos-ipa-v1-2-01e9e4e03d3e@fairphone.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 09:47:31 +02:00
Alexander Koskovich	9709b56d90	net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+ Fix the field masks to match the hardware layout documented in downstream GSI (GSI_V3_0_EE_n_GSI_EE_GENERIC_CMD_*). Notably this fixes a WARN I was seeing when I tried to send "stop" to the MPSS remoteproc while IPA was up. Fixes: `faf0678ec8` ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alexander Koskovich <akoskovich@pm.me> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403-milos-ipa-v1-1-01e9e4e03d3e@fairphone.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 09:47:31 +02:00
Jakub Kicinski	2607c0907d	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2026-04-06 (idpf, ice, ixgbe, ixgbevf, igb, e1000) Emil converts to use spinlock_t for virtchnl transactions to make consistent use of the xn_bm_lock when accessing the free_xn_bm bitmap, while also avoiding nested raw/bh spinlock issue on PREEMPT_RT kernels. He also sets payload size before calling the async handler, to make sure it doesn't error out prematurely due to invalid size check for idpf. Kohei Enju changes WARN_ON for missing PTP control PF to a dev_info() on ice as there are cases where this is expected and acceptable. Petr Oros fixes conditions in which error paths failed to call ice_ptp_port_phy_restart() breaking PTP functionality on ice. Alex significantly reduces reporting of driver information, and time under RTNL locl, on ixgbe e610 devices by reducing reads of flash info only on events that could change it. Michal Schmidt adds missing Hyper-V op on ixgbevf. Alex Dvoretsky removes call to napi_synchronize() in igb_down() to resolve a deadlock. Agalakov Daniil adds error check on e1000 for failed EEPROM read. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000: check return value of e1000_read_eeprom igb: remove napi_synchronize() in igb_down() ixgbevf: add missing negotiate_features op to Hyper-V ops table ixgbe: stop re-reading flash on every get_drvinfo for e610 ice: fix PTP timestamping broken by SyncE code on E825C ice: ptp: don't WARN when controlling PF is unavailable idpf: set the payload size before calling the async handler idpf: improve locking around idpf_vc_xn_push_free() idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling ==================== Link: https://patch.msgid.link/20260406213038.444732-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 20:05:10 -07:00
Fabio Baltieri	5a37d22879	net: txgbe: leave space for null terminators on property_entry Lists of struct property_entry are supposed to be terminated with an empty property, this driver currently seems to be allocating exactly the amount of entry used. Change the struct definition to leave an extra element for all property_entry. Fixes: `c3e382ad6d` ("net: txgbe: Add software nodes to support phylink") Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com> Tested-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20260405222013.5347-1-fabio.baltieri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:11:37 -07:00
Jakub Kicinski	d65b175cfa	Merge tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A few last-minute fixes: - rfkill: prevent boundless event list - rt2x00: fix USB resource management - brcmfmac: validate firmware IDs - brcmsmac: fix DMA free size * tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: net: rfkill: prevent unlimited numbers of rfkill events from being created wifi: rt2x00usb: fix devres lifetime wifi: brcmfmac: validate bsscfg indices in IF events wifi: brcmsmac: Fix dma_free_coherent() size ==================== Link: https://patch.msgid.link/20260408081802.111623-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:56:17 -07:00
Felix Gu	c09ea768bd	net: mdio: realtek-rtl9300: use scoped device_for_each_child_node loop Switch to device_for_each_child_node_scoped() to auto-release fwnode references on early exit. Fixes: `24e31e4747` ("net: mdio: Add RTL9300 MDIO driver") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260405-rtl9300-v1-1-08e4499cf944@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:42:08 -07:00
Johan Alvarado	f2777d5cb5	net: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure This patch fixes an issue where reading the MAC address from the eFUSE fails due to a race condition. The root cause was identified by comparing the driver's behavior with a custom U-Boot port. In U-Boot, the MAC address was read successfully every time because the driver was loaded later in the boot process, giving the hardware ample time to initialize. In Linux, reading the eFUSE immediately returns all zeros, resulting in a fallback to a random MAC address. Hardware cold-boot testing revealed that the eFUSE controller requires a short settling time to load its internal data. Adding a 2000-5000us delay after the reset ensures the hardware is fully ready, allowing the native MAC address to be read consistently. Fixes: `02ff155ea2` ("net: stmmac: Add glue driver for Motorcomm YT6801 ethernet controller") Reported-by: Georg Gottleuber <ggo@tuxedocomputers.com> Closes: https://lore.kernel.org/24cfefff-1233-4745-8c47-812b502d5d19@tuxedocomputers.com Signed-off-by: Johan Alvarado <contact@c127.dev> Reviewed-by: Yao Zi <me@ziyao.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/fc5992a4-9532-49c3-8ec1-c2f8c5b84ca1@smtp-relay.sendinblue.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:21:00 -07:00
John Pavlick	95aca8602e	net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules Several GPON ONT SFP sticks based on Realtek RTL960x report 1000BASE-LX at 1300MBd in their EEPROM but can operate at 2500base-X. On hosts capable of 2500base-X (e.g. Banana Pi R3 / MT7986), the kernel negotiates only 1G because it trusts the incorrect EEPROM data. Add quirks for: - Hisense-Leox LXT-010S-H - Hisense ZNID-GPON-2311NA - HSGQ HSGQ-XPON-Stick Each quirk advertises 2500base-X and ignores TX_FAULT during the module's ~40s Linux boot time. Tested on Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed 2.5Gbps link and full throughput with flow offloading. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Suggested-by: Marcin Nita <marcin.nita@leolabs.pl> Signed-off-by: John Pavlick <jspavlick@posteo.net> Link: https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:13:51 -07:00
Johan Hovold	25369b2222	wifi: rt2x00usb: fix devres lifetime USB drivers bind to USB interfaces and any device managed resources should have their lifetime tied to the interface rather than parent USB device. This avoids issues like memory leaks when drivers are unbound without their devices being physically disconnected (e.g. on probe deferral or configuration changes). Fix the USB anchor lifetime so that it is released on driver unbind. Fixes: `8b4c000931` ("rt2x00usb: Use usb anchor to manage URB") Cc: stable@vger.kernel.org # 4.7 Cc: Vishal Thanki <vishalthanki@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260327113219.1313748-1-johan@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 12:34:52 +02:00
Pengpeng Hou	304950a467	wifi: brcmfmac: validate bsscfg indices in IF events brcmf_fweh_handle_if_event() validates the firmware-provided interface index before it touches drvr->iflist[], but it still uses the raw bsscfgidx field as an array index without a matching range check. Reject IF events whose bsscfg index does not fit in drvr->iflist[] before indexing the interface array. Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://patch.msgid.link/20260323074551.93530-1-pengpeng@iscas.ac.cn [add missing wifi prefix] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 12:34:20 +02:00
Thomas Fourier	12cd763275	wifi: brcmsmac: Fix dma_free_coherent() size dma_alloc_consistent() may change the size to align it. The new size is saved in alloced. Change the free size to match the allocation size. Fixes: `5b435de0d7` ("net: wireless: add brcm80211 drivers") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://patch.msgid.link/20260218130741.46566-3-fourier.thomas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-04-07 12:34:03 +02:00
Michael Guralnik	a9d4f4f6e6	net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-10 NVLink-C2C device ID to the table of supported PCI device IDs. Cc: stable@vger.kernel.org Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403091756.139583-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 19:17:42 -07:00
Agalakov Daniil	d3baa34a47	e1000: check return value of e1000_read_eeprom [Why] e1000_set_eeprom() performs a read-modify-write operation when the write range is not word-aligned. This requires reading the first and last words of the range from the EEPROM to preserve the unmodified bytes. However, the code does not check the return value of e1000_read_eeprom(). If the read fails, the operation continues using uninitialized data from eeprom_buff. This results in corrupted data being written back to the EEPROM for the boundary words. Add the missing error checks and abort the operation if reading fails. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Co-developed-by: Iskhakov Daniil <dish@amicon.ru> Signed-off-by: Iskhakov Daniil <dish@amicon.ru> Signed-off-by: Agalakov Daniil <ade@amicon.ru> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 14:13:38 -07:00
Alex Dvoretsky	b1e0672403	igb: remove napi_synchronize() in igb_down() When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9), the XSK buffer pool is destroyed but NAPI polling continues. igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing napi_complete_done() from clearing NAPI_STATE_SCHED. igb_down() calls napi_synchronize() before napi_disable() for each queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear, which never happens. igb_down() blocks indefinitely, the TX watchdog fires, and the TX queue remains permanently stalled. napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE. After a full-budget poll, __napi_poll() checks napi_disable_pending(). If set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop that napi_synchronize() cannot. napi_synchronize() was added in commit `41f149a285` ("igb: Fix possible panic caused by Rx traffic arrival while interface is down"). napi_disable() provides stronger guarantees: it prevents further scheduling and waits for any active poll to exit. Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a preceding napi_synchronize() in their down paths. Remove redundant napi_synchronize() call and reorder napi_disable() before igb_set_queue_napi() so the queue-to-NAPI mapping is only cleared after polling has fully stopped. Fixes: `2c6196013f` ("igb: Add AF_XDP zero-copy Rx support") Cc: stable@vger.kernel.org Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Patryk Holda <patryk.holda@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 14:13:19 -07:00
Michal Schmidt	4821d563cd	ixgbevf: add missing negotiate_features op to Hyper-V ops table Commit `a7075f501b` ("ixgbevf: fix mailbox API compatibility by negotiating supported features") added the .negotiate_features callback to ixgbe_mac_operations and populated it in ixgbevf_mac_ops, but forgot to add it to ixgbevf_hv_mac_ops. This leaves the function pointer NULL on Hyper-V VMs. During probe, ixgbevf_negotiate_api() calls ixgbevf_set_features(), which unconditionally dereferences hw->mac.ops.negotiate_features(). On Hyper-V this results in a NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine [...] Workqueue: events work_for_cpu_fn RIP: 0010:0x0 [...] Call Trace: ixgbevf_negotiate_api+0x66/0x160 [ixgbevf] ixgbevf_sw_init+0xe4/0x1f0 [ixgbevf] ixgbevf_probe+0x20f/0x4a0 [ixgbevf] local_pci_probe+0x50/0xa0 work_for_cpu_fn+0x1a/0x30 [...] Add ixgbevf_hv_negotiate_features_vf() that returns -EOPNOTSUPP and wire it into ixgbevf_hv_mac_ops. The caller already handles -EOPNOTSUPP gracefully. Fixes: `a7075f501b` ("ixgbevf: fix mailbox API compatibility by negotiating supported features") Reported-by: Xiaoqiang Xiong <xxiong@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-155455 Assisted-by: Claude:claude-4.6-opus-high Cursor Tested-by: Xiaoqiang Xiong <xxiong@redhat.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:23 -07:00
Aleksandr Loktionov	d8ae40dc20	ixgbe: stop re-reading flash on every get_drvinfo for e610 ixgbe_get_drvinfo() calls ixgbe_refresh_fw_version() on every ethtool query for e610 adapters. That ends up in ixgbe_discover_flash_size(), which bisects the full 16 MB NVM space issuing one ACI command per step (~20 ms each, ~24 steps total = ~500 ms). Profiling on an idle E610-XAT2 system with telegraf scraping ethtool stats every 10 seconds: kretprobe:ixgbe_get_drvinfo took 527603 us kretprobe:ixgbe_get_drvinfo took 523978 us kretprobe:ixgbe_get_drvinfo took 552975 us kretprobe:ice_get_drvinfo took 3 us kretprobe:igb_get_drvinfo took 2 us kretprobe:i40e_get_drvinfo took 5 us The half-second stall happens under the RTNL lock, causing visible latency on ip-link and friends. The FW version can only change after an EMPR reset. All flash data is already populated at probe time and the cached adapter->eeprom_id is what get_drvinfo should be returning. The only place that needs to trigger a re-read is ixgbe_devlink_reload_empr_finish(), right after the EMPR completes and new firmware is running. Additionally, refresh the FW version in ixgbe_reinit_locked() so that any PF that undergoes a reinit after an EMPR (e.g. triggered by another PF's devlink reload) also picks up the new version in adapter->eeprom_id. ixgbe_devlink_info_get() keeps its refresh call for explicit "devlink dev info" queries, which is fine given those are user-initiated. Fixes: `c9e563cae1` ("ixgbe: add support for devlink reload") Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:23 -07:00
Petr Oros	bf6dbadb72	ice: fix PTP timestamping broken by SyncE code on E825C The E825C SyncE support added in commit `ad1df4f2d5` ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") introduced a SyncE reconfiguration block in ice_ptp_link_change() that prevents ice_ptp_port_phy_restart() from being called in several error paths. Without the PHY restart, PTP timestamps stop working after any link change event. There are three ways the PHY restart gets blocked: 1. When DPLL initialization fails (e.g. missing ACPI firmware node properties), ICE_FLAG_DPLL is not set and the function returns early before reaching the PHY restart. 2. When ice_tspll_bypass_mux_active_e825c() fails to read the CGU register, WARN_ON_ONCE fires and the function returns early. 3. When ice_tspll_cfg_synce_ethdiv_e825c() fails to configure the clock divider for an active pin, same early return. SyncE and PTP are independent features. SyncE reconfiguration failures must not prevent the PTP PHY restart that is essential for timestamp recovery after link changes. Fix by making the entire SyncE block conditional on ICE_FLAG_DPLL without an early return, and replacing the WARN_ON_ONCE + return error handling inside the loop with dev_err_once + break. The function always proceeds to ice_ptp_port_phy_restart() regardless of SyncE errors. Fixes: `ad1df4f2d5` ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:23 -07:00
Kohei Enju	bb3f21edc7	ice: ptp: don't WARN when controlling PF is unavailable In VFIO passthrough setups, it is possible to pass through only a PF which doesn't own the source timer. In that case the PTP controlling PF (adapter->ctrl_pf) is never initialized in the VM, so ice_get_ctrl_ptp() returns NULL and triggers WARN_ON() in ice_ptp_setup_pf(). Since this is an expected behavior in that configuration, replace WARN_ON() with an informational message and return -EOPNOTSUPP. Fixes: `e800654e85` ("ice: Use ice_adapter for PTP shared data instead of auxdev") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:23 -07:00
Emil Tantilov	8e2a2420e2	idpf: set the payload size before calling the async handler Set the payload size before forwarding the reply to the async handler. Without this, xn->reply_sz will be 0 and idpf_mac_filter_async_handler() will never get past the size check. Fixes: `34c21fa894` ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Li Li <boolli@google.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:15 -07:00
Emil Tantilov	d086fae650	idpf: improve locking around idpf_vc_xn_push_free() Protect the set_bit() operation for the free_xn bitmask in idpf_vc_xn_push_free(), to make the locking consistent with rest of the code and avoid potential races in that logic. Fixes: `34c21fa894` ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org Reported-by: Ray Zhang <sgzhang@google.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:39:08 -07:00
Emil Tantilov	5914781182	idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling Switch from using the completion's raw spinlock to a local lock in the idpf_vc_xn struct. The conversion is safe because complete/_all() are called outside the lock and there is no reason to share the completion lock in the current logic. This avoids invalid wait context reported by the kernel due to the async handler taking BH spinlock: [ 805.726977] ============================= [ 805.726991] [ BUG: Invalid wait context ] [ 805.727006] 7.0.0-rc2-net-devq-031026+ #28 Tainted: G S OE [ 805.727026] ----------------------------- [ 805.727038] kworker/u261:0/572 is trying to lock: [ 805.727051] ff190da6a8dbb6a0 (&vport_config->mac_filter_list_lock){+...}-{3:3}, at: idpf_mac_filter_async_handler+0xe9/0x260 [idpf] [ 805.727099] other info that might help us debug this: [ 805.727111] context-{5:5} [ 805.727119] 3 locks held by kworker/u261:0/572: [ 805.727132] #0: ff190da6db3e6148 ((wq_completion)idpf-0000:83:00.0-mbx){+.+.}-{0:0}, at: process_one_work+0x4b5/0x730 [ 805.727163] #1: ff3c6f0a6131fe50 ((work_completion)(&(&adapter->mbx_task)->work)){+.+.}-{0:0}, at: process_one_work+0x1e5/0x730 [ 805.727191] #2: ff190da765190020 (&x->wait#34){+.+.}-{2:2}, at: idpf_recv_mb_msg+0xc8/0x710 [idpf] [ 805.727218] stack backtrace: ... [ 805.727238] Workqueue: idpf-0000:83:00.0-mbx idpf_mbx_task [idpf] [ 805.727247] Call Trace: [ 805.727249] <TASK> [ 805.727251] dump_stack_lvl+0x77/0xb0 [ 805.727259] __lock_acquire+0xb3b/0x2290 [ 805.727268] ? __irq_work_queue_local+0x59/0x130 [ 805.727275] lock_acquire+0xc6/0x2f0 [ 805.727277] ? idpf_mac_filter_async_handler+0xe9/0x260 [idpf] [ 805.727284] ? _printk+0x5b/0x80 [ 805.727290] _raw_spin_lock_bh+0x38/0x50 [ 805.727298] ? idpf_mac_filter_async_handler+0xe9/0x260 [idpf] [ 805.727303] idpf_mac_filter_async_handler+0xe9/0x260 [idpf] [ 805.727310] idpf_recv_mb_msg+0x1c8/0x710 [idpf] [ 805.727317] process_one_work+0x226/0x730 [ 805.727322] worker_thread+0x19e/0x340 [ 805.727325] ? __pfx_worker_thread+0x10/0x10 [ 805.727328] kthread+0xf4/0x130 [ 805.727333] ? __pfx_kthread+0x10/0x10 [ 805.727336] ret_from_fork+0x32c/0x410 [ 805.727345] ? __pfx_kthread+0x10/0x10 [ 805.727347] ret_from_fork_asm+0x1a/0x30 [ 805.727354] </TASK> Fixes: `34c21fa894` ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Ray Zhang <sgzhang@google.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2026-04-06 13:38:48 -07:00
Jon Hunter	1345e9f4e3	net: stmmac: Fix PTP ref clock for Tegra234 Since commit `030ce919e1` ("net: stmmac: make sure that ptp_rate is not 0 before configuring timestamping") was added the following error is observed on Tegra234: ERR KERN tegra-mgbe 6800000.ethernet eth0: Invalid PTP clock rate WARNING KERN tegra-mgbe 6800000.ethernet eth0: PTP init failed It turns out that the Tegra234 device-tree binding defines the PTP ref clock name as 'ptp-ref' and not 'ptp_ref' and the above commit now exposes this and that the PTP clock is not configured correctly. In order to update device-tree to use the correct 'ptp_ref' name, update the Tegra MGBE driver to use 'ptp_ref' by default and fallback to using 'ptp-ref' if this clock name is present. Fixes: `d8ca113724` ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260401102941.17466-2-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 16:02:21 -07:00
Pengpeng Hou	b76254c55d	net: qualcomm: qca_uart: report the consumed byte on RX skb allocation failure qca_tty_receive() consumes each input byte before checking whether a completed frame needs a fresh receive skb. When the current byte completes a frame, the driver delivers that frame and then allocates a new skb for the next one. If that allocation fails, the current code returns i even though data[i] has already been consumed and may already have completed the delivered frame. Since serdev interprets the return value as the number of accepted bytes, this under-reports progress by one byte and can replay the final byte of the completed frame into a fresh parser state on the next call. Return i + 1 in that failure path so the accepted-byte count matches the actual receive-state progress. Fixes: `dfc768fbe6` ("net: qualcomm: add QCA7000 UART driver") Cc: stable@vger.kernel.org Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402071207.4036-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:32:56 -07:00
Lorenzo Bianconi	285fa6b1e0	net: airoha: Fix memory leak in airoha_qdma_rx_process() If an error occurs on the subsequents buffers belonging to the non-linear part of the skb (e.g. due to an error in the payload length reported by the NIC or if we consumed all the available fragments for the skb), the page_pool fragment will not be linked to the skb so it will not return to the pool in the airoha_qdma_rx_process() error path. Fix the memory leak partially reverting commit 'd6d2b0e1538d ("net: airoha: Fix page recycling in airoha_qdma_rx_process()")' and always running page_pool_put_full_page routine in the airoha_qdma_rx_process() error path. Fixes: `d6d2b0e153` ("net: airoha: Fix page recycling in airoha_qdma_rx_process()") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402-airoha_qdma_rx_process-mem-leak-fix-v1-1-b5706f402d3c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:42:51 -07:00
Eric Dumazet	b120e4432f	net: lapbether: handle NETDEV_PRE_TYPE_CHANGE lapbeth_data_transmit() expects the underlying device type to be ARPHRD_ETHER. Returning NOTIFY_BAD from lapbeth_device_event() makes sure bonding driver can not break this expectation. Fixes: `872254dd6b` ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER") Reported-by: syzbot+d8c285748fa7292580a9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69cd22a1.050a0220.70c3a.0002.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402103519.1201565-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:40:36 -07:00
Arnd Bergmann	e16a0d3677	net: fec: make FIXED_PHY dependency unconditional When CONFIG_FIXED_PHY is in a loadable module, the fec driver cannot be built-in any more: x86_64-linux-ld: vmlinux.o: in function `fec_enet_mii_probe': fec_main.c:(.text+0xc4f367): undefined reference to `fixed_phy_unregister' x86_64-linux-ld: vmlinux.o: in function `fec_enet_close': fec_main.c:(.text+0xc59591): undefined reference to `fixed_phy_unregister' x86_64-linux-ld: vmlinux.o: in function `fec_enet_mii_probe.cold': Select the fixed phy support on all targets to make this build correctly, not just on coldfire. Notat that Essentially the stub helpers in include/linux/phy_fixed.h cannot be used correctly because of this build time dependency, and we could just remove them to hit the build failure more often when a driver uses them without the 'select FIXED_PHY'. Fixes: `dc86b621e1` ("net: fec: register a fixed phy using fixed_phy_register_100fd if needed") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402141048.2713445-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:39:37 -07:00
Tyllis Xu	51f4e090b9	net: stmmac: fix integer underflow in chain mode The jumbo_frm() chain-mode implementation unconditionally computes len = nopaged_len - bmax; where nopaged_len = skb_headlen(skb) (linear bytes only) and bmax is BUF_SIZE_8KiB or BUF_SIZE_2KiB. However, the caller stmmac_xmit() decides to invoke jumbo_frm() based on skb->len (total length including page fragments): is_jumbo = stmmac_is_jumbo_frm(priv, skb->len, enh_desc); When a packet has a small linear portion (nopaged_len <= bmax) but a large total length due to page fragments (skb->len > bmax), the subtraction wraps as an unsigned integer, producing a huge len value (~0xFFFFxxxx). This causes the while (len != 0) loop to execute hundreds of thousands of iterations, passing skb->data + bmax * i pointers far beyond the skb buffer to dma_map_single(). On IOMMU-less SoCs (the typical deployment for stmmac), this maps arbitrary kernel memory to the DMA engine, constituting a kernel memory disclosure and potential memory corruption from hardware. Fix this by introducing a buf_len local variable clamped to min(nopaged_len, bmax). Computing len = nopaged_len - buf_len is then always safe: it is zero when the linear portion fits within a single descriptor, causing the while (len != 0) loop to be skipped naturally, and the fragment loop in stmmac_xmit() handles page fragments afterward. Fixes: `286a837217` ("stmmac: add CHAINED descriptor mode support (V4)") Cc: stable@vger.kernel.org Signed-off-by: Tyllis Xu <LivelyCarpet87@gmail.com> Link: https://patch.msgid.link/20260401044708.1386919-1-LivelyCarpet87@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:28:10 -07:00
David Carlier	6dede39676	net: altera-tse: fix skb leak on DMA mapping error in tse_start_xmit() When dma_map_single() fails in tse_start_xmit(), the function returns NETDEV_TX_OK without freeing the skb. Since NETDEV_TX_OK tells the stack the packet was consumed, the skb is never freed, leaking memory on every DMA mapping failure. Add dev_kfree_skb_any() before returning to properly free the skb. Fixes: `bbd2190ce9` ("Altera TSE: Add main and header file for Altera Ethernet Driver") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260401211218.279185-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:25:23 -07:00
Dimitri Daskalakis	ec7067e661	eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64 On systems with 64K pages, RX queues will be wedged if users set the descriptor count to the current minimum (16). Fbnic fragments large pages into 4K chunks, and scales down the ring size accordingly. With 64K pages and 16 descriptors, the ring size mask is 0 and will never be filled. 32 descriptors is another special case that wedges the RX rings. Internally, the rings track pages for the head/tail pointers, not page fragments. So with 32 descriptors, there's only 1 usable page as one ring slot is kept empty to disambiguate between an empty/full ring. As a result, the head pointer never advances and the HW stalls after consuming 16 page fragments. Fixes: `0cb4c0a137` ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:38:34 -07:00
Pavan Chebbi	071dbfa304	bnxt_en: Restore default stat ctxs for ULP when resource is available During resource reservation, if the L2 driver does not have enough MSIX vectors to provide to the RoCE driver, it sets the stat ctxs for ULP also to 0 so that we don't have to reserve it unnecessarily. However, subsequently the user may reduce L2 rings thereby freeing up some resources that the L2 driver can now earmark for RoCE. In this case, the driver should restore the default ULP stat ctxs to make sure that all RoCE resources are ready for use. The RoCE driver may fail to initialize in this scenario without this fix. Fixes: `d630624ebd` ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:12:56 -07:00
Michael Chan	e4bf81dcad	bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode() The original code made the assumption that when we set up the initial default ring mode, we must be just loading the driver and XDP cannot be enabled yet. This is not true when the FW goes through a resource or capability change. Resource reservations will be cancelled and reinitialized with XDP already enabled. devlink reload with XDP enabled will also have the same issue. This scenario will cause the ring arithmetic to be all wrong in the bnxt_init_dflt_ring_mode() path causing failure: bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_setup_int_mode err: ffffffea bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_request_irq err: ffffffea bnxt_en 0000:a1:00.0 ens2f0np0: nic open fail (rc: ffffffea) Fix it by properly accounting for XDP in the bnxt_init_dflt_ring_mode() path by using the refactored helper functions in the previous patch. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Fixes: `ec5d31e3c1` ("bnxt_en: Handle firmware reset status during IF_UP.") Fixes: `228ea8c187` ("bnxt_en: implement devlink dev reload driver_reinit") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:12:56 -07:00
Michael Chan	ceee35e567	bnxt_en: Refactor some basic ring setup and adjustment logic Refactor out the basic code that trims the default rings, sets up and adjusts XDP TX rings and CP rings. There is no change in behavior. This is to prepare for the next bug fix patch. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:12:56 -07:00
Saeed Mahameed	403186400a	net/mlx5: Fix switchdev mode rollback in case of failure If for some internal reason switchdev mode fails, we rollback to legacy mode, before this patch, rollback will unregister the uplink netdev and leave it unregistered causing the below kernel bug. To fix this, we need to avoid netdev unregister by setting the proper rollback flag 'MLX5_PRIV_FLAGS_SWITCH_LEGACY' to indicate legacy mode. devlink (431) used greatest stack depth: 11048 bytes left mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), \ necvfs(0), active vports(0) mlx5_core 0000:00:03.0: E-Switch: Supported tc chains and prios offload mlx5_core 0000:00:03.0: Loading uplink representor for vport 65535 mlx5_core 0000:00:03.0: mlx5_cmd_out_err:816:(pid 456): \ QUERY_HCA_CAP(0x100) op_mod(0x0) failed, \ status bad parameter(0x3), syndrome (0x3a3846), err(-22) mlx5_core 0000:00:03.0 enp0s3np0 (unregistered): Unloading uplink \ representor for vport 65535 ------------[ cut here ]------------ kernel BUG at net/core/dev.c:12070! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 UID: 0 PID: 456 Comm: devlink Not tainted 6.16.0-rc3+ \ #9 PREEMPT(voluntary) RIP: 0010:unregister_netdevice_many_notify+0x123/0xae0 ... Call Trace: [ 90.923094] unregister_netdevice_queue+0xad/0xf0 [ 90.923323] unregister_netdev+0x1c/0x40 [ 90.923522] mlx5e_vport_rep_unload+0x61/0xc6 [ 90.923736] esw_offloads_enable+0x8e6/0x920 [ 90.923947] mlx5_eswitch_enable_locked+0x349/0x430 [ 90.924182] ? is_mp_supported+0x57/0xb0 [ 90.924376] mlx5_devlink_eswitch_mode_set+0x167/0x350 [ 90.924628] devlink_nl_eswitch_set_doit+0x6f/0xf0 [ 90.924862] genl_family_rcv_msg_doit+0xe8/0x140 [ 90.925088] genl_rcv_msg+0x18b/0x290 [ 90.925269] ? __pfx_devlink_nl_pre_doit+0x10/0x10 [ 90.925506] ? __pfx_devlink_nl_eswitch_set_doit+0x10/0x10 [ 90.925766] ? __pfx_devlink_nl_post_doit+0x10/0x10 [ 90.926001] ? __pfx_genl_rcv_msg+0x10/0x10 [ 90.926206] netlink_rcv_skb+0x52/0x100 [ 90.926393] genl_rcv+0x28/0x40 [ 90.926557] netlink_unicast+0x27d/0x3d0 [ 90.926749] netlink_sendmsg+0x1f7/0x430 [ 90.926942] __sys_sendto+0x213/0x220 [ 90.927127] ? __sys_recvmsg+0x6a/0xd0 [ 90.927312] __x64_sys_sendto+0x24/0x30 [ 90.927504] do_syscall_64+0x50/0x1c0 [ 90.927687] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 90.927929] RIP: 0033:0x7f7d0363e047 Fixes: `2a4f56fbcc` ("net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260330194015.53585-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:10:41 -07:00
Saeed Mahameed	10dc35f6a4	net/mlx5: Avoid "No data available" when FW version queries fail Avoid printing the misleading "kernel answers: No data available" devlink output when querying firmware or pending firmware version fails (e.g. MLX5 fw state errors / flash failures). FW can fail on loading the pending flash image and get its version due to various reasons, examples: mlxfw: Firmware flash failed: key not applicable, err (7) mlx5_fw_image_pending: can't read pending fw version while fw state is 1 and the resulting: $ devlink dev info kernel answers: No data available Instead, just report 0 or 0xfff.. versions in case of failure to indicate a problem, and let other information be shown. after the fix: $ devlink dev info pci/0000:00:06.0: driver mlx5_core serial_number xxx... board.serial_number MT2225300179 versions: fixed: fw.psid MT_0000000436 running: fw.version 22.41.0188 fw 22.41.0188 stored: fw.version 255.255.65535 fw 255.255.65535 Fixes: `9c86b07e30` ("net/mlx5: Added fw version query command") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260330194015.53585-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:10:41 -07:00
Shay Drory	bf16bca665	net/mlx5: lag: Check for LAG device before creating debugfs __mlx5_lag_dev_add_mdev() may return 0 (success) even when an error occurs that is handled gracefully. Consequently, the initialization flow proceeds to call mlx5_ldev_add_debugfs() even when there is no valid LAG context. mlx5_ldev_add_debugfs() blindly created the debugfs directory and attributes. This exposed interfaces (like the members file) that rely on a valid ldev pointer, leading to potential NULL pointer dereferences if accessed when ldev is NULL. Add a check to verify that mlx5_lag_dev(dev) returns a valid pointer before attempting to create the debugfs entries. Fixes: `7f46a0b732` ("net/mlx5: Lag, add debugfs to query hardware lag state") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260330194015.53585-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 20:10:40 -07:00
Fedor Pchelkin	f0f367a4f4	net: macb: properly unregister fixed rate clocks The additional resources allocated with clk_register_fixed_rate() need to be released with clk_unregister_fixed_rate(), otherwise they are lost. Fixes: `83a77e9ec4` ("net: macb: Added PCI wrapper for Platform Driver.") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patch.msgid.link/20260330184542.626619-2-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:57:20 -07:00
Fedor Pchelkin	ce8fe5287b	net: macb: fix clk handling on PCI glue driver removal platform_device_unregister() may still want to use the registered clks during runtime resume callback. Note that there is a commit `d82d5303c4` ("net: macb: fix use after free on rmmod") that addressed the similar problem of clk vs platform device unregistration but just moved the bug to another place. Save the pointers to clks into local variables for reuse after platform device is unregistered. BUG: KASAN: use-after-free in clk_prepare+0x5a/0x60 Read of size 8 at addr ffff888104f85e00 by task modprobe/597 CPU: 2 PID: 597 Comm: modprobe Not tainted 6.1.164+ #114 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8d/0xba print_report+0x17f/0x496 kasan_report+0xd9/0x180 clk_prepare+0x5a/0x60 macb_runtime_resume+0x13d/0x410 [macb] pm_generic_runtime_resume+0x97/0xd0 __rpm_callback+0xc8/0x4d0 rpm_callback+0xf6/0x230 rpm_resume+0xeeb/0x1a70 __pm_runtime_resume+0xb4/0x170 bus_remove_device+0x2e3/0x4b0 device_del+0x5b3/0xdc0 platform_device_del+0x4e/0x280 platform_device_unregister+0x11/0x50 pci_device_remove+0xae/0x210 device_remove+0xcb/0x180 device_release_driver_internal+0x529/0x770 driver_detach+0xd4/0x1a0 bus_remove_driver+0x135/0x260 driver_unregister+0x72/0xb0 pci_unregister_driver+0x26/0x220 __do_sys_delete_module+0x32e/0x550 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 </TASK> Allocated by task 519: kasan_save_stack+0x2c/0x50 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8e/0x90 __clk_register+0x458/0x2890 clk_hw_register+0x1a/0x60 __clk_hw_register_fixed_rate+0x255/0x410 clk_register_fixed_rate+0x3c/0xa0 macb_probe+0x1d8/0x42e [macb_pci] local_pci_probe+0xd7/0x190 pci_device_probe+0x252/0x600 really_probe+0x255/0x7f0 __driver_probe_device+0x1ee/0x330 driver_probe_device+0x4c/0x1f0 __driver_attach+0x1df/0x4e0 bus_for_each_dev+0x15d/0x1f0 bus_add_driver+0x486/0x5e0 driver_register+0x23a/0x3d0 do_one_initcall+0xfd/0x4d0 do_init_module+0x18b/0x5a0 load_module+0x5663/0x7950 __do_sys_finit_module+0x101/0x180 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 597: kasan_save_stack+0x2c/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0x106/0x180 __kmem_cache_free+0xbc/0x320 clk_unregister+0x6de/0x8d0 macb_remove+0x73/0xc0 [macb_pci] pci_device_remove+0xae/0x210 device_remove+0xcb/0x180 device_release_driver_internal+0x529/0x770 driver_detach+0xd4/0x1a0 bus_remove_driver+0x135/0x260 driver_unregister+0x72/0xb0 pci_unregister_driver+0x26/0x220 __do_sys_delete_module+0x32e/0x550 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: `d82d5303c4` ("net: macb: fix use after free on rmmod") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patch.msgid.link/20260330184542.626619-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:57:16 -07:00
Srujana Challa	b4e5f04c58	virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN rss_max_key_size in the virtio spec is the maximum key size supported by the device, not a mandatory size the driver must use. Also the value 40 is a spec minimum, not a spec maximum. The current code rejects RSS and can fail probe when the device reports a larger rss_max_key_size than the driver buffer limit. Instead, clamp the effective key length to min(device rss_max_key_size, NETDEV_RSS_KEY_LEN) and keep RSS enabled. This keeps probe working on devices that advertise larger maximum key sizes while respecting the netdev RSS key buffer size limit. Fixes: `3f7d9c1964` ("virtio_net: Add hash_key_length check") Cc: stable@vger.kernel.org Signed-off-by: Srujana Challa <schalla@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260326142344.1171317-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:47:44 -07:00
Michal Piekos	48b3cd6926	net: stmmac: skip VLAN restore when VLAN hash ops are missing stmmac_vlan_restore() unconditionally calls stmmac_vlan_update() when NETIF_F_VLAN_FEATURES is set. On platforms where priv->hw->vlan (or ->update_vlan_hash) is not provided, stmmac_update_vlan_hash() returns -EINVAL via stmmac_do_void_callback(), resulting in a spurious "Failed to restore VLANs" error even when no VLAN filtering is in use. Remove not needed comment. Remove not used return value from stmmac_vlan_restore(). Tested on Orange Pi Zero 3. Fixes: `bd7ad51253` ("net: stmmac: Fix VLAN HW state restore") Signed-off-by: Michal Piekos <michal.piekos@mmpsystems.pl> Link: https://patch.msgid.link/20260328-vlan-restore-error-v4-1-f88624c530dc@mmpsystems.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-31 19:45:26 -07:00
Yufan Chen	c0fd0fe745	net: ftgmac100: fix ring allocation unwind on open failure ftgmac100_alloc_rings() allocates rx_skbs, tx_skbs, rxdes, txdes, and rx_scratch in stages. On intermediate failures it returned -ENOMEM directly, leaking resources allocated earlier in the function. Rework the failure path to use staged local unwind labels and free allocated resources in reverse order before returning -ENOMEM. This matches common netdev allocation cleanup style. Fixes: `d72e01a043` ("ftgmac100: Use a scratch buffer for failed RX allocations") Cc: stable@vger.kernel.org Signed-off-by: Yufan Chen <yufan.chen@linux.dev> Link: https://patch.msgid.link/20260328163257.60836-1-yufan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-31 19:38:36 -07:00
Suraj Gupta	d1978d03e8	net: xilinx: axienet: Fix BQL accounting for multi-BD TX packets When a TX packet spans multiple buffer descriptors (scatter-gather), axienet_free_tx_chain sums the per-BD actual length from descriptor status into a caller-provided accumulator. That sum is reset on each NAPI poll. If the BDs for a single packet complete across different polls, the earlier bytes are lost and never credited to BQL. This causes BQL to think bytes are permanently in-flight, eventually stalling the TX queue. The SKB pointer is stored only on the last BD of a packet. When that BD completes, use skb->len for the byte count instead of summing per-BD status lengths. This matches netdev_sent_queue(), which debits skb->len, and naturally survives across polls because no partial packet contributes to the accumulator. Fixes: `c900e49d58` ("net: xilinx: axienet: Implement BQL") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-3-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-31 12:09:12 +02:00
Suraj Gupta	393e0b4f17	net: xilinx: axienet: Correct BD length masks to match AXIDMA IP spec The XAXIDMA_BD_CTRL_LENGTH_MASK and XAXIDMA_BD_STS_ACTUAL_LEN_MASK macros were defined as 0x007FFFFF (23 bits), but the AXI DMA IP product guide (PG021) specifies the buffer length field as bits 25:0 (26 bits). Update both masks to match the IP documentation. In practice this had no functional impact, since Ethernet frames are far smaller than 2^23 bytes and the extra bits were always zero, but the masks should still reflect the hardware specification. Fixes: `8a3b7a252d` ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-2-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-31 12:09:12 +02:00
Xiang Mei	2884bf72fb	net: bonding: fix use-after-free in bond_xmit_broadcast() bond_xmit_broadcast() reuses the original skb for the last slave (determined by bond_is_last_slave()) and clones it for others. Concurrent slave enslave/release can mutate the slave list during RCU-protected iteration, changing which slave is "last" mid-loop. This causes the original skb to be double-consumed (double-freed). Replace the racy bond_is_last_slave() check with a simple index comparison (i + 1 == slaves_count) against the pre-snapshot slave count taken via READ_ONCE() before the loop. This preserves the zero-copy optimization for the last slave while making the "last" determination stable against concurrent list mutations. The UAF can trigger the following crash: ================================================================== BUG: KASAN: slab-use-after-free in skb_clone Read of size 8 at addr ffff888100ef8d40 by task exploit/147 CPU: 1 UID: 0 PID: 147 Comm: exploit Not tainted 7.0.0-rc3+ #4 PREEMPTLAZY Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) kasan_report (mm/kasan/report.c:597) skb_clone (include/linux/skbuff.h:1724 include/linux/skbuff.h:1792 include/linux/skbuff.h:3396 net/core/skbuff.c:2108) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5334) bond_start_xmit (drivers/net/bonding/bond_main.c:5567 drivers/net/bonding/bond_main.c:5593) dev_hard_start_xmit (include/linux/netdevice.h:5325 include/linux/netdevice.h:5334 net/core/dev.c:3871 net/core/dev.c:3887) __dev_queue_xmit (include/linux/netdevice.h:3601 net/core/dev.c:4838) ip6_finish_output2 (include/net/neighbour.h:540 include/net/neighbour.h:554 net/ipv6/ip6_output.c:136) ip6_finish_output (net/ipv6/ip6_output.c:208 net/ipv6/ip6_output.c:219) ip6_output (net/ipv6/ip6_output.c:250) ip6_send_skb (net/ipv6/ip6_output.c:1985) udp_v6_send_skb (net/ipv6/udp.c:1442) udpv6_sendmsg (net/ipv6/udp.c:1733) __sys_sendto (net/socket.c:730 net/socket.c:742 net/socket.c:2206) __x64_sys_sendto (net/socket.c:2209) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Allocated by task 147: Freed by task 147: The buggy address belongs to the object at ffff888100ef8c80 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 192 bytes inside of freed 224-byte region [ffff888100ef8c80, ffff888100ef8d60) Memory state around the buggy address: ffff888100ef8c00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888100ef8c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888100ef8d00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff888100ef8d80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888100ef8e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: `4e5bd03ae3` ("net: bonding: fix bond_xmit_broadcast return value error bug") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260326075553.3960562-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-31 10:49:25 +02:00
Pengpeng Hou	4ee937107d	bnxt_en: set backing store type from query type bnxt_hwrm_func_backing_store_qcaps_v2() stores resp->type from the firmware response in ctxm->type and later uses that value to index fixed backing-store metadata arrays such as ctx_arr[] and bnxt_bstore_to_trace[]. ctxm->type is fixed by the current backing-store query type and matches the array index of ctx->ctx_arr. Set ctxm->type from the current loop variable instead of depending on resp->type. Also update the loop to advance type from next_valid_type in the for statement, which keeps the control flow simpler for non-valid and unchanged entries. Fixes: `6a4d0774f0` ("bnxt_en: Add support for new backing store query firmware API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260328234357.43669-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:59:13 -07:00
Lorenzo Bianconi	cedc1bf327	net: airoha: Delay offloading until all net_devices are fully registered Netfilter flowtable can theoretically try to offload flower rules as soon as a net_device is registered while all the other ones are not registered or initialized, triggering a possible NULL pointer dereferencing of qdma pointer in airoha_ppe_set_cpu_port routine. Moreover, if register_netdev() fails for a particular net_device, there is a small race if Netfilter tries to offload flowtable rules before all the net_devices are properly unregistered in airoha_probe() error patch, triggering a NULL pointer dereferencing in airoha_ppe_set_cpu_port routine. In order to avoid any possible race, delay offloading until all net_devices are registered in the networking subsystem. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260329-airoha-regiser-race-fix-v2-1-f4ebb139277b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:58:40 -07:00

1 2 3 4 5 ...

138078 Commits