For ENETC v4 these settings are controlled by the global ENETC
command cache attribute registers (EnCAR), from the IERB register
block.
The hardcoded CDBR cacheability settings were inherited from LS1028A,
and should be removed from the ENETC v4 driver as they conflict
with the global IERB settings.
Fixes: e3f4a0a8dd ("net: enetc: add command BD ring support for i.MX95 ENETC")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-3-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For ENETC v4 these settings are controlled by the global ENETC
message and buffer cache attribute registers (EnBCAR and EnMCAR),
from the IERB register block.
The hardcoded cacheability settings were inherited from LS1028A,
and should be removed from the ENETC v4 driver as they conflict
with the global IERB settings.
Fixes: 99100d0d99 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-2-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek reported that suspending stm32 causes the following errors when
the interface is administratively down:
$ echo devices > /sys/power/pm_test
$ echo mem > /sys/power/state
...
ck_ker_eth2stp already disabled
...
ck_ker_eth2stp already unprepared
...
On suspend, stm32 starts the eth2stp clock in its suspend method, and
stops it in the resume method. This is because the blamed commit omits
the call to the platform glue ->suspend() method, but does make the
call to the platform glue ->resume() method.
This problem affects all other converted drivers as well - e.g. looking
at the PCIe drivers, pci_save_state() will not be called, but
pci_restore_state() will be. Similar issues affect all other drivers.
Fix this by always calling the ->suspend() method, even when the network
interface is down. This fixes all the conversions to the platform glue
->suspend() and ->resume() methods.
Link: https://lore.kernel.org/r/20260114081809.12758-1-marex@nabladev.com
Fixes: 07bbbfe7ad ("net: stmmac: add suspend()/resume() platform ops")
Reported-by: Marek Vasut <marex@nabladev.com>
Tested-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vlujh-00000007Hkw-2p6r@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rtl8152 can trigger device reset during reset which
potentially can result in a deadlock:
**** DPM device timeout after 10 seconds; 15 seconds until panic ****
Call Trace:
<TASK>
schedule+0x483/0x1370
schedule_preempt_disabled+0x15/0x30
__mutex_lock_common+0x1fd/0x470
__rtl8152_set_mac_address+0x80/0x1f0
dev_set_mac_address+0x7f/0x150
rtl8152_post_reset+0x72/0x150
usb_reset_device+0x1d0/0x220
rtl8152_resume+0x99/0xc0
usb_resume_interface+0x3e/0xc0
usb_resume_both+0x104/0x150
usb_resume+0x22/0x110
The problem is that rtl8152 resume calls reset under
tp->control mutex while reset basically re-enters rtl8152
and attempts to acquire the same tp->control lock once
again.
Reset INACCESSIBLE device outside of tp->control mutex
scope to avoid recursive mutex_lock() deadlock.
Fixes: 4933b066fe ("r8152: If inaccessible at resume time, issue a reset")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://patch.msgid.link/20260129031106.3805887-1-senozhatsky@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
valis provided a nice repro to crash the kernel:
ip link add p1 type veth peer p2
ip link set address 00:00:00:00:00:20 dev p1
ip link set up dev p1
ip link set up dev p2
ip link add mv0 link p2 type macvlan mode source
ip link add invalid% link p2 type macvlan mode source macaddr add 00:00:00:00:00:20
ping -c1 -I p1 1.2.3.4
He also gave a very detailed analysis:
<quote valis>
The issue is triggered when a new macvlan link is created with
MACVLAN_MODE_SOURCE mode and MACVLAN_MACADDR_ADD (or
MACVLAN_MACADDR_SET) parameter, lower device already has a macvlan
port and register_netdevice() called from macvlan_common_newlink()
fails (e.g. because of the invalid link name).
In this case macvlan_hash_add_source is called from
macvlan_change_sources() / macvlan_common_newlink():
This adds a reference to vlan to the port's vlan_source_hash using
macvlan_source_entry.
vlan is a pointer to the priv data of the link that is being created.
When register_netdevice() fails, the error is returned from
macvlan_newlink() to rtnl_newlink_create():
if (ops->newlink)
err = ops->newlink(dev, ¶ms, extack);
else
err = register_netdevice(dev);
if (err < 0) {
free_netdev(dev);
goto out;
}
and free_netdev() is called, causing a kvfree() on the struct
net_device that is still referenced in the source entry attached to
the lower device's macvlan port.
Now all packets sent on the macvlan port with a matching source mac
address will trigger a use-after-free in macvlan_forward_source().
</quote valis>
With all that, my fix is to make sure we call macvlan_flush_sources()
regardless of @create value whenever "goto destroy_macvlan_port;"
path is taken.
Many thanks to valis for following up on this issue.
Fixes: aa5fd0fb77 ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: valis <sec@valis.email>
Reported-by: syzbot+7182fbe91e58602ec1fe@syzkaller.appspotmail.com
Closes: https: //lore.kernel.org/netdev/695fb1e8.050a0220.1c677c.039f.GAE@google.com/T/#u
Cc: Boudewijn van der Heide <boudewijn@delta-utec.com>
Link: https://patch.msgid.link/20260129204359.632556-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit fd580c9830 ("net: sfp: augment SFP parsing with
phy_interface_t bitmap") did not add augumentation for the interface
bitmap in the quirk for Ubiquiti U-Fiber Instant.
The subsequent commit f81fa96d8a ("net: phylink: use
phy_interface_t bitmaps for optical modules") then changed phylink code
for selection of SFP interface: instead of using link mode bitmap, the
interface bitmap is used, and the fastest interface mode supported by
both SFP module and MAC is chosen.
Since the interface bitmap contains also modes faster than 1000base-x,
this caused a regression wherein this module stopped working
out-of-the-box.
Fix this.
Fixes: fd580c9830 ("net: sfp: augment SFP parsing with phy_interface_t bitmap")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260129082227.17443-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IRQ handler extracts if_id from the upper 16 bits of the hardware
status register and uses it to index into ethsw->ports[] without
validation. Since if_id can be any 16-bit value (0-65535) but the ports
array is only allocated with sw_attr.num_ifs elements, this can lead to
an out-of-bounds read potentially.
Add a bounds check before accessing the array, consistent with the
existing validation in dpaa2_switch_rx().
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Fixes: 24ab724f8a ("dpaa2-switch: use the port index in the IRQ handler")
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/SYBPR01MB7881D420AB43FF1A227B84AFAF91A@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In setup_nic_devices(), the initialization loop jumps to the label
setup_nic_dev_free on failure. The current cleanup loop while(i--)
skip the failing index i, causing a memory leak.
Fix this by changing the loop to iterate from the current index i
down to 0.
Compile tested only. Issue found using code review.
Fixes: 846b46873e ("liquidio CN23XX: VF offload features")
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260128154440.278369-4-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In setup_nic_devices(), the initialization loop jumps to the label
setup_nic_dev_free on failure. The current cleanup loop while(i--)
skip the failing index i, causing a memory leak.
Fix this by changing the loop to iterate from the current index i
down to 0.
Also, decrement i in the devlink_alloc failure path to point to the
last successfully allocated index.
Compile tested only. Issue found using code review.
Fixes: f21fb3ed36 ("Add support of Cavium Liquidio ethernet adapters")
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260128154440.278369-3-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In setup_nic_devices(), the netdev is allocated using alloc_etherdev_mq().
However, the pointer to this structure is stored in oct->props[i].netdev
only after the calls to netif_set_real_num_rx_queues() and
netif_set_real_num_tx_queues().
If either of these functions fails, setup_nic_devices() returns an error
without freeing the allocated netdev. Since oct->props[i].netdev is still
NULL at this point, the cleanup function liquidio_destroy_nic_device()
will fail to find and free the netdev, resulting in a memory leak.
Fix this by initializing oct->props[i].netdev before calling the queue
setup functions. This ensures that the netdev is properly accessible for
cleanup in case of errors.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: c33c997346 ("liquidio: enhanced ethtool --set-channels feature")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260128154440.278369-2-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver allocates arrays for ports, FDBs, and filter blocks using
kcalloc() with ethsw->sw_attr.num_ifs as the element count. When the
device reports zero interfaces (either due to hardware configuration
or firmware issues), kcalloc(0, ...) returns ZERO_SIZE_PTR (0x10)
instead of NULL.
Later in dpaa2_switch_probe(), the NAPI initialization unconditionally
accesses ethsw->ports[0]->netdev, which attempts to dereference
ZERO_SIZE_PTR (address 0x10), resulting in a kernel panic.
Add a check to ensure num_ifs is greater than zero after retrieving
device attributes. This prevents the zero-sized allocations and
subsequent invalid pointer dereference.
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Fixes: 0b1b713704 ("staging: dpaa2-switch: handle Rx path on control interface")
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/SYBPR01MB7881BEABA8DA896947962470AF91A@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit a5e400a985 ("net/mlx5e: Honor user choice of IPsec replay
window size") introduced logic to setup the ESN replay window size.
This logic is only valid for packet offload.
However, the check to skip this block only covered outbound offloads.
It was not skipped for crypto offload, causing it to fall through to
the new switch statement and trigger its WARN_ON default case (for
instance, if a window larger than 256 bits was configured).
Fix this by amending the condition to also skip the replay window
setup if the offload type is not XFRM_DEV_OFFLOAD_PACKET.
Fixes: a5e400a985 ("net/mlx5e: Honor user choice of IPsec replay window size")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1769503961-124173-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the commit 25c6a5ab15 ("net: phy: micrel: Dynamically control
external clock of KSZ PHY"), the clock of Micrel PHY has been enabled
by phy_driver::resume() and disabled by phy_driver::suspend(). However,
devm_clk_get_optional_enabled() is used in kszphy_probe(), so the clock
will automatically be disabled when the device is unbound from the bus.
Therefore, this could cause the clock to be disabled twice, resulting
in clk driver warnings.
For example, this issue can be reproduced on i.MX6ULL platform, and we
can see the following logs when removing the FEC MAC drivers.
$ echo 2188000.ethernet > /sys/bus/platform/drivers/fec/unbind
$ echo 20b4000.ethernet > /sys/bus/platform/drivers/fec/unbind
[ 109.758207] ------------[ cut here ]------------
[ 109.758240] WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0xb4/0xd0, CPU#0: sh/639
[ 109.771011] enet2_ref already disabled
[ 109.793359] Call trace:
[ 109.822006] clk_core_disable from clk_disable+0x28/0x34
[ 109.827340] clk_disable from clk_disable_unprepare+0xc/0x18
[ 109.833029] clk_disable_unprepare from devm_clk_release+0x1c/0x28
[ 109.839241] devm_clk_release from devres_release_all+0x98/0x100
[ 109.845278] devres_release_all from device_unbind_cleanup+0xc/0x70
[ 109.851571] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4
[ 109.859170] device_release_driver_internal from bus_remove_device+0xbc/0xe4
[ 109.866243] bus_remove_device from device_del+0x140/0x458
[ 109.871757] device_del from phy_mdio_device_remove+0xc/0x24
[ 109.877452] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac
[ 109.883918] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78
[ 109.890125] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158
[ 109.896076] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4
[ 109.962748] WARNING: drivers/clk/clk.c:1047 at clk_core_unprepare+0xfc/0x13c, CPU#0: sh/639
[ 109.975805] enet2_ref already unprepared
[ 110.002866] Call trace:
[ 110.031758] clk_core_unprepare from clk_unprepare+0x24/0x2c
[ 110.037440] clk_unprepare from devm_clk_release+0x1c/0x28
[ 110.042957] devm_clk_release from devres_release_all+0x98/0x100
[ 110.048989] devres_release_all from device_unbind_cleanup+0xc/0x70
[ 110.055280] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4
[ 110.062877] device_release_driver_internal from bus_remove_device+0xbc/0xe4
[ 110.069950] bus_remove_device from device_del+0x140/0x458
[ 110.075469] device_del from phy_mdio_device_remove+0xc/0x24
[ 110.081165] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac
[ 110.087632] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78
[ 110.093836] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158
[ 110.099782] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4
After analyzing the process of removing the FEC driver, as shown below,
it can be seen that the clock was disabled twice by the PHY driver.
fec_drv_remove()
--> fec_enet_close()
--> phy_stop()
--> phy_suspend()
--> kszphy_suspend() #1 The clock is disabled
--> fec_enet_mii_remove()
--> mdiobus_unregister()
--> phy_mdio_device_remove()
--> device_del()
--> devm_clk_release() #2 The clock is disabled again
Therefore, devm_clk_get_optional() is used to fix the above issue. And
to avoid the issue mentioned by the commit 9853294627 ("net: phy:
micrel: use devm_clk_get_optional_enabled for the rmii-ref clock"), the
clock is enabled by clk_prepare_enable() to get the correct clock rate.
Fixes: 25c6a5ab15 ("net: phy: micrel: Dynamically control external clock of KSZ PHY")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260126081544.983517-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-27 (ixgbe, ice)
For ixgbe:
Kohei Enju adjusts the cleanup path on firmware error to resolve some
memory leaks and removes an instance of double init, free on ACI mutex.
For ice:
Aaron Ma adds NULL checks for q_vectors to avoid NULL pointer
dereference.
Jesse Brandeburg removes UDP checksum mismatch from being counted in Rx
errors.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: stop counting UDP csum mismatch as rx_errors
ice: Fix NULL pointer dereference in ice_vsi_set_napi_queues
ixgbe: don't initialize aci lock in ixgbe_recovery_probe()
ixgbe: fix memory leaks in the ixgbe_recovery_probe() path
====================
Link: https://patch.msgid.link/20260127223047.3979404-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If timestamping is supported, GVE reads the clock during probe,
which can fail for various reasons. Previously, this failure would
abort the driver probe, rendering the device unusable. This behavior
has been observed on production GCP VMs, causing driver initialization
to fail completely.
This patch allows the driver to degrade gracefully. If gve_init_clock()
fails, it logs a warning and continues loading the driver without PTP
support.
Cc: stable@vger.kernel.org
Fixes: a479a27f4d ("gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call")
Signed-off-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Shachar Raindel <shacharr@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20260127010210.969823-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver's ndo_get_stats64 callback is only reporting mlx5 counters,
without accounting for the netdev stats, causing errors from the network
stack to be invisible in statistics.
Add netdev_stats_to_stats64() call to first populate the counters, then
add mlx5 counters on top, ensuring both are accounted for (where
appropriate).
Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1769411695-18820-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is possible to unbind the uplink ETH driver while the E-Switch is
in switchdev mode. This leads to netdevice reference counting issues[1],
as the driver removal path was not designed to clean up from this state.
During uplink ETH driver removal (_mlx5e_remove), the code now waits for
any concurrent E-Switch mode transition to finish. It then removes the
REPs auxiliary device, if exists. This ensures a graceful cleanup.
[1]
unregister_netdevice: waiting for eth2 to become free. Usage count = 2
ref_tracker: netdev@00000000c912e04b has 1/1 users at
ib_device_set_netdev+0x130/0x270 [ib_core]
mlx5_ib_vport_rep_load+0xf4/0x3e0 [mlx5_ib]
mlx5_esw_offloads_rep_load+0xc7/0xe0 [mlx5_core]
esw_offloads_enable+0x583/0x900 [mlx5_core]
mlx5_eswitch_enable_locked+0x1b2/0x290 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x107/0x3e0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
__x64_sys_sendto+0x20/0x30
Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1769411695-18820-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the beginning, the Intel ice driver has counted receive checksum
offload mismatches into the rx_errors member of the rtnl_link_stats64
struct. In ethtool -S these show up as rx_csum_bad.nic.
I believe counting these in rx_errors is fundamentally wrong, as it's
pretty clear from the comments in if_link.h and from every other statistic
the driver is summing into rx_errors, that all of them would cause a
"hardware drop" except for the UDP checksum mismatch, as well as the fact
that all the other causes for rx_errors are L2 reasons, and this L4 UDP
"mismatch" is an outlier.
A last nail in the coffin is that rx_errors is monitored in production and
can indicate a bad NIC/cable/Switch port, but instead some random series of
UDP packets with bad checksums will now trigger this alert. This false
positive makes the alert useless and affects us as well as other companies.
This packet with presumably a bad UDP checksum is *already* passed to the
stack, just not marked as offloaded by the hardware/driver. If it is
dropped by the stack it will show up as UDP_MIB_CSUMERRORS.
And one more thing, none of the other Intel drivers, and at least bnxt_en
and mlx5 both don't appear to count UDP offload mismatches as rx_errors.
Here is a related customer complaint:
https://community.intel.com/t5/Ethernet-Products/ice-rx-errros-is-too-sensitive-to-IP-TCP-attack-packets-Intel/td-p/1662125
Fixes: 4f1fe43c92 ("ice: Add more Rx errors to netdev's rx_error counter")
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Jake Keller <jacob.e.keller@intel.com>
Cc: IWL <intel-wired-lan@lists.osuosl.org>
Signed-off-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When ixgbe_recovery_probe() is invoked and this function fails,
allocated resources in advance are not completely freed, because
ixgbe_probe() returns ixgbe_recovery_probe() directly and
ixgbe_recovery_probe() only frees partial resources, resulting in memory
leaks including:
- adapter->io_addr
- adapter->jump_tables[0]
- adapter->mac_table
- adapter->rss_key
- adapter->af_xdp_zc_qps
The leaked MMIO region can be observed in /proc/vmallocinfo, and the
remaining leaks are reported by kmemleak.
Don't return ixgbe_recovery_probe() directly, and instead let
ixgbe_probe() to clean up resources on failures.
Fixes: 29cb3b8d95 ("ixgbe: add E610 implementation of FW recovery mode")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix a use-after-free which happens due to enslave failure after the new
slave has been added to the array. Since the new slave can be used for Tx
immediately, we can use it after it has been freed by the enslave error
cleanup path which frees the allocated slave memory. Slave update array is
supposed to be called last when further enslave failures are not expected.
Move it after xdp setup to avoid any problems.
It is very easy to reproduce the problem with a simple xdp_pass prog:
ip l add bond1 type bond mode balance-xor
ip l set bond1 up
ip l set dev bond1 xdp object xdp_pass.o sec xdp_pass
ip l add dumdum type dummy
Then run in parallel:
while :; do ip l set dumdum master bond1 1>/dev/null 2>&1; done;
mausezahn bond1 -a own -b rand -A rand -B 1.1.1.1 -c 0 -t tcp "dp=1-1023, flags=syn"
The crash happens almost immediately:
[ 605.602850] Oops: general protection fault, probably for non-canonical address 0xe0e6fc2460000137: 0000 [#1] SMP KASAN NOPTI
[ 605.602916] KASAN: maybe wild-memory-access in range [0x07380123000009b8-0x07380123000009bf]
[ 605.602946] CPU: 0 UID: 0 PID: 2445 Comm: mausezahn Kdump: loaded Tainted: G B 6.19.0-rc6+ #21 PREEMPT(voluntary)
[ 605.602979] Tainted: [B]=BAD_PAGE
[ 605.602998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 605.603032] RIP: 0010:netdev_core_pick_tx+0xcd/0x210
[ 605.603063] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 3e 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 6b 08 49 8d 7d 30 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 25 01 00 00 49 8b 45 30 4c 89 e2 48 89 ee 48 89
[ 605.603111] RSP: 0018:ffff88817b9af348 EFLAGS: 00010213
[ 605.603145] RAX: dffffc0000000000 RBX: ffff88817d28b420 RCX: 0000000000000000
[ 605.603172] RDX: 00e7002460000137 RSI: 0000000000000008 RDI: 07380123000009be
[ 605.603199] RBP: ffff88817b541a00 R08: 0000000000000001 R09: fffffbfff3ed8c0c
[ 605.603226] R10: ffffffff9f6c6067 R11: 0000000000000001 R12: 0000000000000000
[ 605.603253] R13: 073801230000098e R14: ffff88817d28b448 R15: ffff88817b541a84
[ 605.603286] FS: 00007f6570ef67c0(0000) GS:ffff888221dfa000(0000) knlGS:0000000000000000
[ 605.603319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 605.603343] CR2: 00007f65712fae40 CR3: 000000011371b000 CR4: 0000000000350ef0
[ 605.603373] Call Trace:
[ 605.603392] <TASK>
[ 605.603410] __dev_queue_xmit+0x448/0x32a0
[ 605.603434] ? __pfx_vprintk_emit+0x10/0x10
[ 605.603461] ? __pfx_vprintk_emit+0x10/0x10
[ 605.603484] ? __pfx___dev_queue_xmit+0x10/0x10
[ 605.603507] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603546] ? _printk+0xcb/0x100
[ 605.603566] ? __pfx__printk+0x10/0x10
[ 605.603589] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603627] ? add_taint+0x5e/0x70
[ 605.603648] ? add_taint+0x2a/0x70
[ 605.603670] ? end_report.cold+0x51/0x75
[ 605.603693] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603731] bond_start_xmit+0x623/0xc20 [bonding]
Fixes: 9e2ee5c7e7 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reported-by: Chen Zhen <chenzhen126@huawei.com>
Closes: https://lore.kernel.org/netdev/fae17c21-4940-5605-85b2-1d5e17342358@huawei.com/
CC: Jussi Maki <joamaki@gmail.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260123120659.571187-1-razor@blackwall.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some PHYs stop the refclk for power saving, usually while link down.
This causes reading stats to time out.
Therefore, in emac_stats_update(), also don't update and reschedule if
!netif_carrier_ok(). But that means we could be missing later updates if
the link comes back up, so also reschedule when link up is detected in
emac_adjust_link().
While we're at it, improve the comments and error message prints around
this to reflect the better understanding of how this could happen.
Hopefully if this happens again on new hardware, these comments will
direct towards a solution.
Closes: https://lore.kernel.org/r/20260119141620.1318102-1-amadeus@jmu.edu.cn/
Fixes: bfec6d7f20 ("net: spacemit: Add K1 Ethernet MAC")
Co-developed-by: Chukun Pan <amadeus@jmu.edu.cn>
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20260123-k1-ethernet-clarify-stat-timeout-v3-1-93b9df627e87@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rocker_world_port_pre_init(), rocker_port->wpriv is allocated with
kzalloc(wops->port_priv_size, GFP_KERNEL). However, in
rocker_world_port_post_fini(), the memory is only freed when
wops->port_post_fini callback is set:
if (!wops->port_post_fini)
return;
wops->port_post_fini(rocker_port);
kfree(rocker_port->wpriv);
Since rocker_ofdpa_ops does not implement port_post_fini callback
(it is NULL), the wpriv memory allocated for each port is never freed
when ports are removed. This leads to a memory leak of
sizeof(struct ofdpa_port) bytes per port on every device removal.
Fix this by always calling kfree(rocker_port->wpriv) regardless of
whether the port_post_fini callback exists.
Fixes: e420114eef ("rocker: introduce worlds infrastructure")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260123211030.2109-2-qikeyu2017@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function mlx5_esw_vport_vhca_id() is declared to return bool,
but returns -EOPNOTSUPP (-45), which is an int error code. This
causes a signedness bug as reported by smatch.
This patch fixes this smatch report:
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:981 mlx5_esw_vport_vhca_id()
warn: signedness bug returning '(-45)'
Fixes: 1baf304265 ("net/mlx5: E-Switch, Set/Query hca cap via vhca id")
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Zeng Chi <zengchi@kylinos.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260123085749.1401969-1-zeng_chi911@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When receiving data in the DPMAIF RX path,
the t7xx_dpmaif_set_frag_to_skb() function adds
page fragments to an skb without checking if the number of
fragments has exceeded MAX_SKB_FRAGS. This could lead to a buffer overflow
in skb_shinfo(skb)->frags[] array, corrupting adjacent memory and
potentially causing kernel crashes or other undefined behavior.
This issue was identified through static code analysis by comparing with a
similar vulnerability fixed in the mt76 driver commit b102f0c522 ("mt76:
fix array overflow on receiving too many fragments for a packet").
The vulnerability could be triggered if the modem firmware sends packets
with excessive fragments. While under normal protocol conditions (MTU 3080
bytes, BAT buffer 3584 bytes),
a single packet should not require additional
fragments, the kernel should not blindly trust firmware behavior.
Malicious, buggy, or compromised firmware could potentially craft packets
with more fragments than the kernel expects.
Fix this by adding a bounds check before calling skb_add_rx_frag() to
ensure nr_frags does not exceed MAX_SKB_FRAGS.
The check must be performed before unmapping to avoid a page leak
and double DMA unmap during device teardown.
Fixes: d642b012df ("net: wwan: t7xx: Add data path interface")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Link: https://patch.msgid.link/20260122170401.1986-2-qikeyu2017@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In mvpp2_ethtool_cls_rule_ins(), the ethtool_rule is allocated by
ethtool_rx_flow_rule_create(). If the subsequent conversion to flow
type fails, the function jumps to the clean_rule label.
However, the clean_rule label only frees efs, skipping the cleanup
of ethtool_rule, which leads to a memory leak.
Fix this by jumping to the clean_eth_rule label, which properly calls
ethtool_rx_flow_rule_destroy() before freeing efs.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: f4f1ba1819 ("net: mvpp2: cls: Report an error for unsupported flow types")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260123065716.2248324-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2026-01-23
The first patch is by Zilin Guan and fixes a memory leak in the error
path of the at91_can driver's probe function.
The last patch is by me and fixes yet another error in the gs_usb's
gs_usb_receive_bulk_callback() function.
* tag 'linux-can-fixes-for-6.19-20260123' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_usb_receive_bulk_callback(): fix error message
can: at91_can: Fix memory leak in at91_can_probe()
====================
Link: https://patch.msgid.link/20260123173241.1026226-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In octep_device_setup(), if octep_ctrl_net_init() fails, the function
returns directly without unmapping the mapped resources and freeing the
allocated configuration memory.
Fix this by jumping to the unsupported_dev label, which performs the
necessary cleanup. This aligns with the error handling logic of other
paths in this function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260121130551.3717090-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In esw_acl_ingress_lgcy_setup(), if esw_acl_table_create() fails,
the function returns directly without releasing the previously
created counter, leading to a memory leak.
Fix this by jumping to the out label instead of returning directly,
which aligns with the error handling logic of other paths in this
function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260120134640.2717808-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In at91_can_probe(), the dev structure is allocated via alloc_candev().
However, if the subsequent call to devm_phy_optional_get() fails, the
code jumps directly to exit_iounmap, missing the call to free_candev().
This results in a memory leak of the allocated net_device structure.
Fix this by jumping to the exit_free label instead, which ensures that
free_candev() is called to properly release the memory.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 3ecc09856a ("can: at91_can: add CAN transceiver support")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20260122114128.643752-1-zilin@seu.edu.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Johannes Berg says:
====================
Another set of updates:
- various small fixes for ath10k/ath12k/mwifiex/rsi
- cfg80211 fix for HE bitrate overflow
- mac80211 fixes
- S1G beacon handling in scan
- skb tailroom handling for HW encryption
- CSA fix for multi-link
- handling of disabled links during association
* tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: ignore link disabled flag from userspace
wifi: mac80211: apply advertised TTLM from association response
wifi: mac80211: parse all TTLM entries
wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice
wifi: mac80211: don't perform DA check on S1G beacon
wifi: ath12k: Fix wrong P2P device link id issue
wifi: ath12k: fix dead lock while flushing management frames
wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel
wifi: ath12k: cancel scan only on active scan vdev
wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize()
wifi: mac80211: correctly check if CSA is active
wifi: cfg80211: Fix bitrate calculation overflow for HE rates
wifi: rsi: Fix memory corruption due to not set vif driver data size
wifi: ath12k: don't force radio frequency check in freq_to_idx()
wifi: ath12k: fix dma_free_coherent() pointer
wifi: ath10k: fix dma_free_coherent() pointer
====================
Link: https://patch.msgid.link/20260122110248.15450-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MAX_FL (maximum frame length) and related calculations used ETH_HLEN,
which does not account for the 4-byte VLAN tag in tagged frames. This
caused the hardware to reject valid VLAN frames as oversized, resulting
in RX errors and dropped packets.
Use VLAN_ETH_HLEN instead of ETH_HLEN in the MAX_FL register setup,
cut-through mode threshold, buffer allocation, and max_mtu calculation.
Cc: stable@kernel.org # v6.18+
Fixes: 62b5bb7be7 ("net: fec: update MAX_FL based on the current MTU")
Fixes: d466c16026 ("net: fec: enable the Jumbo frame support for i.MX8QM")
Fixes: 59e9bf037d ("net: fec: add change_mtu to support dynamic buffer allocation")
Fixes: ec2a1681ed ("net: fec: use a member variable for maximum buffer size")
Signed-off-by: Clemens Gruber <mail@clemensgruber.at>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260121083751.66997-1-mail@clemensgruber.at
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-20 (ice, idpf)
For ice:
Cody Haas breaks dependency of needing both RSS key and LUT for
ice_get_rxfh() as ethtool ioctls do not always supply both.
Paul fixes issues related to devlink reload; adding missing deinit HW
call and moving hwmon exit function to the proper call chain.
For idpf:
Mina Almasry moves a register read call into the time sandwich to ensure
the register is properly flushed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: read lower clock bits inside the time sandwich
ice: fix devlink reload call trace
ice: add missing ice_deinit_hw() in devlink reinit path
ice: Fix persistent failure in ice_get_rxfh
====================
Link: https://patch.msgid.link/20260120224430.410377-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After 3cbf4ffba5 ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
WARNING: net/core/flow_dissector.c:1131 at __skb_flow_dissect+0xb57/0x68b0 net/core/flow_dissector.c:1131, CPU#1: syz.2.1418/11053
Call Trace:
<TASK>
bond_flow_dissect drivers/net/bonding/bond_main.c:4093 [inline]
__bond_xmit_hash+0x2d7/0xba0 drivers/net/bonding/bond_main.c:4157
bond_xmit_hash_xdp drivers/net/bonding/bond_main.c:4208 [inline]
bond_xdp_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5139 [inline]
bond_xdp_get_xmit_slave+0x1fd/0x710 drivers/net/bonding/bond_main.c:5515
xdp_master_redirect+0x13f/0x2c0 net/core/filter.c:4388
bpf_prog_run_xdp include/net/xdp.h:700 [inline]
bpf_test_run+0x6b2/0x7d0 net/bpf/test_run.c:421
bpf_prog_test_run_xdp+0x795/0x10e0 net/bpf/test_run.c:1390
bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
__sys_bpf+0x562/0x860 kernel/bpf/syscall.c:6182
__do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
Fixes: 58deb77cc5 ("bonding: balance ICMP echoes in layer3+4 mode")
Reported-by: syzbot+c46409299c70a221415e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696faa23.050a0220.4cb9c.001f.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matteo Croce <mcroce@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260120161744.1893263-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the parameter pmac_id_valid argument of be_cmd_get_mac_from_list() is
set to false, the driver may request the PMAC_ID from the firmware of the
network card, and this function will store that PMAC_ID at the provided
address pmac_id. This is the contract of this function.
However, there is a location within the driver where both
pmac_id_valid == false and pmac_id == NULL are being passed. This could
result in dereferencing a NULL pointer.
To resolve this issue, it is necessary to pass the address of a stub
variable to the function.
Fixes: 95046b927a ("be2net: refactor MAC-addr setup code")
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20260120113734.20193-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>