Refine error handling in EEPROM and OTP read/write functions by:
- Return error values immediately upon detection.
- Avoid overwriting correct error codes with `-EIO`.
- Preserve initial error codes as they were appropriate for specific
failures.
- Use `-ETIMEDOUT` for timeout conditions instead of `-EIO`.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move following functions to avoid forward declarations in the code:
- lan78xx_start_hw()
- lan78xx_stop_hw()
- lan78xx_flush_fifo()
- lan78xx_start_tx_path()
- lan78xx_stop_tx_path()
- lan78xx_flush_tx_fifo()
- lan78xx_start_rx_path()
- lan78xx_stop_rx_path()
- lan78xx_flush_rx_fifo()
These functions will be used in an upcoming PHYlink migration patch.
No modifications to the functionality of the code are made.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the KSZ9031RNX PHY fixup from the lan78xx driver. The fixup applied
specific RGMII pad skew configurations globally, but these settings violate the
RGMII specification and cause more harm than benefit.
Key issues with the fixup:
1. **Non-Compliant Timing**: The fixup's delay settings fall outside the RGMII
specification requirements of 1.5 ns to 2.0 ns:
- RX Path: Total delay of **2.16 ns** (PHY internal delay of 1.2 ns + 0.96
ns skew).
- TX Path: Total delay of **0.96 ns**, significantly below the RGMII minimum
of 1.5 ns.
2. **Redundant or Incorrect Configurations**:
- The RGMII skew registers written by the fixup do not meaningfully alter
the PHY's default behavior and fail to account for its internal delays.
- The TX_DATA pad skew was not configured, relying on power-on defaults
that are insufficient for RGMII compliance.
3. **Micrel Driver Support**: By setting `PHY_INTERFACE_MODE_RGMII_ID`, the
Micrel driver can calculate and assign appropriate skew values for the
KSZ9031 PHY. This ensures better timing configurations without relying on
external fixups.
4. **System Interference**: The fixup applied globally, reconfiguring all
KSZ9031 PHYs in the system, even those unrelated to the LAN78xx adapter.
This could lead to unintended and harmful behavior on unrelated interfaces.
While the fixup is removed, a better mechanism is still needed to dynamically
determine the optimal combination of PHY and MAC delays to fully meet RGMII
requirements without relying on Device Tree or global fixups. This would allow
for robust operation across different hardware configurations.
The Micrel driver is capable of using the interface mode value to calculate and
apply better skew values, providing a configuration much closer to the RGMII
specification than the fixup. Removing the fixup ensures better default
behavior and prevents harm to other system interfaces.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the PHY fixup for the LAN8835 PHY in the lan78xx driver due to
the following reasons:
- There is no publicly available information about the LAN8835 PHY.
However, it appears to be the integrated PHY used in the LAN7800 and
LAN7850 USB Ethernet controllers. These PHYs use the GMII interface,
not RGMII as configured by the fixup.
- The correct driver for handling the LAN8835 PHY functionality is the
Microchip PHY driver (`drivers/net/phy/microchip.c`), which properly
supports these integrated PHYs.
- The PHY ID `0x0007C130` is actually used by the LAN8742A PHY, which
only supports RMII. This interface is incompatible with the LAN78xx
MAC, as the LAN7801 (the only LAN78xx version without an integrated
PHY) supports only RGMII.
- The mask applied for this fixup is overly broad, inadvertently
covering both Microchip LAN88xx PHYs and unrelated SMSC LAN8742A PHYs,
leading to potential conflicts with other devices.
- Testing has shown that removing this fixup for LAN7800 and LAN7850
does not result in any noticeable difference in functionality, as the
Microchip PHY driver (`drivers/net/phy/microchip.c`) handles all
necessary configurations for these integrated PHYs.
- Registering this fixup globally (not limited to USB devices) risks
conflicts by unintentionally modifying other interfaces whenever a
LAN7801 adapter is connected to the system.
Note that both LAN7800 and LAN7850 USB Ethernet controllers use an
integrated PHY with the ID `0x0007C132`. Additionally, the LAN7515, a
specialized part for Raspberry Pi, includes an integrated LAN7800 USB
Ethernet controller and USB hub in a multifunctional chip design, and it
also uses the same PHY ID (`0x0007C132`).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: phylib EEE cleanups
Clean up phylib's EEE support. Patches previously posted as RFC as part
of the phylink EEE series.
Patch 1 changes the Marvell driver to use the state we store in
struct phy_device, rather than manually calling
phydev->eee_cfg.eee_enabled.
Patch 2 avoids genphy_c45_ethtool_get_eee() setting ->eee_enabled, as
we copy that from phydev->eee_cfg.eee_enabled later, and after patch 3
mo one uses this after calling genphy_c45_ethtool_get_eee(). In fact,
the only caller of this function now is phy_ethtool_get_eee().
As all callers to genphy_c45_eee_is_active() now pass NULL as its
is_enabled flag, this is no longer useful. Remove the argument in
patch 3.
Patch 4 updates the phylib documentation to make it absolutely clear
that phy_ethtool_get_eee() now fills in all members of struct
ethtool_keee, which is why we now have so many buggy network drivers.
====================
Link: https://patch.msgid.link/Z1GDZlFyF2fsFa3S@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the phy_ethtool_get_eee() documentation to make it clear that
all members of struct ethtool_keee are written by this function.
keee.supported, keee.advertised, keee.lp_advertised and keee.eee_active
are all written by genphy_c45_ethtool_get_eee().
keee.tx_lpi_timer, keee.tx_lpi_enabled and keee.eee_enabled are all
written by eeecfg_to_eee().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tJ9JH-006LIz-SO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix various integer type conversions by using strtoull and a temporary
variable which is bounds checked before being casted into the
appropriate cfg_* variable for use by the test program.
While here:
- free the strdup'd cfg string for overall hygenie.
- initialize napi_id = 0 in setup_queue to avoid warnings on some
compilers.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241204163239.294123-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tipc_exit_net() is very slow and is abused by syzbot.
tipc_nametbl_stop() is called for each netns being dismantled.
Calling synchronize_net() right before freeing tn->nametbl
is a big hammer.
Replace this with kfree_rcu().
Note that RCU is not properly used here, otherwise
tn->nametbl should be cleared before the synchronize_net()
or kfree_rcu(), or even before the cleanup loop.
We might need to fix this at some point.
Also note tipc uses other synchronize_rcu() calls,
more work is needed to make tipc_exit_net() much faster.
List of remaining calls to synchronize_rcu()
tipc_detach_loopback() (dev_remove_pack())
tipc_bcast_stop()
tipc_sk_rht_destroy()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241204210234.319484-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier says:
====================
net: freescale: ucc_geth: Phylink conversion
This is V3 of the phylink conversion for ucc_geth.
The main changes in this V3 are related to error handling in the patches
1 and 10 to report an error when the deprecated "interface" property is
found in DT. Doing so, I found and addressed some issues with the jump
labels in the error paths, impacting patches 1 and 10.
The rest of the changes are just a rebase on net-next.
Some of the V2 changes haven't been reviewed, so I stress out that I'm
still uncertain about the way WoL is handled is patches 4 and 10.
Thanks,
Maxime
Link to V1: https://lore.kernel.org/netdev/20241107170255.1058124-1-maxime.chevallier@bootlin.com/
Link to V2: https://lore.kernel.org/netdev/20241114153603.307872-1-maxime.chevallier@bootlin.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ucc_geth is quite capable in terms of supported interfaces, and even
includes an externally controlled PCS (well, TBI). Port that driver to
phylink.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of parallel MII interfaces also exist in a "Reduced" mode,
usually with higher clock rates and fewer data lines, to ease the
hardware design. This is what the 'R' stands for in RGMII, RMII, RTBI,
RXAUI, etc.
The UCC Geth controller has a special configuration bit that needs to be
set when the MII mode is one of the supported reduced modes.
Add a local helper for that.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The uec_configure_serdes() function deals with serialized linkmodes
settings. It's used during the link bringup sequence. It is planned to
be used during the phylink conversion for mac configuration, but it
needs to me moved around in the process. To make the phylink port
clearer, this commit moves the function without any feature change.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The preamble length can be configured in ucc_geth, however it just
ends-up always being configured to 7 bytes, as nothing ever changes the
default value of 7.
Make that value the default value when the MACCFG2 register gets
initialized, and remove the code to configure that value altogether.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The frame length check is configured when the phy interface is setup.
However, it's configured according to an internal flag that is always
false. So, just make so that we disable the relevant bit in the MACCFG2
register upon accessing it for other MAC configuration operations.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WoL opts are represented through a bitmask stored in a u32. As this
mask is copied as-is in the driver, make sure we use the exact same type
to store them internally.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The get/set_wol ethtool ops rely on querying the PHY for its WoL
capabilities, checking for the presence of a PHY and a PHY interrupts
isn't enough. Address that by cleaning up the WoL configuration
sequence.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
As this driver pre-dates phylib, it uses a private pointer to get a
reference to the attached phy_device. Drop that pointer and use the
netdev's pointer instead.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preparing the phylink conversion, split the adjust_link callbaclk, by
clearly separating the mac configuration, link_up and link_down phases.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In april 2007, ucc_geth was converted to phylib with :
commit 728de4c927 ("ucc_geth: migrate ucc_geth to phylib").
In that commit, the device-tree property "interface", that could be used to
retrieve the PHY interface mode was deprecated.
DTS files that still used that property were converted along the way, in
the following commit, also dating from april 2007 :
commit 0fd8c47ccc ("[POWERPC] Replace undocumented interface properties in dts files")
17 years later, there's no users of that property left and I hope it's
safe to say we can remove support from that in the ucc_geth driver,
making the probe() function a bit simpler.
Should there be any users that have a DT that was generated when 2.6.21 was
cutting-edge, print an error message with hints on how to convert the
devicetree if the 'interface' property is found.
With that property gone, we can greatly simplify the parsing of the
phy-interface-mode from the devicetree by using of_get_phy_mode(),
allowing the removal of the open-coded parsing in the driver.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Lobakin says:
====================
xdp: a fistful of generic changes pt. I
XDP for idpf is currently 6 chapters:
* convert Rx to libeth;
* convert Tx and stats to libeth;
* generic XDP and XSk code changes (you are here);
* generic XDP and XSk code additions;
* actual XDP for idpf via new libeth_xdp;
* XSk for idpf (via ^).
Part III does the following:
* improve &xdp_buff_xsk cacheline placement;
* does some cleanups with marking read-only bpf_prog and xdp_buff
arguments const for some generic functions;
* allows attaching already registered XDP memory model to RxQ info;
* makes system percpu page_pools valid XDP memory models;
* starts using netmems in the XDP core code (1 function);
* allows mixing pages from several page_pools within one XDP frame;
* optimizes &xdp_frame layout and removes no-more-used field.
Bullets 4-6 are the most important ones. All of them are prereqs to
libeth_xdp.
====================
Link: https://patch.msgid.link/20241203173733.3181246-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, page_pool_put_page_bulk() indeed takes an array of pointers
to the data, not pages, despite the name. As one side effect, when
you're freeing frags from &skb_shared_info, xdp_return_frame_bulk()
converts page pointers to virtual addresses and then
page_pool_put_page_bulk() converts them back. Moreover, data pointers
assume every frag is placed in the host memory, making this function
non-universal.
Make page_pool_put_page_bulk() handle array of netmems. Pass frag
netmems directly and use virt_to_netmem() when freeing xdpf->data,
so that the PP core will then get the compound netmem and take care
of the rest.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241203173733.3181246-9-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the following netmem counterparts:
* virt_to_netmem() -- simple page_to_netmem(virt_to_page()) wrapper;
* netmem_is_pfmemalloc() -- page_is_pfmemalloc() for page-backed
netmems, false otherwise;
and the following "unsafe" versions:
* __netmem_to_page()
* __netmem_get_pp()
* __netmem_address()
They do the same as their non-underscored buddies, but assume the netmem
is always page-backed. When working with header &page_pools, you don't
need to check whether netmem belongs to the host memory and you can
never get NULL instead of &page. Checks for the LSB, clearing the LSB,
branches take cycles and increase object code size, sometimes
significantly. When you're sure your PP is always host, you can avoid
this by using the underscored counterparts.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241203173733.3181246-8-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To make the system page pool usable as a source for allocating XDP
frames, we need to register it with xdp_reg_mem_model(), so that page
return works correctly. This is done in preparation for using the system
page_pool to convert XDP_PASS XSk frames to skbs; for the same reason,
make the per-cpu variable non-static so we can access it from other
source files as well (but w/o exporting).
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241203173733.3181246-7-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When you register an XSk pool as XDP Rxq info memory model, you then
need to manually attach it after the registration.
Let the user combine both actions into one by just passing a pointer
to the pool directly to xdp_rxq_info_reg_mem_model(), which will take
care of calling xsk_pool_set_rxq_info(). This looks similar to how a
&page_pool gets registered and reduce repeating driver code.
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241203173733.3181246-6-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One may need to register memory model separately from xdp_rxq_info. One
simple example may be XDP test run code, but in general, it might be
useful when memory model registering is managed by one layer and then
XDP RxQ info by a different one.
Allow such scenarios by adding a simple helper which "attaches"
already registered memory model to the desired xdp_rxq_info. As this
is mostly needed for Page Pool, add a special function to do that for
a &page_pool pointer.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241203173733.3181246-5-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In lots of places, bpf_prog pointer is used only for tracing or other
stuff that doesn't modify the structure itself. Same for net_device.
Address at least some of them and add `const` attributes there. The
object code didn't change, but that may prevent unwanted data
modifications and also allow more helpers to have const arguments.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the series "XSk buff on a diet" by Maciej, the greatest pow-2
which &xdp_buff_xsk can be divided got reduced from 16 to 8 on x86_64.
Also, sizeof(xdp_buff_xsk) now is 120 bytes, which, taking the previous
sentence into account, leads to that it leaves 8 bytes at the end of
cacheline, which means an array of buffs will have its elements
messed between the cachelines chaotically.
Use __aligned_largest for this struct. This alignment is usually 16
bytes, which makes it fill two full cachelines and align an array
nicely. ___cacheline_aligned may be excessive here, especially on
arches with 128-256 byte CLs, as well as 32-bit arches (76 -> 96
bytes on MIPS32R2), while not doing better than _largest.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241203173733.3181246-2-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Octavian Purdila says:
====================
net_sched: sch_sfq: reject limit of 1
The implementation does not properly support limits of 1. Add an
in-kernel check, in addition to existing iproute2 check, since other
tools may be used for configuration.
This patch set also adds a selfcheck to test that a limit of 1 is
rejected.
An alternative (or in addition) we could fix the implementation by
setting q->tail to NULL in sfq_drop if this is the last slot we marked
empty, e.g.:
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -317,8 +317,11 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
x = q->tail->next;
slot = &q->slots[x];
- q->tail->next = slot->next;
q->ht[slot->hash] = SFQ_EMPTY_SLOT;
+ if (x == slot->next)
+ q->tail = NULL; /* no more active slots */
+ else
+ q->tail->next = slot->next;
goto drop;
}
====================
Link: https://patch.msgid.link/20241204030520.2084663-1-tavip@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current implementation does not work correctly with a limit of
1. iproute2 actually checks for this and this patch adds the check in
kernel as well.
This fixes the following syzkaller reported crash:
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:210:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 0 PID: 2569 Comm: syz-executor101 Not tainted 5.10.0-smp-DEV #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x125/0x19f lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:148 [inline]
__ubsan_handle_out_of_bounds+0xed/0x120 lib/ubsan.c:347
sfq_link net/sched/sch_sfq.c:210 [inline]
sfq_dec+0x528/0x600 net/sched/sch_sfq.c:238
sfq_dequeue+0x39b/0x9d0 net/sched/sch_sfq.c:500
sfq_reset+0x13/0x50 net/sched/sch_sfq.c:525
qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
tbf_reset+0x3d/0x100 net/sched/sch_tbf.c:319
qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
dev_reset_queue+0x8c/0x140 net/sched/sch_generic.c:1296
netdev_for_each_tx_queue include/linux/netdevice.h:2350 [inline]
dev_deactivate_many+0x6dc/0xc20 net/sched/sch_generic.c:1362
__dev_close_many+0x214/0x350 net/core/dev.c:1468
dev_close_many+0x207/0x510 net/core/dev.c:1506
unregister_netdevice_many+0x40f/0x16b0 net/core/dev.c:10738
unregister_netdevice_queue+0x2be/0x310 net/core/dev.c:10695
unregister_netdevice include/linux/netdevice.h:2893 [inline]
__tun_detach+0x6b6/0x1600 drivers/net/tun.c:689
tun_detach drivers/net/tun.c:705 [inline]
tun_chr_close+0x104/0x1b0 drivers/net/tun.c:3640
__fput+0x203/0x840 fs/file_table.c:280
task_work_run+0x129/0x1b0 kernel/task_work.c:185
exit_task_work include/linux/task_work.h:33 [inline]
do_exit+0x5ce/0x2200 kernel/exit.c:931
do_group_exit+0x144/0x310 kernel/exit.c:1046
__do_sys_exit_group kernel/exit.c:1057 [inline]
__se_sys_exit_group kernel/exit.c:1055 [inline]
__x64_sys_exit_group+0x3b/0x40 kernel/exit.c:1055
do_syscall_64+0x6c/0xd0
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fe5e7b52479
Code: Unable to access opcode bytes at RIP 0x7fe5e7b5244f.
RSP: 002b:00007ffd3c800398 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe5e7b52479
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007fe5e7bcd2d0 R08: ffffffffffffffb8 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe5e7bcd2d0
R13: 0000000000000000 R14: 00007fe5e7bcdd20 R15: 00007fe5e7b24270
The crash can be also be reproduced with the following (with a tc
recompiled to allow for sfq limits of 1):
tc qdisc add dev dummy0 handle 1: root tbf rate 1Kbit burst 100b lat 1s
../iproute2-6.9.0/tc/tc qdisc add dev dummy0 handle 2: parent 1:10 sfq limit 1
ifconfig dummy0 up
ping -I dummy0 -f -c2 -W0.1 8.8.8.8
sleep 1
Scenario that triggers the crash:
* the first packet is sent and queued in TBF and SFQ; qdisc qlen is 1
* TBF dequeues: it peeks from SFQ which moves the packet to the
gso_skb list and keeps qdisc qlen set to 1. TBF is out of tokens so
it schedules itself for later.
* the second packet is sent and TBF tries to queues it to SFQ. qdisc
qlen is now 2 and because the SFQ limit is 1 the packet is dropped
by SFQ. At this point qlen is 1, and all of the SFQ slots are empty,
however q->tail is not NULL.
At this point, assuming no more packets are queued, when sch_dequeue
runs again it will decrement the qlen for the current empty slot
causing an underflow and the subsequent out of bounds access.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Octavian Purdila <tavip@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241204030520.2084663-2-tavip@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stanislav Fomichev says:
====================
ethtool: generate uapi header from the spec
We keep expanding ethtool netlink api surface and this leads to
constantly playing catchup on the ynl spec side. There are a couple
of things that prevent us from fully converting to generating
the header from the spec (stats and cable tests), but we can
generate 95% of the header which is still better than maintaining
c header and spec separately. The series adds a couple of missing
features on the ynl-gen-c side and separates the parts
that we can generate into new ethtool_netlink_generated.h.
====================
Link: https://patch.msgid.link/20241204155549.641348-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reshuffle definitions that are gonna be generated into
ethtool_netlink_generated.h and match ynl spec order.
This should make it easier to compare the output of the ynl-gen-c
to the existing uapi header. No functional changes.
Things that are still remaining to be manually defined:
- ETHTOOL_FLAG_ALL - probably no good way to add to spec?
- some of the cable test bits (not sure whether it's possible to move to
spec)
- some of the stats definitions (no way currently to move to spec)
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241204155549.641348-7-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Essentially reverse the order of headers for userspace generated files.
Before (make -C tools/net/ynl/; cat tools/net/ynl/ethtool-user.h):
#include <linux/ethtool_netlink_generated.h>
#include <linux/ethtool.h>
#include <linux/ethtool.h>
#include <linux/ethtool.h>
After:
#include <linux/ethtool.h>
#include <linux/ethtool_netlink_generated.h>
While at it, make sure we track which headers we've already included
and include the headers only once.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241204155549.641348-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- __ETHTOOL_UDP_TUNNEL_TYPE_CNT and render max
- skip rendering stringset (empty enum)
- skip rendering c33-pse-ext-state (defined in ethtool.h)
- rename header flags to ethtool-flag-
- add attr-cnt-name to each attribute to use XXX_CNT instead of XXX_MAX
- add unspec 0 entry to each attribute
- carry some doc entries from the existing header
- tcp-header-split
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241204155549.641348-5-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from can and netfilter.
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth:
- bnxt_en: support header page pool in queue API
- ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth:
- mlxsw: use correct key block on Spectrum-4
- mlx5: fix memory leak in mlx5hws_definer_calc_layout"
* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
net: avoid potential UAF in default_operstate()
vsock/test: verify socket options after setting them
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
net/mlx5e: Remove workaround to avoid syndrome for internal port
net/mlx5e: SD, Use correct mdev to build channel param
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
net/mlx5: HWS: Properly set bwc queue locks lock classes
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
bnxt_en: handle tpa_info in queue API implementation
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
bnxt_en: refactor tpa_info alloc/free into helpers
geneve: do not assume mac header is set in geneve_xmit_skb()
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
ipmr: tune the ipmr_can_free_table() checks.
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
...
Pull tracing fixes from Steven Rostedt:
- Fix trace histogram sort function cmp_entries_dup()
The sort function cmp_entries_dup() returns either 1 or 0, and not -1
if parameter "a" is less than "b" by memcmp().
- Fix archs that call trace_hardirqs_off() without RCU watching
Both x86 and arm64 no longer call any tracepoints with RCU not
watching. It was assumed that it was safe to get rid of
trace_*_rcuidle() version of the tracepoint calls. This was needed to
get rid of the SRCU protection and be able to implement features like
faultable traceponits and add rust tracepoints.
Unfortunately, there were a few architectures that still relied on
that logic. There's only one file that has tracepoints that are
called without RCU watching. Add macro logic around the tracepoints
for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined
will check if the code is in the idle path (the only place RCU isn't
watching), and enable RCU around calling the tracepoint, but only do
it if the tracepoint is enabled.
* tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix archs that still call tracepoints without RCU watching
tracing: Fix cmp_entries_dup() to respect sort() comparison rules