linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 14:27:53 -04:00

Author	SHA1	Message	Date
Liming Wu	cfeb7cd80f	virtio_net: enhance wake/stop tx queue statistics accounting This patch refines and strengthens the statistics collection of TX queue wake/stop events introduced by commit `c39add9b24` ("virtio_net: Add TX stopped and wake counters"). Previously, the driver only recorded partial wake/stop statistics for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume operations were not counted, which made the per-queue metrics incomplete. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:38:01 -08:00
Jakub Kicinski	8ccd116016	Merge branch 'tcp-provide-better-locality-for-retransmit-timer' Eric Dumazet says: ==================== tcp: provide better locality for retransmit timer TCP stack uses three timers per flow, currently spread this way: - sk->sk_timer : keepalive timer - icsk->icsk_retransmit_timer : retransmit timer - icsk->icsk_delack_timer : delayed ack timer This series moves the retransmit timer to sk->sk_timer location, to increase data locality in TX paths. keepalive timers are not often used, this change should be neutral for them. After the series we have following fields: - sk->tcp_retransmit_timer : retransmit timer, in sock_write_tx group - icsk->icsk_delack_timer : delayed ack timer - icsk->icsk_keepalive_timer : keepalive timer Moving icsk_delack_timer in a beter location would also be welcomed. ==================== Link: https://patch.msgid.link/20251124175013.1473655-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:33 -08:00
Eric Dumazet	9a5e5334ad	tcp: remove icsk->icsk_retransmit_timer Now sk->sk_timer is no longer used by TCP keepalive, we can use its storage for TCP and MPTCP retransmit timers for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	08dfe37023	tcp: introduce icsk->icsk_keepalive_timer sk->sk_timer has been used for TCP keepalives. Keepalive timers are not in fast path, we want to use sk->sk_timer storage for retransmit timers, for better cache locality. Create icsk->icsk_keepalive_timer and change keepalive code to no longer use sk->sk_timer. Added space is reclaimed in the following patch. This includes changes to MPTCP, which was also using sk_timer. Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer for inet_sk_diag_fill() sake. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	27e8257a86	net: move sk_dst_pending_confirm and sk_pacing_status to sock_read_tx group These two fields are mostly read in TCP tx path, move them in an more appropriate group for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:29 -08:00
Eric Dumazet	3a6e8fd0bf	tcp: rename icsk_timeout() to tcp_timeout_expires() In preparation of sk->tcp_timeout_timer introduction, rename icsk_timeout() helper and change its argument to plain 'const struct sock *sk'. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:28:28 -08:00
Alexander Lobakin	436fa8e7d1	ice: fix broken Rx on VFs Since the tagged commit, ice stopped respecting Rx buffer length passed from VFs. At that point, the buffer length was hardcoded in ice, so VFs still worked up to some point (until, for example, a VF wanted an MTU larger than its PF). The next commit `93f53db9f9` ("ice: switch to Page Pool"), broke Rx on VFs completely since ice started accounting per-queue buffer lengths again, but now VF queues always had their length zeroed, as ice was already ignoring what iavf was passing to it. Restore the line that initializes the buffer length on VF queues basing on the virtchnl messages. Fixes: `3a4f419f75` ("ice: drop page splitting and recycling") Reported-by: Jakub Slepecki <jakub.slepecki@intel.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Jakub Slepecki <jakub.slepecki@intel.com> Link: https://patch.msgid.link/20251124170735.3077425-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:26:27 -08:00
Gustavo A. R. Silva	d696c73716	chtls: Avoid -Wflex-array-member-not-at-end warning -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warning: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c:163:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/aSQocKoJGkN0wzEj@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:25:24 -08:00
Jakub Kicinski	864f3eda00	Merge branch 'tools-ynl-gen-regeneration-comment-function-prefix' Asbjørn Sloth Tønnesen says: ==================== tools: ynl-gen: regeneration comment + function prefix It looks like these two patches are the last ones needed for YNL, before the WireGuard patches can go in. These patches was both requested by Jason, during review of the WireGuard YNL conversion patchset[1]. ==================== Link: https://patch.msgid.link/20251120174429.390574-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:20:47 -08:00
Asbjørn Sloth Tønnesen	68e83f3472	tools: ynl-gen: add regeneration comment Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:20:42 -08:00
Asbjørn Sloth Tønnesen	17fa6ee35b	tools: ynl-gen: add function prefix argument This patch adds a new CLI argument for overriding the default function prefix, as used for naming the doit/dumpit functions in the generated kernel code. When not specified the default "$(FAMILY)-nl" is used. This can also be specified persistently in generated files: /* YNL-ARG --function-prefix wg */ In the above example it causes the following changes: wireguard_nl_get_device_dumpit() -> wg_get_device_dumpit() wireguard_nl_get_device_doit() -> wg_get_device_doit() The variable name fn_prefix, was chosen as it relates to op_prefix which is used to prefix the UAPI commands enum entries. Link: https://lore.kernel.org/r/aRvWzC8qz3iXDAb3@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://patch.msgid.link/20251120174429.390574-2-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:20:42 -08:00
Jakub Kicinski	97a88d9e2a	Merge branch 'ptp-ocp-a-fix-and-refactoring' Andy Shevchenko says: ==================== ptp: ocp: A fix and refactoring Here is the fix for incorrect use of %ptT with the associated refactoring and additional cleanups. Note, %ptS, which is introduced in another series that is already applied to PRINTK tree, doesn't fit here, that's why this fix is separated from that series. ==================== Link: https://patch.msgid.link/20251124084816.205035-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:17:23 -08:00
Andy Shevchenko	648282e2d1	ptp: ocp: Reuse META's PCI vendor ID The META's PCI vendor ID is listed already in the pci_ids.h. Reuse it here. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-5-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:17:22 -08:00
Andy Shevchenko	4c84a5c7b0	ptp: ocp: Apply standard pattern for cleaning up loop The while (i--) is a standard pattern for the cleaning up loops. Apply this pattern where it makes sense in the driver. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:17:21 -08:00
Andy Shevchenko	590f5d1fa6	ptp: ocp: Make ptp_ocp_unregister_ext() NULL-aware It's a common practice to make resource release functions be NULL-aware. Make ptp_ocp_unregister_ext() NULL-aware. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:17:21 -08:00
Andy Shevchenko	622cc66ed7	ptp: ocp: Refactor signal_show() and fix %ptT misuse Refactor signal_show() to avoid sequential calls to sysfs_emit*() and use the same pattern to get the index of a signal as it's done in signal_store(). While at it, fix wrong use of %ptT against struct timespec64. It's kinda lucky that it worked just because the first member there 64-bit and it's of time64_t type. Now with %ptS it may be used correctly. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:17:21 -08:00
Michal Luczaj	b796632fc8	vsock/test: Extend transport change null-ptr-deref test syzkaller reported a lockdep lock order inversion warning[1] due to commit `687aa0c558` ("vsock: Fix transport_* TOCTOU"). This was fixed in commit `f7c877e753` ("vsock: fix lock inversion in vsock_assign_transport()"). Redo syzkaller's repro by piggybacking on a somewhat related test implemented in commit `3a764d9338` ("vsock/test: Add test for null ptr deref when transport changes"). [1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/ Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:16:21 -08:00
Heiner Kallweit	87ad869fea	r8169: improve MAC EEE handling Let phydev->enable_tx_lpi control whether MAC enables TX LPI, instead of enabling it unconditionally. This way TX LPI is disabled if e.g. link partner doesn't support EEE. This helps to avoid potential issues like link flaps. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/91bcb837-3fab-4b4e-b495-038df0932e44@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:14:34 -08:00
Daniel Golle	de1e5c9333	net: phy: mxl-gpy: add support for MxL86252 and MxL86282 Add PHY driver support for Maxlinear MxL86252 and MxL86282 switches. The PHYs built-into those switches are just like any other GPY 2.5G PHYs with the exception of the temperature sensor data being encoded in a different way. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/a6cd7fe461b011cec2b59dffaf34e9c8b0819059.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:11:38 -08:00
Chad Monroe	9d844da693	net: phy: mxl-gpy: add support for MxL86211C MxL86211C is a smaller and more efficient version of the GPY211C. Add the PHY ID and phy_driver instance to the mxl-gpy driver. Signed-off-by: Chad Monroe <chad@monroe.io> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/cabf3559d6511bed6b8a925f540e3162efc20f6b.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:11:38 -08:00
Buday Csaba	ce28e333d6	net: mdio: remove redundant fwnode cleanup Remove redundant fwnode cleanup in of_mdiobus_register_device() and xpcs_plat_init_dev(). mdio_device_free() eventually calls mdio_device_release(), which already performs fwnode_handle_put(), making the manual cleanup unnecessary. Combine fwnode_handle_get() with device_set_node() in of_mdiobus_register_device() for clarity. Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/00847693daa8f7c8ff5dfa19dd35fc712fa4e2b5.1763995734.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 18:45:37 -08:00
Buday Csaba	a11e0d467d	net: mdio: eliminate kdoc warnings in mdio_device.c and mdio_bus.c Fix all warnings reported by scripts/kernel-doc in mdio_device.c and mdio_bus.c Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/7ef7b80669da2b899d38afdb6c45e122229c3d8c.1763968667.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 18:43:56 -08:00
Jakub Kicinski	652eb5afce	Merge branch 'net-enetc-add-port-mdio-support-for-both-i-mx94-and-i-mx95' Wei Fang says: ==================== net: enetc: add port MDIO support for both i.MX94 and i.MX95 The NETC IP has one external master MDIO interface (eMDIO) for managing external PHYs, all ENETC ports share this eMDIO. The EMDIO function and the ENETC port MDIO are the virtual ports of this eMDIO, ENETC can use these virtual ports to access their PHYs. The difference is that EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, so it provides a means for different software modules to share a single set of MDIO signals to access their PHYs. The ENETC port MDIO can only access its own external PHY. Furthermore, its PHY address must be set to its corresponding LaBCR register in IERB module, which is is a 64 KB size page containing registers that are used for pre-boot initialization for all NETC PCIe functions. And this IERB is owned by the host OS and it will be locked after the initialization, so it cannot be configured at running time any more. The port MDIO can only work properly when the PHY address accessed by it matches the value of its corresponding LaBCR[MDIO_PHYAD_PRTAD]. Otherwise, the MDIO access by the port MDIO will not take effect. Note that the same PHY is either controlled by port MDIO or by the EMDIO function. The netc-blk-ctrl driver will only set the PHY address in the LaBCR register corresponding to the ENETC when the ENETC node contains an mdio child node, and the ENETC driver will only create the port MDIO bus then. An example in DTS is as follows, the EMDIO function will not\ access this PHY. enetc_port0 { phy-handle = <&ethphy0>; phy-mode = "rgmii-id"; mdio { #address-cells = <1>; #size-cells = <0>; ethphy0: ethernet-phy@1 { reg = <1>; }; }; }; If users want to use EMDIO funtion to manage the PHY, they only need to place the PHY node in the emdio node. The same PHY must not be placed simultaneously within the ENETC node. An example in DTS to use EMDIO is as below. netc_emdio { ethphy0: ethernet-phy@1 { reg = <1>; }; ethphy2: ethernet-phy@8 { reg = <8>; }; }; In the host OS, when there are multiple ENETCs, they can all access their PHYs using their own port MDIO, or they can all access their PHYs using the EMDIO function, or they can partially use port MDIO and partially use the EMDIO function. Another typical use case of port MDIO is the Jailhouse usage. An ENETC is assigned to a guest OS. The EMDIO function will be unavailable in the guest OS because EMDIO is controlled by the host OS. Therefore, the ENETC can use its port MDIO to manage its external PHY in this situation. In this use case, the host OS's root dtb will disable the ENETC node, so the host OS's ENETC driver will not probe the ENETC and its PHY. In addition, this series also adds the internal MDIO bus support, each ENETC has an internal MDIO interface for managing on-die PHY (PCS) if it has PCS layer. ==================== Link: https://patch.msgid.link/20251119102557.1041881-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:44:50 -08:00
Wei Fang	10ba23a7f6	net: enetc: update the base address of port MDIO registers for ENETC v4 Each ENETC has a set of external MDIO registers to access its external PHY based on its port EMDIO bus, these registers are used for MDIO bus access, such as setting the PHY address, PHY register address and value, read or write operations, C22 or C45 format, etc. The base address of this set of registers has been modified in ENETC v4 and is different from that in ENETC v1. So the base address needs to be updated so that ENETC v4 can use port MDIO to manage its own external PHY. Additionally, if ENETC has the PCS layer, it also has a set of internal MDIO registers for managing its on-die PHY (PCS/Serdes). The base address of this set of registers is also different from that of ENETC v1, so the base address also needs to be updated so that ENETC v4 can support the management of on-die PHY through the internal MDIO bus. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:44:48 -08:00
Wei Fang	50bfd9c06f	net: enetc: set external PHY address in IERB for i.MX94 ENETC NETC IP has only one external master MDIO interface (eMDIO) for managing the external PHYs. ENETC can use the interfaces provided by the EMDIO function or its port MDIO to access and manage its external PHY. Both the EMDIO function and the port MDIO are all virtual ports of the eMDIO. The difference is that the EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, but port MDIO can only access its own PHY. To ensure that ENETC can only access its own PHY through port MDIO, LaBCR[MDIO_PHYAD_PRTAD] needs to be set, which represents the address of the external PHY connected to ENETC. If the accessed PHY address is not consistent with LaBCR[MDIO_PHYAD_PRTAD], then the MDIO access initiated by port MDIO will be invalid. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:44:47 -08:00
Wei Fang	6633df05f3	net: enetc: set the external PHY address in IERB for port MDIO usage The ENETC supports managing its own external PHY through its port MDIO functionality. To use this function, the PHY address needs be set in the corresponding LaBCR register in the Integrated Endpoint Register Block (IERB), which is used for pre-boot initialization of NETC PCIe functions. The port MDIO can only work properly when the PHY address accessed by the port MDIO matches the corresponding LaBCR[MDIO_PHYAD_PRTAD] value. Because the ENETC driver only registers the MDIO bus (port MDIO bus) when it detects an MDIO child node in its node, similarly, the netc-blk-ctrl driver only resolves the PHY address and sets it in the corresponding LaBCR when it detects an MDIO child node in the ENETC node. Co-developed-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:44:47 -08:00
Jakub Kicinski	f0054f7bb9	Merge branch 'improvements-over-dsa-conduit-ethtool-ops' Vladimir Oltean says: ==================== Improvements over DSA conduit ethtool ops DSA interceps 'ethtool -S eth0', where eth0 is the host port of the switch (called 'conduit'). It does this because otherwise there is no way to report port counters for the CPU port, which is a MAC like any other of that switch, except Linux exposes no net_device for it, thus no ethtool hook. Having understood all downsides of this debugging interface, when we need it we needed, so the proposed changes here are to make it more useful by dumping more counters in it: not just the switch CPU port, but all other switch ports in the tree which lack a net_device. Not reinventing any wheel, just putting more output in an existing command. That is patch 3/3. The other 2 are cleanup. ==================== Link: https://patch.msgid.link/20251122112311.138784-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:34:01 -08:00
Vladimir Oltean	f647ed2ca7	net: dsa: append ethtool counters of all hidden ports to conduit Currently there is no way to see packet counters on cascade ports, and no clarity on how the API for that would look like. Because it's something that is currently needed, just extend the hack where ethtool -S on the conduit interface dumps CPU port counters, and also use it to dump counters of cascade ports. Note that the "pXX_" naming convention changes to "sXX_pYY", to distinguish between ports having the same index but belonging to different switches. This has a slight chance of causing regressions to existing tooling: - grepping for "p04_counter_name" still works, but might return more than one string now - grepping for " p04_counter_name" no longer works Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:33:55 -08:00
Vladimir Oltean	8afabd27fe	net: dsa: use kernel data types for ethtool ops on conduit Suppress some checkpatch 'CHECK' messages about u8 being preferable over uint8_t, etc. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:33:55 -08:00
Vladimir Oltean	eba81b0a6d	net: dsa: cpu_dp->orig_ethtool_ops might be NULL In theory this would have been seen by now, but it seems that all drivers used as DSA conduit interfaces thus far have had ethtool_ops set, and it's hard to even find modern Ethernet drivers (and not VF ones) which don't use ethtool. Here is the unfiltered list of drivers which register any sort of net_device but don't set its ethtool_ops pointer. I don't think any of them 'risks' being used as a DSA conduit, maybe except for moxart, rnpbge and icssm, I'm not sure. - drivers/net/can/dev/dev.c - drivers/net/wwan/qcom_bam_dmux.c - drivers/net/wwan/t7xx/t7xx_netdev.c - drivers/net/arcnet/arcnet.c - drivers/net/hamradio/ - drivers/net/slip/slip.c - drivers/net/ethernet/ezchip/nps_enet.c - drivers/net/ethernet/moxa/moxart_ether.c - drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c - drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c - drivers/net/ethernet/huawei/hinic3/hinic3_main.c - drivers/net/ethernet/i825xx/ - drivers/net/ethernet/ti/icssm/icssm_prueth.c - drivers/net/ethernet/seeq/ - drivers/net/ethernet/litex/litex_liteeth.c - drivers/net/ethernet/sunplus/spl2sw_driver.c - drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c - drivers/net/ipa/ - drivers/net/wireless/microchip/wilc1000/ - drivers/net/wireless/mediatek/mt76/dma.c - drivers/net/wireless/ath/ath12k/ - drivers/net/wireless/ath/ath11k/ - drivers/net/wireless/ath/ath6kl/ - drivers/net/wireless/ath/ath10k/ - drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c - drivers/net/wireless/virtual/mac80211_hwsim.c - drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c - drivers/net/wireless/realtek/rtw89/core.c - drivers/net/wireless/realtek/rtw88/pci.c - drivers/net/caif/ - drivers/net/plip/ - drivers/net/wan/ - drivers/net/mctp/ - drivers/net/ppp/ - drivers/net/thunderbolt/ Nonetheless, it's good for the framework not to make such assumptions, and not panic when coming across such kind of host device in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:33:55 -08:00
Alan Maguire	380d19db6e	cxgb4: Rename sched_class to avoid type clash drivers/net/ethernet/chelsio/cxgb4/sched.h declares a sched_class struct which has a type name clash with struct sched_class in kernel/sched/sched.h (a type used in a field in task_struct). When cxgb4 is a builtin we end up with both sched_class types, and as a result of this we wind up with DWARF (and derived from that BTF) with a duplicate incorrect task_struct representation. When cxgb4 is built-in this type clash can cause kernel builds to fail as resolve_btfids will fail when confused which task_struct to use. See [1] for more details. As such, renaming sched_class to ch_sched_class (in line with other structs like ch_sched_flowc) makes sense. [1] https://lore.kernel.org/bpf/2412725b-916c-47bd-91c3-c2d57e3e6c7b@acm.org/ Reported-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20251121181231.64337-1-alan.maguire@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:27:11 -08:00
Javen Xu	d6eea0048b	r8169: add support for RTL9151A This adds support for chip RTL9151A. Its XID is 0x68b. It is bascially basd on the one with XID 0x688, but with different firmware file. Signed-off-by: Javen Xu <javen_xu@realsil.com.cn> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251121090104.3753-1-javen_xu@realsil.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 17:24:29 -08:00
Paolo Abeni	61e628023d	Merge branch 'net_sched-speedup-qdisc-dequeue' Eric Dumazet says: ==================== net_sched: speedup qdisc dequeue Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. Idea is to cache gso_segs at enqueue time before spinlock is acquired, in the first skb cache line, where we already have qdisc_skb_cb(skb)->pkt_len. This series gives a 8 % improvement in a TX intensive workload. (120 Mpps -> 130 Mpps on a Turin host, IDPF with 32 TX queues) v2: https://lore.kernel.org/netdev/20251111093204.1432437-1-edumazet@google.com/ v1: https://lore.kernel.org/netdev/20251110094505.3335073-1-edumazet@google.com/T/#m8f562ed148f807c02fd02c6cd243604d449615b9 ==================== Link: https://patch.msgid.link/20251121083256.674562-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:35 +01:00
Eric Dumazet	a6efc273ab	net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel cake, codel and fq_codel can drop many packets from dequeue(). Use qdisc_dequeue_drop() so that the freeing can happen outside of the qdisc spinlock scope. Add TCQ_F_DEQUEUE_DROPS to sch->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	191ff13e42	net_sched: add qdisc_dequeue_drop() helper Some qdisc like cake, codel, fq_codel might drop packets in their dequeue() method. This is currently problematic because dequeue() runs with the qdisc spinlock held. Freeing skbs can be extremely expensive. Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS so that these qdiscs can opt-in to defer the skb frees after the socket spinlock is released. TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs with an extra cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	0170d7f47c	net_sched: add tcf_kfree_skb_list() helper Using kfree_skb_list_reason() to free list of skbs from qdisc operations seems wrong as each skb might have a different drop reason. Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once in preparation of the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	4792c3a4c1	net: annotate a data-race in __dev_xmit_skb() q->limit is read locklessly, add a READ_ONCE(). Fixes: `100dfa74ca` ("net: dev_queue_xmit() llist adoption") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	b2e9821cff	net: prefech skb->priority in __dev_xmit_skb() Most qdiscs need to read skb->priority at enqueue time(). In commit `100dfa74ca` ("net: dev_queue_xmit() llist adoption") I added a prefetch(next), lets add another one for the second half of skb. Note that skb->priority and skb->hash share a common cache line, so this patch helps qdiscs needing both fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	2f9babc04d	net_sched: sch_fq: prefetch one skb ahead in dequeue() prefetch the skb that we are likely to dequeue at the next dequeue(). Also call fq_dequeue_skb() a bit sooner in fq_dequeue(). This reduces the window between read of q.qlen and changes of fields in the cache line that could be dirtied by another cpu trying to queue a packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	3c1100f042	net_sched: sch_fq: move qdisc_bstats_update() to fq_dequeue_skb() Group together changes to qdisc fields to reduce chances of false sharing if another cpu attempts to acquire the qdisc spinlock. qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; qdisc_bstats_update(sch, skb); Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	ad50d5a3fc	net_sched: add Qdisc_read_mostly and Qdisc_write groups It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in fast path by reducing this to a single dirtied cache line. In current layout, we change only four/six fields in the first cache line: - q.spinlock - q.qlen - bstats.bytes - bstats.packets - some Qdisc also change q.next/q.prev In the second cache line we change in the fast path: - running - state - qstats.backlog /* --- cacheline 2 boundary (128 bytes) --- / struct sk_buff_head gso_skb __attribute__((__aligned__(64))); / 0x80 0x18 / struct qdisc_skb_head q; / 0x98 0x18 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xb0 0x10 / / --- cacheline 3 boundary (192 bytes) --- / struct gnet_stats_queue qstats; / 0xc0 0x14 / bool running; / 0xd4 0x1 / / XXX 3 bytes hole, try to pack / unsigned long state; / 0xd8 0x8 / struct Qdisc next_sched; /* 0xe0 0x8 / struct sk_buff_head skb_bad_txq; / 0xe8 0x18 / / --- cacheline 4 boundary (256 bytes) --- / Reorganize things to have a first cache line mostly read, then a mostly written one. This gives a ~3% increase of performance under tx stress. Note that there is an additional hole because @qstats now spans over a third cache line. / --- cacheline 2 boundary (128 bytes) --- / __u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); / 0x80 0 / struct sk_buff_head gso_skb; / 0x80 0x18 / struct Qdisc next_sched; /* 0x98 0x8 / struct sk_buff_head skb_bad_txq; / 0xa0 0x18 / __u8 __cacheline_group_end__Qdisc_read_mostly[0]; / 0xb8 0 / / XXX 8 bytes hole, try to pack / / --- cacheline 3 boundary (192 bytes) --- / __u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); / 0xc0 0 / struct qdisc_skb_head q; / 0xc0 0x18 / unsigned long state; / 0xd8 0x8 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xe0 0x10 / bool running; / 0xf0 0x1 / / XXX 3 bytes hole, try to pack / struct gnet_stats_queue qstats; / 0xf4 0x14 / / --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- / __u8 __cacheline_group_end__Qdisc_write[0]; / 0x108 0 / / XXX 56 bytes hole, try to pack */ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	c5d34f4583	net_sched: cake: use qdisc_pkt_segs() Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue() for non GSO packets. cake_overhead() does not have to recompute it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-7-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	2773cb0b31	net_sched: use qdisc_skb_cb(skb)->pkt_segs in bstats_update() Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:32 +01:00
Eric Dumazet	f9e00e51e3	net: use qdisc_pkt_len_segs_init() in sch_handle_ingress() sch_handle_ingress() sets qdisc_skb_cb(skb)->pkt_len. We also need to initialize qdisc_skb_cb(skb)->pkt_segs. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-5-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:31 +01:00
Eric Dumazet	874c1928d3	net_sched: initialize qdisc_skb_cb(skb)->pkt_segs in qdisc_pkt_len_init() qdisc_pkt_len_init() is currently initalizing qdisc_skb_cb(skb)->pkt_len. Add qdisc_skb_cb(skb)->pkt_segs initialization and rename this function to qdisc_pkt_len_segs_init(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-4-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:31 +01:00
Eric Dumazet	be1b70ab21	net: init shinfo->gso_segs from qdisc_pkt_len_init() Qdisc use shinfo->gso_segs for their pkts stats in bstats_update(), but this field needs to be initialized for SKB_GSO_DODGY users. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:31 +01:00
Eric Dumazet	b2a38f6df9	net_sched: make room for (struct qdisc_skb_cb)->pkt_segs Add a new u16 field, next to pkt_len : pkt_segs This will cache shinfo->gso_segs to speed up qdisc deqeue(). Move slave_dev_queue_mapping at the end of qdisc_skb_cb, and move three bits from tc_skb_cb : - post_ct - post_ct_snat - post_ct_dnat Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 16:10:31 +01:00
Jacky Chou	e3daf0e7fe	dt-bindings: net: aspeed: add AST2700 MDIO compatible Add "aspeed,ast2700-mdio" compatible to the binding schema with a fallback to "aspeed,ast2600-mdio". Although the MDIO controller on AST2700 is functionally the same as the one on AST2600, it's good practice to add a SoC-specific compatible for new silicon. This allows future driver updates to handle any 2700-specific integration issues without requiring devicetree changes or complex runtime detection logic. For now, the driver continues to bind via the existing "aspeed,ast2600-mdio" compatible, so no driver changes are needed. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20251120-aspeed_mdio_ast2700-v2-1-0d722bfb2c54@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-25 12:11:47 +01:00
Jakub Kicinski	cc1b62512a	Merge branch 'mptcp-memcg-accounting-for-passive-sockets-backlog-processing' Matthieu Baerts says: ==================== mptcp: memcg accounting for passive sockets & backlog processing This series is split in two: the 4 first patches are linked to memcg accounting for passive sockets, and the rest introduce the backlog processing. They are sent together, because the first one appeared to be needed to get the second one fully working. The second part includes RX path improvement built around backlog processing. The main goals are improving the RX performances _and_ increase the long term maintainability. - Patches 1-3: preparation work to ease the introduction of the next patch. - Patch 4: fix memcg accounting for passive sockets. Note that this is a (non-urgent) fix, but it depends on material that is currently only in net-next, e.g. commit `4a997d49d9` ("tcp: Save lock_sock() for memcg in inet_csk_accept()."). - Patches 5-6: preparation of the stack for backlog processing, removing assumptions that will not hold true any more after the backlog introduction. - Patches 7,8,10,11,12 are more cleanups that will make the backlog patch a little less huge. - Patch 9: somewhat an unrelated cleanup, included here not to forget about it. - Patches 13-14: The real work is done by them. Patch 13 introduces the helpers needed to manipulate the msk-level backlog, and the data struct itself, without any actual functional change. Patch 14 finally uses the backlog for RX skb processing. Note that MPTCP can't use the sk_backlog, as the MPTCP release callback can also release and re-acquire the msk-level spinlock and core backlog processing works under the assumption that such event is not possible. A relevant point is memory accounts for skbs in the backlog. It's somewhat "original" due to MPTCP constraints. Such skbs use space from the incoming subflow receive buffer, do not use explicitly any forward allocated memory, as we can't update the msk fwd mem while enqueuing, nor we want to acquire again the ssk socket lock while processing the skbs. Instead the msk borrows memory from the subflow and reserve it for the backlog, see patch 5 and 14 for the gory details. ==================== Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-24 20:23:43 -08:00
Paolo Abeni	6228efe0cc	mptcp: leverage the backlog for RX packet processing When the msk socket is owned or the msk receive buffer is full, move the incoming skbs in a msk level backlog list. This avoid traversing the joined subflows and acquiring the subflow level socket lock at reception time, improving the RX performances. When processing the backlog, use the fwd alloc memory borrowed from the incoming subflow. skbs exceeding the msk receive space are not dropped; instead they are kept into the backlog until the receive buffer is freed. Dropping packets already acked at the TCP level is explicitly discouraged by the RFC and would corrupt the data stream for fallback sockets. Special care is needed to avoid adding skbs to the backlog of a closed msk and to avoid leaving dangling references into the backlog at subflow closing time. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-14-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-24 19:49:43 -08:00

1 2 3 4 5 ...

1399532 Commits