Commit Graph

1382027 Commits

Author SHA1 Message Date
Paolo Abeni
ae76e8d2c2 Merge branch 'add-ppe-driver-for-qualcomm-ipq9574-soc'
Luo Jie says:

====================
Add PPE driver for Qualcomm IPQ9574 SoC

The PPE (packet process engine) hardware block is available in Qualcomm
IPQ chipsets that support PPE architecture, such as IPQ9574 and IPQ5332.
The PPE in the IPQ9574 SoC includes six Ethernet ports (6 GMAC and 6
XGMAC), which are used to connect with external PHY devices by PCS. The
PPE also includes packet processing offload capabilities for various
networking functions such as route and bridge flows, VLANs, different
tunnel protocols and VPN. It also includes an L2 switch function for
bridging packets among the 6 Ethernet ports and the CPU port. The CPU
port enables packet transfer between the Ethernet ports and the ARM
cores in the SoC, using the Ethernet DMA.

This patch series is the first part of a three part series that will
together enable Ethernet function for IPQ9574 SoC. While support is
initially being added for IPQ9574 SoC, the driver will be easily
extendable to enable Ethernet support for other IPQ SoC such as IPQ5332.
The driver can also be extended later for adding support for L2/L3
network offload features that the PPE can support. The functionality
to be enabled by each of the three series (to be posted sequentially)
is as below:

Part 1: The PPE patch series (this series), which enables the platform
driver, probe and initialization/configuration of different PPE hardware
blocks.

Part 2: The PPE MAC patch series, which enables the phylink operations
for the PPE Ethernet ports.

Part 3: The PPE EDMA patch series, which enables the Rx/Tx Ethernet DMA
and netdevice driver for the 6 PPE Ethernet ports.

A more detailed description of the functions enabled by part 1 is below:
1. Initialize PPE device hardware functions such as buffer management,
   queue management, scheduler and clocks in order to bring up PPE
   device.
2. Enable platform driver and probe functions
3. Register debugfs file to provide access to various PPE packet
   counters. These statistics are recorded by the various hardware
   process counters, such as port RX/TX, CPU code and hardware queue
   counters.
4. A detailed introduction of PPE along with the PPE hardware diagram
   in the first two patches (dt-bindings and documentation).

Below is a reference to an earlier RFC discussion with the community
about enabling Ethernet driver support for Qualcomm IPQ9574 SoC. This
writeup can help provide a higher level architectural view of various
other drivers that support the PPE such as clock and PCS drivers.
Topic: RFC: Advice on adding support for Qualcomm IPQ9574 SoC Ethernet.
https://lore.kernel.org/linux-arm-msm/d2929bd2-bc9e-4733-a89f-2a187e8bf917@quicinc.com/

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
====================

Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-0-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:44 +02:00
Luo Jie
ad5cef7ef0 MAINTAINERS: Add maintainer for Qualcomm PPE driver
Add maintainer entry for PPE (Packet Process Engine) driver supported
for Qualcomm IPQ SoCs.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-14-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:42 +02:00
Luo Jie
a2a7221dbd net: ethernet: qualcomm: Add PPE debugfs support for PPE counters
The PPE hardware counters maintain counters for packets handled by the
various functional blocks of PPE. They help in tracing the packets
passed through PPE and debugging any packet drops.

The counters displayed by this debugfs file are ones that are common
for all Ethernet ports, and they do not include the counters that are
specific for a MAC port. Hence they cannot be displayed using ethtool.
The per-MAC counters will be supported using "ethtool -S" along with
the netdevice driver.

The PPE hardware various type counters are made available through the
debugfs files under directory "/sys/kernel/debug/ppe/".

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-13-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:42 +02:00
Lei Wei
8cc72c6c92 net: ethernet: qualcomm: Initialize PPE L2 bridge settings
Initialize the L2 bridge settings for the PPE ports to only enable
L2 frame forwarding between CPU port and PPE Ethernet ports.

The per-port L2 bridge settings are initialized as follows:
For PPE CPU port, the PPE bridge TX is enabled and FDB learning is
disabled. For PPE physical ports, the default L2 forwarding action
is initialized to forward to CPU port only.

L2/FDB learning and forwarding will not be enabled for PPE physical
ports yet, since the port's VSI (Virtual Switch Instance) and VSI
membership are not yet configured, which are required for FDB
forwarding. The VSI and FDB forwarding will later be enabled when
switchdev is enabled.

Signed-off-by: Lei Wei <quic_leiwei@quicinc.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-12-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:42 +02:00
Luo Jie
fa99608a9a net: ethernet: qualcomm: Initialize PPE queue to Ethernet DMA ring mapping
Configure the selected queues to map with an Ethernet DMA ring for the
packet to receive on ARM cores.

As default initialization, all queues assigned to CPU port 0 are mapped
to the EDMA ring 0. This configuration is later updated during Ethernet
DMA initialization.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-11-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
1c46c3c007 net: ethernet: qualcomm: Initialize PPE RSS hash settings
The PPE RSS hash is generated during PPE receive, based on the packet
content (3 tuples or 5 tuples) and as per the configured RSS seed. The
hash is then used to select the queue to transmit the packet to the
ARM CPU.

This patch initializes the RSS hash settings that are used to generate
the hash for the packet during PPE packet receive.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-10-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
8821bb0f62 net: ethernet: qualcomm: Initialize PPE port control settings
Configure the default action as drop when the packet size is more than
the configured MTU of physical port. Also enable port specific counters
in PPE.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-9-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
73d05bdaf0 net: ethernet: qualcomm: Initialize PPE service code settings
PPE service code is a special code (0-255) that is defined by PPE for
PPE's packet processing stages, as per the network functions required
for the packet.

For packet being sent out by ARM cores on Ethernet ports, The service
code 1 is used as the default service code. This service code is used
to bypass most of packet processing stages of the PPE before the packet
transmitted out PPE port, since the software network stack has already
processed the packet.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-8-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
7a23a8af17 net: ethernet: qualcomm: Initialize PPE queue settings
Configure unicast and multicast hardware queues for the PPE ports to
enable packet forwarding between the ports.

Each PPE port is assigned with a range of queues. The queue ID selection
for the packet is decided by the queue base and queue offset that is
configured based on the internal priority and the RSS hash value of the
packet.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-7-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
3312279838 net: ethernet: qualcomm: Initialize the PPE scheduler settings
The PPE scheduler settings determine the priority of scheduling the
packet across the different hardware queues per PPE port.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-6-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
806268dc7e net: ethernet: qualcomm: Initialize PPE queue management for IPQ9574
QM (queue management) configurations decide the length of PPE queues
and the queue depth for these queues which are used to drop packets
in events of congestion.

There are two types of PPE queues - unicast queues (0-255) and multicast
queues (256-299). These queue types are used to forward different types
of traffic, and are configured with different lengths.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-5-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
8a971df98c net: ethernet: qualcomm: Initialize PPE buffer management for IPQ9574
The BM (Buffer Management) config controls the pause frame generated
on the PPE port. There are maximum 15 BM ports and 4 groups supported,
all BM ports are assigned to group 0 by default. The number of hardware
buffers configured for the port influence the threshold of the flow
control for that port.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-4-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
353a0f1d5b net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC
The PPE (Packet Process Engine) hardware block is available on Qualcomm
IPQ SoC that support PPE architecture, such as IPQ9574.

The PPE in IPQ9574 includes six integrated Ethernet MAC for 6 PPE ports,
buffer management, queue management and scheduler functions. The MACs
can connect with the external PHY or switch devices using the UNIPHY PCS
block available in the SoC.

The PPE also includes various packet processing offload capabilities
such as L3 routing and L2 bridging, VLAN and tunnel processing offload.
It also includes Ethernet DMA function for transferring packets between
ARM cores and PPE Ethernet ports.

This patch adds the base source files and Makefiles for the PPE driver
such as platform driver registration, clock initialization, and PPE
reset routines.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-3-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Lei Wei
6b9f301985 docs: networking: Add PPE driver documentation for Qualcomm IPQ9574 SoC
Add description and high-level diagram for PPE, driver overview and
module enable/debug information.

Signed-off-by: Lei Wei <quic_leiwei@quicinc.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-2-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Luo Jie
1898fc5721 dt-bindings: net: Add PPE for Qualcomm IPQ9574 SoC
The PPE (packet process engine) hardware block is available in Qualcomm
IPQ chipsets that support PPE architecture, such as IPQ9574. The PPE in
the IPQ9574 SoC includes six Ethernet ports (6 GMAC and 6 XGMAC), which
are used to connect with external PHY devices by PCS. It includes an L2
switch function for bridging packets among the 6 Ethernet ports and the
CPU port. The CPU port enables packet transfer between the Ethernet ports
and the ARM cores in the SoC, using the Ethernet DMA.

The PPE also includes packet processing offload capabilities for various
networking functions such as route and bridge flows, VLANs, different
tunnel protocols and VPN.

The PPE switch is modeled according to the Ethernet switch schema, with
additional properties defined for the switch node for interrupts, clocks,
resets, interconnects and Ethernet DMA. The switch port node is extended
with additional properties for clocks and resets.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-1-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 12:38:41 +02:00
Paolo Abeni
d051b1f9df Merge branch 'net-phy-micrel-add-support-for-lan8842'
Horatiu Vultur says:

====================
net: phy: micrel: Add support for lan8842

Add support for LAN8842 which supports industry-standard SGMII.
While add this the first 3 patches in the series cleans more the
driver, they should not introduce any functional changes.
====================

Link: https://patch.msgid.link/20250818075121.1298170-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:42:33 +02:00
Horatiu Vultur
5a774b64cd net: phy: micrel: Add support for lan8842
The LAN8842 is a low-power, single port triple-speed (10BASE-T/ 100BASE-TX/
1000BASE-T) ethernet physical layer transceiver (PHY) that supports
transmission and reception of data on standard CAT-5, as well as CAT-5e and
CAT-6, Unshielded Twisted Pair (UTP) cables.

The LAN8842 supports industry-standard SGMII (Serial Gigabit Media
Independent Interface) providing chip-to-chip connection to a Gigabit
Ethernet MAC using a single serialized link (differential pair) in each
direction.

There are 2 variants of the lan8842. The one that supports timestamping
(lan8842) and one that doesn't have timestamping (lan8832).

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250818075121.1298170-5-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:42:31 +02:00
Horatiu Vultur
d471793a9b net: phy: micrel: Replace hardcoded pages with defines
The functions lan_*_page_reg gets as a second parameter the page
where the register is. In all the functions the page was hardcoded.
Replace the hardcoded values with defines to make it more clear
what are those parameters.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250818075121.1298170-4-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:42:30 +02:00
Horatiu Vultur
a0de636ed7 net: phy: micrel: Introduce lanphy_modify_page_reg
As the name suggests this function modifies the register in an
extended page. It has the same parameters as phy_modify_mmd.
This function was introduce because there are many places in the
code where the registers was read then the value was modified and
written back. So replace all this code with this function to make
it clear.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250818075121.1298170-3-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:42:30 +02:00
Horatiu Vultur
54e974c715 net: phy: micrel: Start using PHY_ID_MATCH_MODEL
Start using PHY_ID_MATCH_MODEL for all the drivers.
While at this add also PHY_ID_KSZ8041RNLI to micrel_tbl.

It is safe to change the current of 0x00fffff0 to PHY_ID_MATCH_MODEL
because all the micrel PHYs have in MSB a value of 0.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250818075121.1298170-2-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:42:30 +02:00
Paolo Abeni
38dad812bb Merge tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next-vhca-id

A preparation patchset for adjacent function vports.

Adjacent functions can delegate their SR-IOV VFs to sibling PFs,
allowing for more flexible and scalable management in multi-host and
ECPF-to-host scenarios. Adjacent vports can be managed by the management
PF via their unique vhca id and can't be managed by function index as the
index can conflict with the local vports/vfs.

This series provides:

- Use the cached vcha id instead of querying it every time from fw
- Query hca cap using vhca id instead of function id when FW supports it
- Add HW capabilities and required definitions for adjacent function vports

* tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  {rdma,net}/mlx5: export mlx5_vport_get_vhca_id
  net/mlx5: E-Switch, Set/Query hca cap via vhca id
  net/mlx5: E-Switch, Cache vport vhca id on first cap query
  net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports
====================

Link: https://patch.msgid.link/20250815194901.298689-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:18:53 +02:00
Thorsten Blum
833e43171b net: pktgen: Use min()/min_t() to improve pktgen_finalize_skb()
Use min() and min_t() to improve pktgen_finalize_skb() and avoid
calculating 'datalen / frags' twice.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250815153334.295431-3-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 10:12:11 +02:00
Yury Norov (NVIDIA)
62a2b35025 net: openvswitch: Use for_each_cpu() where appropriate
Due to legacy reasons, openswitch code opencodes for_each_cpu() to make
sure that CPU0 is always considered.

Since commit c4b2bf6b4a ("openvswitch: Optimize operations for OvS
flow_stats."), the corresponding  flow->cpu_used_mask is initialized
such that CPU0 is explicitly set.

So, switch the code to using plain for_each_cpu().

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20250818172806.189325-1-yury.norov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:47:22 -07:00
Kuniyuki Iwashima
a5c10aa3d1 selftests/net: packetdrill: Support single protocol test.
Currently, we cannot write IPv4 or IPv6 specific packetdrill tests
as ksft_runner.sh runs each .pkt file for both protocols.

Let's support single protocol test by checking --ip_version in the
.pkt file.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250819231527.1427361-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:39:25 -07:00
Eric Dumazet
a6d4f25888 net: set net.core.rmem_max and net.core.wmem_max to 4 MB
SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.

Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.

Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.

Before:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992

After:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:35:00 -07:00
Jakub Kicinski
2a2e6e5375 Merge branch 'bnxt_en-updates-for-net-next'
Michael Chan says:

====================
bnxt_en: Updates for net-next

The first patch is the FW interface update, followed by 3 patches to
support the expanded pcie v2 structure for ethtool -d.  The last patch
adds a Hyper-V PCI ID for the 5760X chips (Thor2).

v1: https://lore.kernel.org/20250818004940.5663-1-michael.chan@broadcom.com
====================

Link: https://patch.msgid.link/20250819163919.104075-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:11 -07:00
Pavan Chebbi
5be7cb805b bnxt_en: Add Hyper-V VF ID
VFs of the P7 chip family created by Hyper-V will have the device ID of
0x181b.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:08 -07:00
Shruti Parab
5a4cf42322 bnxt_en: Add pcie_ctx_v2 support for ethtool -d
Add support to dump the expanded v2 struct that contains PCIE read/write
latency and credit histogram data.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:07 -07:00
Shruti Parab
b530173d3c bnxt_en: Add pcie_stat_len to struct bp
Add this length field to capture the length of the pcie stats structure
supported by the FW.  This length will be determined in
bnxt_ethtool_init().  The minimum of this FW length and the length
known to the driver will determine the actual ethtool -d length.

Suggested-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:07 -07:00
Shruti Parab
1cc174d33a bnxt_en: Refactor bnxt_get_regs()
Separate the code that sends the FW message to retrieve pcie stats into
a new helper function.  The caller of the helper will call hwrm_req_hold()
beforehand and so the caller will call hwrm_req_drop() in all cases
afterwards.  This helper will be useful when adding the support for the
larger struct pcie_ctx_hw_stats_v2.

Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250819163919.104075-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:07 -07:00
Michael Chan
5f8a4f34f6 bnxt_en: hsi: Update FW interface to 1.10.3.133
The major change is struct pcie_ctx_hw_stats_v2 which has new latency
histograms added.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:34:07 -07:00
Hangbin Liu
781bf2cc06 selftests: rtnetlink: print device info on preferred_lft test failure
Even with slowwait used to avoid system sleep in the preferred_lft test,
failures can still occur after long runtimes.

Print the device address info when the test fails to provide better
troubleshooting data.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250819074749.388064-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:28:08 -07:00
Hangbin Liu
eacb6e408d selftests: net: bpf_offload: print loaded programs on mismatch
The test sometimes fails due to an unexpected number of loaded programs. e.g

  FAIL: 2 BPF programs loaded, expected 1
    File "/usr/libexec/kselftests/net/./bpf_offload.py", line 940, in <module>
      progs = bpftool_prog_list(expected=1)
    File "/usr/libexec/kselftests/net/./bpf_offload.py", line 187, in bpftool_prog_list
      fail(True, "%d BPF programs loaded, expected %d" %
    File "/usr/libexec/kselftests/net/./bpf_offload.py", line 89, in fail
      tb = "".join(traceback.extract_stack().format())

However, the logs do not show which programs were actually loaded, making it
difficult to debug the failure.

Add printing of the loaded programs when a mismatch is detected to help
troubleshoot such errors. The list is printed on a new line to avoid breaking
the current log format.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250819073348.387972-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:28:03 -07:00
Alex Tran
6b4b1d577e selftests/net/socket.c: removed warnings from unused returns
socket.c: In function ‘run_tests’:
socket.c:59:25: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
59 | strerror_r(-s->expect, err_string1, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

socket.c:60:25: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
60 | strerror_r(errno, err_string2, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

socket.c:73:33: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
73 | strerror_r(errno, err_string1, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

changelog:
v2
- const char* messages and fixed patch warnings of max 75 chars
  per line

Signed-off-by: Alex Tran <alex.t.tran@gmail.com>
Link: https://patch.msgid.link/20250819025227.239885-1-alex.t.tran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:26:21 -07:00
Pengtao He
8f2c72f225 net: avoid one loop iteration in __skb_splice_bits
If *len is equal to 0 at the beginning of __splice_segment
it returns true directly. But when decreasing *len from
a positive number to 0 in __splice_segment, it returns false.
The __skb_splice_bits needs to call __splice_segment again.

Recheck *len if it changes, return true in time.
Reduce unnecessary calls to __splice_segment.

Signed-off-by: Pengtao He <hept.hept.hept@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250819021551.8361-1-hept.hept.hept@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:24:17 -07:00
Jakub Kicinski
c3199adbe4 Merge branch 'sctp-convert-to-use-crypto-lib-and-upgrade-cookie-auth'
Eric Biggers says:

====================
sctp: Convert to use crypto lib, and upgrade cookie auth

This series converts SCTP chunk and cookie authentication to use the
crypto library API instead of crypto_shash.  This is much simpler (the
diffstat should speak for itself), and also faster too.  In addition,
this series upgrades the cookie authentication to use HMAC-SHA256.

I've tested that kernels with this series applied can continue to
communicate using SCTP with older ones, in either direction, using any
choice of None, HMAC-SHA1, or HMAC-SHA256 chunk authentication.
====================

Link: https://patch.msgid.link/20250818205426.30222-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:30 -07:00
Eric Biggers
d5a253702a sctp: Stop accepting md5 and sha1 for net.sctp.cookie_hmac_alg
The upgrade of the cookie authentication algorithm to HMAC-SHA256 kept
some backwards compatibility for the net.sctp.cookie_hmac_alg sysctl by
still accepting the values 'md5' and 'sha1'.  Those algorithms are no
longer actually used, but rather those values were just treated as
requests to enable cookie authentication.

As requested at
https://lore.kernel.org/netdev/CADvbK_fmCRARc8VznH8cQa-QKaCOQZ6yFbF=1-VDK=zRqv_cXw@mail.gmail.com/
and https://lore.kernel.org/netdev/20250818084345.708ac796@kernel.org/ ,
go further and start rejecting 'md5' and 'sha1' completely.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-6-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:26 -07:00
Eric Biggers
2f3dd6ec90 sctp: Convert cookie authentication to use HMAC-SHA256
Convert SCTP cookies to use HMAC-SHA256, instead of the previous choice
of the legacy algorithms HMAC-MD5 and HMAC-SHA1.  Simplify and optimize
the code by using the HMAC-SHA256 library instead of crypto_shash, and
by preparing the HMAC key when it is generated instead of per-operation.

This doesn't break compatibility, since the cookie format is an
implementation detail, not part of the SCTP protocol itself.

Note that the cookie size doesn't change either.  The HMAC field was
already 32 bytes, even though previously at most 20 bytes were actually
compared.  32 bytes exactly fits an untruncated HMAC-SHA256 value.  So,
although we could safely truncate the MAC to something slightly shorter,
for now just keep the cookie size the same.

I also considered SipHash, but that would generate only 8-byte MACs.  An
8-byte MAC *might* suffice here.  However, there's quite a lot of
information in the SCTP cookies: more than in TCP SYN cookies.  So
absent an analysis that occasional forgeries of all that information is
okay in SCTP, I errored on the side of caution.

Remove HMAC-MD5 and HMAC-SHA1 as options, since the new HMAC-SHA256
option is just better.  It's faster as well as more secure.  For
example, benchmarking on x86_64, cookie authentication is now nearly 3x
as fast as the previous default choice and implementation of HMAC-MD5.

Also just make the kernel always support cookie authentication if SCTP
is supported at all, rather than making it optional in the build.  (It
was sort of optional before, but it didn't really work properly.  E.g.,
a kernel with CONFIG_SCTP_COOKIE_HMAC_MD5=n still supported HMAC-MD5
cookie authentication if CONFIG_CRYPTO_HMAC and CONFIG_CRYPTO_MD5
happened to be enabled in the kconfig for other reasons.)

Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-5-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:26 -07:00
Eric Biggers
bf40785fa4 sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication
For SCTP chunk authentication, use the HMAC-SHA1 and HMAC-SHA256 library
functions instead of crypto_shash.  This is simpler and faster.  There's
no longer any need to pre-allocate 'crypto_shash' objects; the SCTP code
now simply calls into the HMAC code directly.

As part of this, make SCTP always support both HMAC-SHA1 and
HMAC-SHA256.  Previously, it only guaranteed support for HMAC-SHA1.
However, HMAC-SHA256 tended to be supported too anyway, as it was
supported if CONFIG_CRYPTO_SHA256 was enabled elsewhere in the kconfig.

Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-4-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:25 -07:00
Eric Biggers
dd91c79e4f sctp: Fix MAC comparison to be constant-time
To prevent timing attacks, MACs need to be compared in constant time.
Use the appropriate helper function for this.

Fixes: bbd0d59809 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:25 -07:00
Eric Biggers
490a9591b5 selftests: net: Explicitly enable CONFIG_CRYPTO_SHA1 for IPsec
xfrm_policy.sh, nft_flowtable.sh, and vrf-xfrm-tests.sh use 'ip xfrm'
with SHA-1, either 'auth sha1' or 'auth-trunc hmac(sha1)'.  That
requires CONFIG_CRYPTO_SHA1, which CONFIG_INET_ESP intentionally doesn't
select (as per its help text).  Previously, the config for these tests
relied on CONFIG_CRYPTO_SHA1 being selected by the unrelated option
CONFIG_IP_SCTP.  Since CONFIG_IP_SCTP is being changed to no longer do
that, instead add CONFIG_CRYPTO_SHA1 to the configs explicitly.

Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://lore.kernel.org/r/766e4508-aaba-4cdc-92b4-e116e52ae13b@redhat.com
Suggested-by: Florian Westphal <fw@strlen.de>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:36:24 -07:00
Jakub Kicinski
f9ca2820f5 Merge branch 'net-memcg-gather-memcg-code-under-config_memcg'
Kuniyuki Iwashima says:

====================
net-memcg: Gather memcg code under CONFIG_MEMCG.

This series converts most sk->sk_memcg access to helper functions
under CONFIG_MEMCG and finally defines sk_memcg under CONFIG_MEMCG.

This is v5 of the series linked below but without core changes
that decoupled memcg and global socket memory accounting.

I will defer the changes to a follow-up series that will use BPF
to store a flag in sk->sk_memcg.

Overview of the series:

  patch 1 is a trivial fix for MPTCP
  patch 2 ~ 9 move sk->sk_memcg accesses to a single place
  patch 10 moves sk_memcg under CONFIG_MEMCG

v4: https://lore.kernel.org/20250814200912.1040628-1-kuniyu@google.com
v3: https://lore.kernel.org/20250812175848.512446-1-kuniyu@google.com
v2: https://lore.kernel.org/20250811173116.2829786-1-kuniyu@google.com
v1: https://lore.kernel.org/20250721203624.3807041-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20250815201712.1745332-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:21:01 -07:00
Kuniyuki Iwashima
bf64002c94 net: Define sk_memcg under CONFIG_MEMCG.
Except for sk_clone_lock(), all accesses to sk->sk_memcg
is done under CONFIG_MEMCG.

As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
b2ffd10cdd net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
We will store a flag in the lowest bit of sk->sk_memcg.

Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure().

Let's pass struct sock to it and rename the function to match other
functions starting with mem_cgroup_sk_.

Note that the helper is moved to sock.h to use mem_cgroup_from_sk().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
bb178c6bc0 net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
We will store a flag in the lowest bit of sk->sk_memcg.

Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem()
and mem_cgroup_uncharge_skmem().

Let's pass struct sock to the functions.

While at it, they are renamed to match other functions starting
with mem_cgroup_sk_.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
43049b0db0 net-memcg: Introduce mem_cgroup_sk_enabled().
The socket memcg feature is enabled by a static key and
only works for non-root cgroup.

We check both conditions in many places.

Let's factorise it as a helper function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
f7161b234f net-memcg: Introduce mem_cgroup_from_sk().
We will store a flag in the lowest bit of sk->sk_memcg.

Then, directly dereferencing sk->sk_memcg will be illegal, and we
do not want to allow touching the raw sk->sk_memcg in many places.

Let's introduce mem_cgroup_from_sk().

Other places accessing the raw sk->sk_memcg will be converted later.

Note that we cannot define the helper as an inline function in
memcontrol.h as we cannot access any fields of struct sock there
due to circular dependency, so it is placed in sock.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
bd4aa23373 net: Clean up __sk_mem_raise_allocated().
In __sk_mem_raise_allocated(), charged is initialised as true due
to the weird condition removed in the previous patch.

It makes the variable unreliable by itself, so we have to check
another variable, memcg, in advance.

Also, we will factorise the common check below for memcg later.

    if (mem_cgroup_sockets_enabled && sk->sk_memcg)

As a prep, let's initialise charged as false and memcg as NULL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
9d85c565a7 net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV.
Initially, trace_sock_exceed_buf_limit() was invoked when
__sk_mem_raise_allocated() failed due to the memcg limit or the
global limit.

However, commit d6f19938eb ("net: expose sk wmem in
sock_exceed_buf_limit tracepoint") somehow suppressed the event
only when memcg failed to charge for SK_MEM_RECV, although the
memcg failure for SK_MEM_SEND still triggers the event.

Let's restore the event for SK_MEM_RECV.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima
e2afa83296 tcp: Simplify error path in inet_csk_accept().
When an error occurs in inet_csk_accept(), what we should do is
only call release_sock() and set the errno to arg->err.

But the path jumps to another label, which introduces unnecessary
initialisation and tests for newsk.

Let's simplify the error path and remove the redundant NULL
checks for newsk.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 19:20:58 -07:00