linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-06 03:06:50 -04:00

Author	SHA1	Message	Date
Luo Jie	73d05bdaf0	net: ethernet: qualcomm: Initialize PPE service code settings PPE service code is a special code (0-255) that is defined by PPE for PPE's packet processing stages, as per the network functions required for the packet. For packet being sent out by ARM cores on Ethernet ports, The service code 1 is used as the default service code. This service code is used to bypass most of packet processing stages of the PPE before the packet transmitted out PPE port, since the software network stack has already processed the packet. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-8-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	7a23a8af17	net: ethernet: qualcomm: Initialize PPE queue settings Configure unicast and multicast hardware queues for the PPE ports to enable packet forwarding between the ports. Each PPE port is assigned with a range of queues. The queue ID selection for the packet is decided by the queue base and queue offset that is configured based on the internal priority and the RSS hash value of the packet. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-7-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	3312279838	net: ethernet: qualcomm: Initialize the PPE scheduler settings The PPE scheduler settings determine the priority of scheduling the packet across the different hardware queues per PPE port. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-6-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	806268dc7e	net: ethernet: qualcomm: Initialize PPE queue management for IPQ9574 QM (queue management) configurations decide the length of PPE queues and the queue depth for these queues which are used to drop packets in events of congestion. There are two types of PPE queues - unicast queues (0-255) and multicast queues (256-299). These queue types are used to forward different types of traffic, and are configured with different lengths. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-5-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	8a971df98c	net: ethernet: qualcomm: Initialize PPE buffer management for IPQ9574 The BM (Buffer Management) config controls the pause frame generated on the PPE port. There are maximum 15 BM ports and 4 groups supported, all BM ports are assigned to group 0 by default. The number of hardware buffers configured for the port influence the threshold of the flow control for that port. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-4-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	353a0f1d5b	net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC The PPE (Packet Process Engine) hardware block is available on Qualcomm IPQ SoC that support PPE architecture, such as IPQ9574. The PPE in IPQ9574 includes six integrated Ethernet MAC for 6 PPE ports, buffer management, queue management and scheduler functions. The MACs can connect with the external PHY or switch devices using the UNIPHY PCS block available in the SoC. The PPE also includes various packet processing offload capabilities such as L3 routing and L2 bridging, VLAN and tunnel processing offload. It also includes Ethernet DMA function for transferring packets between ARM cores and PPE Ethernet ports. This patch adds the base source files and Makefiles for the PPE driver such as platform driver registration, clock initialization, and PPE reset routines. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-3-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Lei Wei	6b9f301985	docs: networking: Add PPE driver documentation for Qualcomm IPQ9574 SoC Add description and high-level diagram for PPE, driver overview and module enable/debug information. Signed-off-by: Lei Wei <quic_leiwei@quicinc.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-2-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Luo Jie	1898fc5721	dt-bindings: net: Add PPE for Qualcomm IPQ9574 SoC The PPE (packet process engine) hardware block is available in Qualcomm IPQ chipsets that support PPE architecture, such as IPQ9574. The PPE in the IPQ9574 SoC includes six Ethernet ports (6 GMAC and 6 XGMAC), which are used to connect with external PHY devices by PCS. It includes an L2 switch function for bridging packets among the 6 Ethernet ports and the CPU port. The CPU port enables packet transfer between the Ethernet ports and the ARM cores in the SoC, using the Ethernet DMA. The PPE also includes packet processing offload capabilities for various networking functions such as route and bridge flows, VLANs, different tunnel protocols and VPN. The PPE switch is modeled according to the Ethernet switch schema, with additional properties defined for the switch node for interrupts, clocks, resets, interconnects and Ethernet DMA. The switch port node is extended with additional properties for clocks and resets. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-1-1d4ff641fce9@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 12:38:41 +02:00
Paolo Abeni	d051b1f9df	Merge branch 'net-phy-micrel-add-support-for-lan8842' Horatiu Vultur says: ==================== net: phy: micrel: Add support for lan8842 Add support for LAN8842 which supports industry-standard SGMII. While add this the first 3 patches in the series cleans more the driver, they should not introduce any functional changes. ==================== Link: https://patch.msgid.link/20250818075121.1298170-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:42:33 +02:00
Horatiu Vultur	5a774b64cd	net: phy: micrel: Add support for lan8842 The LAN8842 is a low-power, single port triple-speed (10BASE-T/ 100BASE-TX/ 1000BASE-T) ethernet physical layer transceiver (PHY) that supports transmission and reception of data on standard CAT-5, as well as CAT-5e and CAT-6, Unshielded Twisted Pair (UTP) cables. The LAN8842 supports industry-standard SGMII (Serial Gigabit Media Independent Interface) providing chip-to-chip connection to a Gigabit Ethernet MAC using a single serialized link (differential pair) in each direction. There are 2 variants of the lan8842. The one that supports timestamping (lan8842) and one that doesn't have timestamping (lan8832). Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250818075121.1298170-5-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:42:31 +02:00
Horatiu Vultur	d471793a9b	net: phy: micrel: Replace hardcoded pages with defines The functions lan_*_page_reg gets as a second parameter the page where the register is. In all the functions the page was hardcoded. Replace the hardcoded values with defines to make it more clear what are those parameters. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://patch.msgid.link/20250818075121.1298170-4-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:42:30 +02:00
Horatiu Vultur	a0de636ed7	net: phy: micrel: Introduce lanphy_modify_page_reg As the name suggests this function modifies the register in an extended page. It has the same parameters as phy_modify_mmd. This function was introduce because there are many places in the code where the registers was read then the value was modified and written back. So replace all this code with this function to make it clear. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://patch.msgid.link/20250818075121.1298170-3-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:42:30 +02:00
Horatiu Vultur	54e974c715	net: phy: micrel: Start using PHY_ID_MATCH_MODEL Start using PHY_ID_MATCH_MODEL for all the drivers. While at this add also PHY_ID_KSZ8041RNLI to micrel_tbl. It is safe to change the current of 0x00fffff0 to PHY_ID_MATCH_MODEL because all the micrel PHYs have in MSB a value of 0. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250818075121.1298170-2-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:42:30 +02:00
Paolo Abeni	38dad812bb	Merge tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-vhca-id A preparation patchset for adjacent function vports. Adjacent functions can delegate their SR-IOV VFs to sibling PFs, allowing for more flexible and scalable management in multi-host and ECPF-to-host scenarios. Adjacent vports can be managed by the management PF via their unique vhca id and can't be managed by function index as the index can conflict with the local vports/vfs. This series provides: - Use the cached vcha id instead of querying it every time from fw - Query hca cap using vhca id instead of function id when FW supports it - Add HW capabilities and required definitions for adjacent function vports * tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: {rdma,net}/mlx5: export mlx5_vport_get_vhca_id net/mlx5: E-Switch, Set/Query hca cap via vhca id net/mlx5: E-Switch, Cache vport vhca id on first cap query net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports ==================== Link: https://patch.msgid.link/20250815194901.298689-1-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:18:53 +02:00
Thorsten Blum	833e43171b	net: pktgen: Use min()/min_t() to improve pktgen_finalize_skb() Use min() and min_t() to improve pktgen_finalize_skb() and avoid calculating 'datalen / frags' twice. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250815153334.295431-3-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 10:12:11 +02:00
Yury Norov (NVIDIA)	62a2b35025	net: openvswitch: Use for_each_cpu() where appropriate Due to legacy reasons, openswitch code opencodes for_each_cpu() to make sure that CPU0 is always considered. Since commit `c4b2bf6b4a` ("openvswitch: Optimize operations for OvS flow_stats."), the corresponding flow->cpu_used_mask is initialized such that CPU0 is explicitly set. So, switch the code to using plain for_each_cpu(). Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Link: https://patch.msgid.link/20250818172806.189325-1-yury.norov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:47:22 -07:00
Kuniyuki Iwashima	a5c10aa3d1	selftests/net: packetdrill: Support single protocol test. Currently, we cannot write IPv4 or IPv6 specific packetdrill tests as ksft_runner.sh runs each .pkt file for both protocols. Let's support single protocol test by checking --ip_version in the .pkt file. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250819231527.1427361-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:39:25 -07:00
Eric Dumazet	a6d4f25888	net: set net.core.rmem_max and net.core.wmem_max to 4 MB SO_RCVBUF and SO_SNDBUF have limited range today, unless distros or system admins change rmem_max and wmem_max. Even iproute2 uses 1 MB SO_RCVBUF which is capped by the kernel. Decouple [rw]mem_max and [rw]mem_default and increase [rw]mem_max to 4 MB. Before: $ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max net.core.rmem_default = 212992 net.core.rmem_max = 212992 net.core.wmem_default = 212992 net.core.wmem_max = 212992 After: $ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max net.core.rmem_default = 212992 net.core.rmem_max = 4194304 net.core.wmem_default = 212992 net.core.wmem_max = 4194304 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:35:00 -07:00
Jakub Kicinski	2a2e6e5375	Merge branch 'bnxt_en-updates-for-net-next' Michael Chan says: ==================== bnxt_en: Updates for net-next The first patch is the FW interface update, followed by 3 patches to support the expanded pcie v2 structure for ethtool -d. The last patch adds a Hyper-V PCI ID for the 5760X chips (Thor2). v1: https://lore.kernel.org/20250818004940.5663-1-michael.chan@broadcom.com ==================== Link: https://patch.msgid.link/20250819163919.104075-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:11 -07:00
Pavan Chebbi	5be7cb805b	bnxt_en: Add Hyper-V VF ID VFs of the P7 chip family created by Hyper-V will have the device ID of 0x181b. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250819163919.104075-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:08 -07:00
Shruti Parab	5a4cf42322	bnxt_en: Add pcie_ctx_v2 support for ethtool -d Add support to dump the expanded v2 struct that contains PCIE read/write latency and credit histogram data. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250819163919.104075-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:07 -07:00
Shruti Parab	b530173d3c	bnxt_en: Add pcie_stat_len to struct bp Add this length field to capture the length of the pcie stats structure supported by the FW. This length will be determined in bnxt_ethtool_init(). The minimum of this FW length and the length known to the driver will determine the actual ethtool -d length. Suggested-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250819163919.104075-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:07 -07:00
Shruti Parab	1cc174d33a	bnxt_en: Refactor bnxt_get_regs() Separate the code that sends the FW message to retrieve pcie stats into a new helper function. The caller of the helper will call hwrm_req_hold() beforehand and so the caller will call hwrm_req_drop() in all cases afterwards. This helper will be useful when adding the support for the larger struct pcie_ctx_hw_stats_v2. Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250819163919.104075-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:07 -07:00
Michael Chan	5f8a4f34f6	bnxt_en: hsi: Update FW interface to 1.10.3.133 The major change is struct pcie_ctx_hw_stats_v2 which has new latency histograms added. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250819163919.104075-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:07 -07:00
Hangbin Liu	781bf2cc06	selftests: rtnetlink: print device info on preferred_lft test failure Even with slowwait used to avoid system sleep in the preferred_lft test, failures can still occur after long runtimes. Print the device address info when the test fails to provide better troubleshooting data. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250819074749.388064-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:28:08 -07:00
Hangbin Liu	eacb6e408d	selftests: net: bpf_offload: print loaded programs on mismatch The test sometimes fails due to an unexpected number of loaded programs. e.g FAIL: 2 BPF programs loaded, expected 1 File "/usr/libexec/kselftests/net/./bpf_offload.py", line 940, in <module> progs = bpftool_prog_list(expected=1) File "/usr/libexec/kselftests/net/./bpf_offload.py", line 187, in bpftool_prog_list fail(True, "%d BPF programs loaded, expected %d" % File "/usr/libexec/kselftests/net/./bpf_offload.py", line 89, in fail tb = "".join(traceback.extract_stack().format()) However, the logs do not show which programs were actually loaded, making it difficult to debug the failure. Add printing of the loaded programs when a mismatch is detected to help troubleshoot such errors. The list is printed on a new line to avoid breaking the current log format. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250819073348.387972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:28:03 -07:00
Alex Tran	6b4b1d577e	selftests/net/socket.c: removed warnings from unused returns socket.c: In function ‘run_tests’: socket.c:59:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 59 \| strerror_r(-s->expect, err_string1, ERR_STRING_SZ); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:60:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 60 \| strerror_r(errno, err_string2, ERR_STRING_SZ); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:73:33: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 73 \| strerror_r(errno, err_string1, ERR_STRING_SZ); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changelog: v2 - const char* messages and fixed patch warnings of max 75 chars per line Signed-off-by: Alex Tran <alex.t.tran@gmail.com> Link: https://patch.msgid.link/20250819025227.239885-1-alex.t.tran@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:26:21 -07:00
Pengtao He	8f2c72f225	net: avoid one loop iteration in __skb_splice_bits If len is equal to 0 at the beginning of __splice_segment it returns true directly. But when decreasing len from a positive number to 0 in __splice_segment, it returns false. The __skb_splice_bits needs to call __splice_segment again. Recheck *len if it changes, return true in time. Reduce unnecessary calls to __splice_segment. Signed-off-by: Pengtao He <hept.hept.hept@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250819021551.8361-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:24:17 -07:00
Jakub Kicinski	c3199adbe4	Merge branch 'sctp-convert-to-use-crypto-lib-and-upgrade-cookie-auth' Eric Biggers says: ==================== sctp: Convert to use crypto lib, and upgrade cookie auth This series converts SCTP chunk and cookie authentication to use the crypto library API instead of crypto_shash. This is much simpler (the diffstat should speak for itself), and also faster too. In addition, this series upgrades the cookie authentication to use HMAC-SHA256. I've tested that kernels with this series applied can continue to communicate using SCTP with older ones, in either direction, using any choice of None, HMAC-SHA1, or HMAC-SHA256 chunk authentication. ==================== Link: https://patch.msgid.link/20250818205426.30222-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:30 -07:00
Eric Biggers	d5a253702a	sctp: Stop accepting md5 and sha1 for net.sctp.cookie_hmac_alg The upgrade of the cookie authentication algorithm to HMAC-SHA256 kept some backwards compatibility for the net.sctp.cookie_hmac_alg sysctl by still accepting the values 'md5' and 'sha1'. Those algorithms are no longer actually used, but rather those values were just treated as requests to enable cookie authentication. As requested at https://lore.kernel.org/netdev/CADvbK_fmCRARc8VznH8cQa-QKaCOQZ6yFbF=1-VDK=zRqv_cXw@mail.gmail.com/ and https://lore.kernel.org/netdev/20250818084345.708ac796@kernel.org/ , go further and start rejecting 'md5' and 'sha1' completely. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:26 -07:00
Eric Biggers	2f3dd6ec90	sctp: Convert cookie authentication to use HMAC-SHA256 Convert SCTP cookies to use HMAC-SHA256, instead of the previous choice of the legacy algorithms HMAC-MD5 and HMAC-SHA1. Simplify and optimize the code by using the HMAC-SHA256 library instead of crypto_shash, and by preparing the HMAC key when it is generated instead of per-operation. This doesn't break compatibility, since the cookie format is an implementation detail, not part of the SCTP protocol itself. Note that the cookie size doesn't change either. The HMAC field was already 32 bytes, even though previously at most 20 bytes were actually compared. 32 bytes exactly fits an untruncated HMAC-SHA256 value. So, although we could safely truncate the MAC to something slightly shorter, for now just keep the cookie size the same. I also considered SipHash, but that would generate only 8-byte MACs. An 8-byte MAC might suffice here. However, there's quite a lot of information in the SCTP cookies: more than in TCP SYN cookies. So absent an analysis that occasional forgeries of all that information is okay in SCTP, I errored on the side of caution. Remove HMAC-MD5 and HMAC-SHA1 as options, since the new HMAC-SHA256 option is just better. It's faster as well as more secure. For example, benchmarking on x86_64, cookie authentication is now nearly 3x as fast as the previous default choice and implementation of HMAC-MD5. Also just make the kernel always support cookie authentication if SCTP is supported at all, rather than making it optional in the build. (It was sort of optional before, but it didn't really work properly. E.g., a kernel with CONFIG_SCTP_COOKIE_HMAC_MD5=n still supported HMAC-MD5 cookie authentication if CONFIG_CRYPTO_HMAC and CONFIG_CRYPTO_MD5 happened to be enabled in the kconfig for other reasons.) Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:26 -07:00
Eric Biggers	bf40785fa4	sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication For SCTP chunk authentication, use the HMAC-SHA1 and HMAC-SHA256 library functions instead of crypto_shash. This is simpler and faster. There's no longer any need to pre-allocate 'crypto_shash' objects; the SCTP code now simply calls into the HMAC code directly. As part of this, make SCTP always support both HMAC-SHA1 and HMAC-SHA256. Previously, it only guaranteed support for HMAC-SHA1. However, HMAC-SHA256 tended to be supported too anyway, as it was supported if CONFIG_CRYPTO_SHA256 was enabled elsewhere in the kconfig. Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-4-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:25 -07:00
Eric Biggers	dd91c79e4f	sctp: Fix MAC comparison to be constant-time To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: `bbd0d59809` ("[SCTP]: Implement the receive and verification of AUTH chunk") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:25 -07:00
Eric Biggers	490a9591b5	selftests: net: Explicitly enable CONFIG_CRYPTO_SHA1 for IPsec xfrm_policy.sh, nft_flowtable.sh, and vrf-xfrm-tests.sh use 'ip xfrm' with SHA-1, either 'auth sha1' or 'auth-trunc hmac(sha1)'. That requires CONFIG_CRYPTO_SHA1, which CONFIG_INET_ESP intentionally doesn't select (as per its help text). Previously, the config for these tests relied on CONFIG_CRYPTO_SHA1 being selected by the unrelated option CONFIG_IP_SCTP. Since CONFIG_IP_SCTP is being changed to no longer do that, instead add CONFIG_CRYPTO_SHA1 to the configs explicitly. Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/r/766e4508-aaba-4cdc-92b4-e116e52ae13b@redhat.com Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:24 -07:00
Jakub Kicinski	f9ca2820f5	Merge branch 'net-memcg-gather-memcg-code-under-config_memcg' Kuniyuki Iwashima says: ==================== net-memcg: Gather memcg code under CONFIG_MEMCG. This series converts most sk->sk_memcg access to helper functions under CONFIG_MEMCG and finally defines sk_memcg under CONFIG_MEMCG. This is v5 of the series linked below but without core changes that decoupled memcg and global socket memory accounting. I will defer the changes to a follow-up series that will use BPF to store a flag in sk->sk_memcg. Overview of the series: patch 1 is a trivial fix for MPTCP patch 2 ~ 9 move sk->sk_memcg accesses to a single place patch 10 moves sk_memcg under CONFIG_MEMCG v4: https://lore.kernel.org/20250814200912.1040628-1-kuniyu@google.com v3: https://lore.kernel.org/20250812175848.512446-1-kuniyu@google.com v2: https://lore.kernel.org/20250811173116.2829786-1-kuniyu@google.com v1: https://lore.kernel.org/20250721203624.3807041-1-kuniyu@google.com ==================== Link: https://patch.msgid.link/20250815201712.1745332-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:21:01 -07:00
Kuniyuki Iwashima	bf64002c94	net: Define sk_memcg under CONFIG_MEMCG. Except for sk_clone_lock(), all accesses to sk->sk_memcg is done under CONFIG_MEMCG. As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	b2ffd10cdd	net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure(). Let's pass struct sock to it and rename the function to match other functions starting with mem_cgroup_sk_. Note that the helper is moved to sock.h to use mem_cgroup_from_sk(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	bb178c6bc0	net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem() and mem_cgroup_uncharge_skmem(). Let's pass struct sock to the functions. While at it, they are renamed to match other functions starting with mem_cgroup_sk_. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	43049b0db0	net-memcg: Introduce mem_cgroup_sk_enabled(). The socket memcg feature is enabled by a static key and only works for non-root cgroup. We check both conditions in many places. Let's factorise it as a helper function. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	f7161b234f	net-memcg: Introduce mem_cgroup_from_sk(). We will store a flag in the lowest bit of sk->sk_memcg. Then, directly dereferencing sk->sk_memcg will be illegal, and we do not want to allow touching the raw sk->sk_memcg in many places. Let's introduce mem_cgroup_from_sk(). Other places accessing the raw sk->sk_memcg will be converted later. Note that we cannot define the helper as an inline function in memcontrol.h as we cannot access any fields of struct sock there due to circular dependency, so it is placed in sock.h. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	bd4aa23373	net: Clean up __sk_mem_raise_allocated(). In __sk_mem_raise_allocated(), charged is initialised as true due to the weird condition removed in the previous patch. It makes the variable unreliable by itself, so we have to check another variable, memcg, in advance. Also, we will factorise the common check below for memcg later. if (mem_cgroup_sockets_enabled && sk->sk_memcg) As a prep, let's initialise charged as false and memcg as NULL. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	9d85c565a7	net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV. Initially, trace_sock_exceed_buf_limit() was invoked when __sk_mem_raise_allocated() failed due to the memcg limit or the global limit. However, commit `d6f19938eb` ("net: expose sk wmem in sock_exceed_buf_limit tracepoint") somehow suppressed the event only when memcg failed to charge for SK_MEM_RECV, although the memcg failure for SK_MEM_SEND still triggers the event. Let's restore the event for SK_MEM_RECV. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	e2afa83296	tcp: Simplify error path in inet_csk_accept(). When an error occurs in inet_csk_accept(), what we should do is only call release_sock() and set the errno to arg->err. But the path jumps to another label, which introduces unnecessary initialisation and tests for newsk. Let's simplify the error path and remove the redundant NULL checks for newsk. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Kuniyuki Iwashima	1068b48ed1	mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready(). Some conditions used in mptcp_epollin_ready() are the same as tcp_under_memory_pressure(). We will modify tcp_under_memory_pressure() in the later patch. Let's use tcp_under_memory_pressure() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Kuniyuki Iwashima	68889dfd54	mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n. When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets sk->sk_memcg based on the current task. MPTCP subflow socket creation is triggered from userspace or an in-kernel worker. In the latter case, sk->sk_memcg is not what we want. So, we fix it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup(). Although the code is placed under #ifdef CONFIG_MEMCG, it is buried under #ifdef CONFIG_SOCK_CGROUP_DATA. The two configs are orthogonal. If CONFIG_MEMCG is enabled without CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged correctly. Let's move the code out of the wrong ifdef guard. Note that sk->sk_memcg is freed in sk_prot_free() and the parent sk holds the refcnt of memcg->css here, so we don't need to use css_tryget(). Fixes: `3764b0c565` ("mptcp: attach subflow socket to parent cgroup") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Jakub Kicinski	5c69e0b395	Merge branch 'stmmac-stop-silently-dropping-bad-checksum-packets' Oleksij Rempel says: ==================== stmmac: stop silently dropping bad checksum packets this series reworks how stmmac handles receive checksum offload (CoE) errors on dwmac4. At present, when CoE is enabled, the hardware silently discards any frame that fails checksum validation. These packets never reach the driver and are not accounted in the generic drop statistics. They are only visible in the stmmac-specific counters as "payload error" or "header error" packets, which makes it harder to debug or monitor network issues. Following discussion [1], the driver is reworked to propagate checksum error information up to the stack. With these changes, CoE stays enabled, but frames that fail hardware validation are no longer dropped in hardware. Instead, the driver marks them with CHECKSUM_NONE so the network stack can validate, drop, and properly account them in the standard drop statistics. [1] https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/ ==================== Link: https://patch.msgid.link/20250818090217.2789521-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:09 -07:00
Oleksij Rempel	fe40427976	net: stmmac: dwmac4: stop hardware from dropping checksum-error packets Tell the MAC not to discard frames that fail TCP/IP checksum validation. By default, when the hardware checksum engine (CoE) is enabled, dwmac4 silently drops any packet where the offload engine detects a checksum error. These frames are not reported to the driver and are not counted in any statistics as dropped packets. Set the MTL_OP_MODE_DIS_TCP_EF bit when initializing the Rx channel so that all packets are delivered, even if they failed hardware checksum validation. CoE remains enabled, but instead of dropping such frames, the driver propagates the error status and marks the skb with CHECKSUM_NONE. This allows the stack to verify and drop the packet while updating statistics. This change follows the decision made in the discussion: Link: https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/ It depends on the previous patches that added proper error propagation in the Rx path. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:06 -07:00
Oleksij Rempel	644b8437cc	net: stmmac: dwmac4: report Rx checksum errors in status Propagate hardware checksum failures from the descriptor parser to the caller. Currently, dwmac4_wrback_get_rx_status() updates stats when the Rx descriptor signals an IP header or payload checksum error, but it does not reflect this in its return value. The higher-level stmmac_rx() code therefore cannot tell that hardware checksum validation failed. Set the csum_none flag in the returned status when either RDES1_IP_HDR_ERROR or RDES1_IP_PAYLOAD_ERROR is present. This aligns dwmac4 with enh_desc_coe_rdes0() and lets stmmac_rx() mark the skb as CHECKSUM_NONE for software verification. This is a preparatory step for disabling the hardware filter that drops frames which do not pass checksum validation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:05 -07:00
Oleksij Rempel	ee0aace5f8	net: stmmac: Correctly handle Rx checksum offload errors The stmmac_rx function would previously set skb->ip_summed to CHECKSUM_UNNECESSARY if hardware checksum offload (CoE) was enabled and the packet was of a known IP ethertype. However, this logic failed to check if the hardware had actually reported a checksum error. The hardware status, indicating a header or payload checksum failure, was being ignored at this stage. This could cause corrupt packets to be passed up the network stack as valid. This patch corrects the logic by checking the `csum_none` status flag, which is set when the hardware reports a checksum error. If this flag is set, skb->ip_summed is now correctly set to CHECKSUM_NONE, ensuring the kernel's network stack will perform its own validation and properly handle the corrupt packet. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:05 -07:00
Jakub Kicinski	8beead2d15	Merge branch 'there-are-a-cleancode-and-a-parameter-check-for-hns3-driver' Jijie Shao says: ==================== There are a cleancode and a parameter check for hns3 driver This patchset includes: 1. a parameter check omitted from fix code in net branch https://lore.kernel.org/all/20250723072900.GV2459@horms.kernel.org/ 2. a small clean code ==================== Link: https://patch.msgid.link/20250815100414.949752-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:11:19 -07:00

1 2 3 4 5 ...

1382020 Commits