linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-06 00:47:56 -04:00

Author	SHA1	Message	Date
Michael Chan	5f8a4f34f6	bnxt_en: hsi: Update FW interface to 1.10.3.133 The major change is struct pcie_ctx_hw_stats_v2 which has new latency histograms added. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250819163919.104075-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:34:07 -07:00
Hangbin Liu	781bf2cc06	selftests: rtnetlink: print device info on preferred_lft test failure Even with slowwait used to avoid system sleep in the preferred_lft test, failures can still occur after long runtimes. Print the device address info when the test fails to provide better troubleshooting data. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250819074749.388064-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:28:08 -07:00
Hangbin Liu	eacb6e408d	selftests: net: bpf_offload: print loaded programs on mismatch The test sometimes fails due to an unexpected number of loaded programs. e.g FAIL: 2 BPF programs loaded, expected 1 File "/usr/libexec/kselftests/net/./bpf_offload.py", line 940, in <module> progs = bpftool_prog_list(expected=1) File "/usr/libexec/kselftests/net/./bpf_offload.py", line 187, in bpftool_prog_list fail(True, "%d BPF programs loaded, expected %d" % File "/usr/libexec/kselftests/net/./bpf_offload.py", line 89, in fail tb = "".join(traceback.extract_stack().format()) However, the logs do not show which programs were actually loaded, making it difficult to debug the failure. Add printing of the loaded programs when a mismatch is detected to help troubleshoot such errors. The list is printed on a new line to avoid breaking the current log format. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250819073348.387972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:28:03 -07:00
Alex Tran	6b4b1d577e	selftests/net/socket.c: removed warnings from unused returns socket.c: In function ‘run_tests’: socket.c:59:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 59 | strerror_r(-s->expect, err_string1, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:60:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 60 | strerror_r(errno, err_string2, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:73:33: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 73 | strerror_r(errno, err_string1, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changelog: v2 - const char* messages and fixed patch warnings of max 75 chars per line Signed-off-by: Alex Tran <alex.t.tran@gmail.com> Link: https://patch.msgid.link/20250819025227.239885-1-alex.t.tran@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:26:21 -07:00
Pengtao He	8f2c72f225	net: avoid one loop iteration in __skb_splice_bits If len is equal to 0 at the beginning of __splice_segment it returns true directly. But when decreasing len from a positive number to 0 in __splice_segment, it returns false. The __skb_splice_bits needs to call __splice_segment again. Recheck *len if it changes, return true in time. Reduce unnecessary calls to __splice_segment. Signed-off-by: Pengtao He <hept.hept.hept@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250819021551.8361-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:24:17 -07:00
Jakub Kicinski	c3199adbe4	Merge branch 'sctp-convert-to-use-crypto-lib-and-upgrade-cookie-auth' Eric Biggers says: ==================== sctp: Convert to use crypto lib, and upgrade cookie auth This series converts SCTP chunk and cookie authentication to use the crypto library API instead of crypto_shash. This is much simpler (the diffstat should speak for itself), and also faster too. In addition, this series upgrades the cookie authentication to use HMAC-SHA256. I've tested that kernels with this series applied can continue to communicate using SCTP with older ones, in either direction, using any choice of None, HMAC-SHA1, or HMAC-SHA256 chunk authentication. ==================== Link: https://patch.msgid.link/20250818205426.30222-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:30 -07:00
Eric Biggers	d5a253702a	sctp: Stop accepting md5 and sha1 for net.sctp.cookie_hmac_alg The upgrade of the cookie authentication algorithm to HMAC-SHA256 kept some backwards compatibility for the net.sctp.cookie_hmac_alg sysctl by still accepting the values 'md5' and 'sha1'. Those algorithms are no longer actually used, but rather those values were just treated as requests to enable cookie authentication. As requested at https://lore.kernel.org/netdev/CADvbK_fmCRARc8VznH8cQa-QKaCOQZ6yFbF=1-VDK=zRqv_cXw@mail.gmail.com/ and https://lore.kernel.org/netdev/20250818084345.708ac796@kernel.org/ , go further and start rejecting 'md5' and 'sha1' completely. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:26 -07:00
Eric Biggers	2f3dd6ec90	sctp: Convert cookie authentication to use HMAC-SHA256 Convert SCTP cookies to use HMAC-SHA256, instead of the previous choice of the legacy algorithms HMAC-MD5 and HMAC-SHA1. Simplify and optimize the code by using the HMAC-SHA256 library instead of crypto_shash, and by preparing the HMAC key when it is generated instead of per-operation. This doesn't break compatibility, since the cookie format is an implementation detail, not part of the SCTP protocol itself. Note that the cookie size doesn't change either. The HMAC field was already 32 bytes, even though previously at most 20 bytes were actually compared. 32 bytes exactly fits an untruncated HMAC-SHA256 value. So, although we could safely truncate the MAC to something slightly shorter, for now just keep the cookie size the same. I also considered SipHash, but that would generate only 8-byte MACs. An 8-byte MAC might suffice here. However, there's quite a lot of information in the SCTP cookies: more than in TCP SYN cookies. So absent an analysis that occasional forgeries of all that information is okay in SCTP, I erred on the side of caution. Remove HMAC-MD5 and HMAC-SHA1 as options, since the new HMAC-SHA256 option is just better. It's faster as well as more secure. For example, benchmarking on x86_64, cookie authentication is now nearly 3x as fast as the previous default choice and implementation of HMAC-MD5. Also just make the kernel always support cookie authentication if SCTP is supported at all, rather than making it optional in the build. (It was sort of optional before, but it didn't really work properly. E.g., a kernel with CONFIG_SCTP_COOKIE_HMAC_MD5=n still supported HMAC-MD5 cookie authentication if CONFIG_CRYPTO_HMAC and CONFIG_CRYPTO_MD5 happened to be enabled in the kconfig for other reasons.) Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:26 -07:00
Eric Biggers	bf40785fa4	sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication For SCTP chunk authentication, use the HMAC-SHA1 and HMAC-SHA256 library functions instead of crypto_shash. This is simpler and faster. There's no longer any need to pre-allocate 'crypto_shash' objects; the SCTP code now simply calls into the HMAC code directly. As part of this, make SCTP always support both HMAC-SHA1 and HMAC-SHA256. Previously, it only guaranteed support for HMAC-SHA1. However, HMAC-SHA256 tended to be supported too anyway, as it was supported if CONFIG_CRYPTO_SHA256 was enabled elsewhere in the kconfig. Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-4-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:25 -07:00
Eric Biggers	dd91c79e4f	sctp: Fix MAC comparison to be constant-time To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: `bbd0d59809` ("[SCTP]: Implement the receive and verification of AUTH chunk") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:25 -07:00
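The constant-time comparison mentioned in the commit above can be illustrated with a small userspace sketch. The commit text does not name the helper it uses (in the kernel, `crypto_memneq()` is the usual one, which is an assumption here); `ct_equal()` and `ct_equal_demo()` below are hypothetical names for illustration only, not the SCTP code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare two MACs without an early exit: every byte is always examined,
 * so the running time does not reveal the position of the first mismatch.
 * Returns 1 when the buffers are equal, 0 otherwise.
 */
static int ct_equal(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate differences, never branch */

	return diff == 0;
}

/* Small usage example (hypothetical MAC values). */
void ct_equal_demo(void)
{
	const unsigned char expected[4] = { 0xde, 0xad, 0xbe, 0xef };
	const unsigned char received[4] = { 0xde, 0xad, 0xbe, 0xee };

	assert(ct_equal(expected, expected, sizeof(expected)));
	assert(!ct_equal(expected, received, sizeof(expected)));
}
```

A plain `memcmp()` may return as soon as the first differing byte is found, which is what makes it unsuitable for MAC verification.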
Eric Biggers	490a9591b5	selftests: net: Explicitly enable CONFIG_CRYPTO_SHA1 for IPsec xfrm_policy.sh, nft_flowtable.sh, and vrf-xfrm-tests.sh use 'ip xfrm' with SHA-1, either 'auth sha1' or 'auth-trunc hmac(sha1)'. That requires CONFIG_CRYPTO_SHA1, which CONFIG_INET_ESP intentionally doesn't select (as per its help text). Previously, the config for these tests relied on CONFIG_CRYPTO_SHA1 being selected by the unrelated option CONFIG_IP_SCTP. Since CONFIG_IP_SCTP is being changed to no longer do that, instead add CONFIG_CRYPTO_SHA1 to the configs explicitly. Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/r/766e4508-aaba-4cdc-92b4-e116e52ae13b@redhat.com Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:36:24 -07:00
Jakub Kicinski	f9ca2820f5	Merge branch 'net-memcg-gather-memcg-code-under-config_memcg' Kuniyuki Iwashima says: ==================== net-memcg: Gather memcg code under CONFIG_MEMCG. This series converts most sk->sk_memcg access to helper functions under CONFIG_MEMCG and finally defines sk_memcg under CONFIG_MEMCG. This is v5 of the series linked below but without core changes that decoupled memcg and global socket memory accounting. I will defer the changes to a follow-up series that will use BPF to store a flag in sk->sk_memcg. Overview of the series: patch 1 is a trivial fix for MPTCP patch 2 ~ 9 move sk->sk_memcg accesses to a single place patch 10 moves sk_memcg under CONFIG_MEMCG v4: https://lore.kernel.org/20250814200912.1040628-1-kuniyu@google.com v3: https://lore.kernel.org/20250812175848.512446-1-kuniyu@google.com v2: https://lore.kernel.org/20250811173116.2829786-1-kuniyu@google.com v1: https://lore.kernel.org/20250721203624.3807041-1-kuniyu@google.com ==================== Link: https://patch.msgid.link/20250815201712.1745332-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:21:01 -07:00
Kuniyuki Iwashima	bf64002c94	net: Define sk_memcg under CONFIG_MEMCG. Except for sk_clone_lock(), all accesses to sk->sk_memcg are done under CONFIG_MEMCG. As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	b2ffd10cdd	net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure(). Let's pass struct sock to it and rename the function to match other functions starting with mem_cgroup_sk_. Note that the helper is moved to sock.h to use mem_cgroup_from_sk(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	bb178c6bc0	net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem() and mem_cgroup_uncharge_skmem(). Let's pass struct sock to the functions. While at it, they are renamed to match other functions starting with mem_cgroup_sk_. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	43049b0db0	net-memcg: Introduce mem_cgroup_sk_enabled(). The socket memcg feature is enabled by a static key and only works for non-root cgroup. We check both conditions in many places. Let's factorise it as a helper function. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	f7161b234f	net-memcg: Introduce mem_cgroup_from_sk(). We will store a flag in the lowest bit of sk->sk_memcg. Then, directly dereferencing sk->sk_memcg will be illegal, and we do not want to allow touching the raw sk->sk_memcg in many places. Let's introduce mem_cgroup_from_sk(). Other places accessing the raw sk->sk_memcg will be converted later. Note that we cannot define the helper as an inline function in memcontrol.h as we cannot access any fields of struct sock there due to circular dependency, so it is placed in sock.h. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
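The reason a helper such as mem_cgroup_from_sk() is needed once a flag lives in the lowest bit of sk->sk_memcg can be sketched in userspace C. Everything below is a simplified stand-in under stated assumptions: `struct memcg`, `struct sock_like`, `SK_MEMCG_FLAG_MASK`, and the function names are hypothetical and are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag kept in bit 0 of the stored pointer value. */
#define SK_MEMCG_FLAG_MASK ((uintptr_t)1)

struct memcg { int id; };			/* stand-in for struct mem_cgroup */
struct sock_like { uintptr_t memcg_and_flag; };	/* stand-in for struct sock */

/* Store the pointer and the flag together; relies on pointer alignment. */
static void sk_set_memcg(struct sock_like *sk, struct memcg *cg, int flag)
{
	sk->memcg_and_flag = (uintptr_t)cg | (flag ? SK_MEMCG_FLAG_MASK : 0);
}

/* The only safe way to get the pointer back: mask the flag bit off first. */
static struct memcg *memcg_from_sk(const struct sock_like *sk)
{
	return (struct memcg *)(sk->memcg_and_flag & ~SK_MEMCG_FLAG_MASK);
}

static int sk_flag_is_set(const struct sock_like *sk)
{
	return (sk->memcg_and_flag & SK_MEMCG_FLAG_MASK) != 0;
}

/* Small usage example. */
void memcg_from_sk_demo(void)
{
	struct memcg cg = { 42 };
	struct sock_like sk;

	sk_set_memcg(&sk, &cg, 1);
	assert(sk_flag_is_set(&sk));
	assert(memcg_from_sk(&sk) == &cg);	/* raw value would be &cg + 1 */
}
```

Dereferencing the raw stored value while the flag is set would read from a misaligned address, which is why call sites are funnelled through one accessor.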
Kuniyuki Iwashima	bd4aa23373	net: Clean up __sk_mem_raise_allocated(). In __sk_mem_raise_allocated(), charged is initialised as true due to the weird condition removed in the previous patch. It makes the variable unreliable by itself, so we have to check another variable, memcg, in advance. Also, we will factorise the common check below for memcg later. if (mem_cgroup_sockets_enabled && sk->sk_memcg) As a prep, let's initialise charged as false and memcg as NULL. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	9d85c565a7	net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV. Initially, trace_sock_exceed_buf_limit() was invoked when __sk_mem_raise_allocated() failed due to the memcg limit or the global limit. However, commit `d6f19938eb` ("net: expose sk wmem in sock_exceed_buf_limit tracepoint") somehow suppressed the event only when memcg failed to charge for SK_MEM_RECV, although the memcg failure for SK_MEM_SEND still triggers the event. Let's restore the event for SK_MEM_RECV. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:59 -07:00
Kuniyuki Iwashima	e2afa83296	tcp: Simplify error path in inet_csk_accept(). When an error occurs in inet_csk_accept(), what we should do is only call release_sock() and set the errno to arg->err. But the path jumps to another label, which introduces unnecessary initialisation and tests for newsk. Let's simplify the error path and remove the redundant NULL checks for newsk. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Kuniyuki Iwashima	1068b48ed1	mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready(). Some conditions used in mptcp_epollin_ready() are the same as tcp_under_memory_pressure(). We will modify tcp_under_memory_pressure() in the later patch. Let's use tcp_under_memory_pressure() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Kuniyuki Iwashima	68889dfd54	mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n. When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets sk->sk_memcg based on the current task. MPTCP subflow socket creation is triggered from userspace or an in-kernel worker. In the latter case, sk->sk_memcg is not what we want. So, we fix it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup(). Although the code is placed under #ifdef CONFIG_MEMCG, it is buried under #ifdef CONFIG_SOCK_CGROUP_DATA. The two configs are orthogonal. If CONFIG_MEMCG is enabled without CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged correctly. Let's move the code out of the wrong ifdef guard. Note that sk->sk_memcg is freed in sk_prot_free() and the parent sk holds the refcnt of memcg->css here, so we don't need to use css_tryget(). Fixes: `3764b0c565` ("mptcp: attach subflow socket to parent cgroup") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://patch.msgid.link/20250815201712.1745332-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 19:20:58 -07:00
Jakub Kicinski	5c69e0b395	Merge branch 'stmmac-stop-silently-dropping-bad-checksum-packets' Oleksij Rempel says: ==================== stmmac: stop silently dropping bad checksum packets this series reworks how stmmac handles receive checksum offload (CoE) errors on dwmac4. At present, when CoE is enabled, the hardware silently discards any frame that fails checksum validation. These packets never reach the driver and are not accounted in the generic drop statistics. They are only visible in the stmmac-specific counters as "payload error" or "header error" packets, which makes it harder to debug or monitor network issues. Following discussion [1], the driver is reworked to propagate checksum error information up to the stack. With these changes, CoE stays enabled, but frames that fail hardware validation are no longer dropped in hardware. Instead, the driver marks them with CHECKSUM_NONE so the network stack can validate, drop, and properly account them in the standard drop statistics. [1] https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/ ==================== Link: https://patch.msgid.link/20250818090217.2789521-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:09 -07:00
Oleksij Rempel	fe40427976	net: stmmac: dwmac4: stop hardware from dropping checksum-error packets Tell the MAC not to discard frames that fail TCP/IP checksum validation. By default, when the hardware checksum engine (CoE) is enabled, dwmac4 silently drops any packet where the offload engine detects a checksum error. These frames are not reported to the driver and are not counted in any statistics as dropped packets. Set the MTL_OP_MODE_DIS_TCP_EF bit when initializing the Rx channel so that all packets are delivered, even if they failed hardware checksum validation. CoE remains enabled, but instead of dropping such frames, the driver propagates the error status and marks the skb with CHECKSUM_NONE. This allows the stack to verify and drop the packet while updating statistics. This change follows the decision made in the discussion: Link: https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/ It depends on the previous patches that added proper error propagation in the Rx path. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:06 -07:00
Oleksij Rempel	644b8437cc	net: stmmac: dwmac4: report Rx checksum errors in status Propagate hardware checksum failures from the descriptor parser to the caller. Currently, dwmac4_wrback_get_rx_status() updates stats when the Rx descriptor signals an IP header or payload checksum error, but it does not reflect this in its return value. The higher-level stmmac_rx() code therefore cannot tell that hardware checksum validation failed. Set the csum_none flag in the returned status when either RDES1_IP_HDR_ERROR or RDES1_IP_PAYLOAD_ERROR is present. This aligns dwmac4 with enh_desc_coe_rdes0() and lets stmmac_rx() mark the skb as CHECKSUM_NONE for software verification. This is a preparatory step for disabling the hardware filter that drops frames which do not pass checksum validation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:05 -07:00
Oleksij Rempel	ee0aace5f8	net: stmmac: Correctly handle Rx checksum offload errors The stmmac_rx function would previously set skb->ip_summed to CHECKSUM_UNNECESSARY if hardware checksum offload (CoE) was enabled and the packet was of a known IP ethertype. However, this logic failed to check if the hardware had actually reported a checksum error. The hardware status, indicating a header or payload checksum failure, was being ignored at this stage. This could cause corrupt packets to be passed up the network stack as valid. This patch corrects the logic by checking the `csum_none` status flag, which is set when the hardware reports a checksum error. If this flag is set, skb->ip_summed is now correctly set to CHECKSUM_NONE, ensuring the kernel's network stack will perform its own validation and properly handle the corrupt packet. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250818090217.2789521-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:33:05 -07:00
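The corrected decision described in the commit above can be summarised as a small truth function. This is a sketch only, not the stmmac_rx() code: the enum, the function name, and the parameter names are hypothetical, with the two enum values chosen to mirror CHECKSUM_NONE and CHECKSUM_UNNECESSARY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two skb->ip_summed states involved. */
enum rx_csum_state {
	RX_CSUM_NONE = 0,		/* stack must verify the checksum itself */
	RX_CSUM_UNNECESSARY = 1,	/* hardware already validated it */
};

/*
 * Trust the hardware only when offload is on, the frame is a known IP
 * ethertype, AND the descriptor did not report a header/payload error.
 * Before the fix, the third condition was not checked at this stage.
 */
static enum rx_csum_state rx_csum_decide(bool coe_enabled,
					 bool known_ip_ethertype,
					 bool hw_csum_error)
{
	if (!coe_enabled || !known_ip_ethertype || hw_csum_error)
		return RX_CSUM_NONE;

	return RX_CSUM_UNNECESSARY;
}

/* Small usage example. */
void rx_csum_decide_demo(void)
{
	assert(rx_csum_decide(true, true, false) == RX_CSUM_UNNECESSARY);
	assert(rx_csum_decide(true, true, true) == RX_CSUM_NONE);
}
```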
Jakub Kicinski	8beead2d15	Merge branch 'there-are-a-cleancode-and-a-parameter-check-for-hns3-driver' Jijie Shao says: ==================== There are a cleancode and a parameter check for hns3 driver This patchset includes: 1. a parameter check omitted from fix code in net branch https://lore.kernel.org/all/20250723072900.GV2459@horms.kernel.org/ 2. a small clean code ==================== Link: https://patch.msgid.link/20250815100414.949752-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:11:19 -07:00
Jijie Shao	021f989c86	net: hns3: change the function return type from int to bool hclge_only_alloc_priv_buff() only returns true or false, so change the function return type from integer to boolean. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250815100414.949752-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:11:15 -07:00
Jijie Shao	e16e973c57	net: hns3: add parameter check for tx_copybreak and tx_spare_buf_size Since the driver always enables tx bounce buffer, there are minimum values for `copybreak` and `tx_spare_buf_size`. This patch will check and reject configurations with values smaller than these minimums. Closes: https://lore.kernel.org/all/20250723072900.GV2459@horms.kernel.org/ Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250815100414.949752-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:11:15 -07:00
Markus Stockhausen	3a752e6780	net: phy: realtek: enable serdes option mode for RTL8226-CG The RTL8226-CG can make use of the serdes option mode feature to dynamically switch between SGMII and 2500base-X. From what is known the setup sequence is much simpler with no magic values. Convert the existing config_init() into a helper that configures the PHY depending on generation 1 or 2. Call the helper from two separate new config_init() functions. Finally convert the phy_driver specs of the RTL8226-CG to make use of the new configuration and switch over to the extended read_status() function to dynamically change the interface according to the serdes mode. Remark! The logic could be simpler if the serdes mode could be set before all other generation 2 magic values. Due to missing RTL8221B test hardware the mmd command order was kept. Tested on Zyxel XGS1210-12. Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Link: https://patch.msgid.link/20250815082009.3678865-1-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:09:52 -07:00
Miguel García	09bde6fdcd	ipv6: ip6_gre: replace strcpy with strscpy for tunnel name Replace the strcpy() call that copies the device name into tunnel->parms.name with strscpy(), to avoid potential overflow and guarantee NULL termination. This uses the two-argument form of strscpy(), where the destination size is inferred from the array type. Destination is tunnel->parms.name (size IFNAMSIZ). Tested in QEMU (Alpine rootfs): - Created IPv6 GRE tunnels over loopback - Assigned overlay IPv6 addresses - Verified bidirectional ping through the tunnel - Changed tunnel parameters at runtime (`ip -6 tunnel change`) Signed-off-by: Miguel García <miguelgarciaroman8@gmail.com> Link: https://patch.msgid.link/20250818220203.899338-1-miguelgarciaroman8@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 18:06:24 -07:00
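The difference between strcpy() and a bounded, always-terminating copy such as strscpy() can be sketched in userspace C. `bounded_copy()` and the `bounded_copy_array()` macro below are hypothetical illustrations of the semantics described in the commit (truncate instead of overflowing, always NUL-terminate, infer the size from the array type); they are not the kernel implementation, and the 16-byte buffer is assumed to match IFNAMSIZ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 characters and always NUL-terminate.
 * Returns the number of characters copied, or -E2BIG if src was truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}

/* Two-argument form: only valid when dst is a real array, not a pointer. */
#define bounded_copy_array(dst, src) bounded_copy((dst), (src), sizeof(dst))

/* Small usage example with a 16-byte name buffer. */
void bounded_copy_demo(void)
{
	char name[16];

	assert(bounded_copy_array(name, "ip6gre0") == 7);
	assert(bounded_copy_array(name, "a-name-that-is-far-too-long") == -E2BIG);
	assert(strlen(name) == sizeof(name) - 1);	/* truncated but terminated */
}
```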
Jakub Kicinski	9efd5152e3	Merge branch 'net-convert-to-skb_dstref_steal-and-skb_dstref_restore' Stanislav Fomichev says: ==================== net: Convert to skb_dstref_steal and skb_dstref_restore To diagnose and prevent issues similar to [0], emit warning (CONFIG_DEBUG_NET) from skb_dst_set and skb_dst_set_noref when overwriting non-null reference-counted entry. Two new helpers are added to handle special cases where the entry needs to be reset and restored: skb_dstref_steal/skb_dstref_restore. The bulk of the patches in the series converts manual _skb_refdst manipulations to these new helpers. 0: https://lore.kernel.org/netdev/20250723224625.1340224-1-sdf@fomichev.me/T/#u ==================== Link: https://patch.msgid.link/20250818154032.3173645-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:47 -07:00
Stanislav Fomichev	a890348adc	net: Add skb_dst_check_unset To prevent dst_entry leaks, add warning when the non-NULL dst_entry is rewritten. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:44 -07:00
Stanislav Fomichev	3e31075a11	chtls: Convert to skb_dst_reset Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Chelsio driver is doing extra dst management via skb_dst_set(NULL). Replace these calls with skb_dstref_steal. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:41 -07:00
Stanislav Fomichev	da3b9d493b	staging: octeon: Convert to skb_dst_drop Instead of doing dst_release and skb_dst_set, do skb_dst_drop which should do the right thing. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:38 -07:00
Stanislav Fomichev	e97e6a1830	net: Switch to skb_dstref_steal/skb_dstref_restore for ip_route_input callers Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. skb_dstref_restore should be used to restore the previous entry. Convert icmp_route_lookup and ip_options_rcv_srr to these helpers. Add extra call to skb_dstref_reset to icmp_route_lookup to clear the ip_route_input entry. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:35 -07:00
Stanislav Fomichev	15488d4d8d	netfilter: Switch to skb_dstref_steal to clear dst_entry Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Switch to skb_dstref_steal in ip[6]_route_me_harder and add a comment on why it's safe to skip skb_dstref_restore. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:19 -07:00
Stanislav Fomichev	c829aab21e	xfrm: Switch to skb_dstref_steal to clear dst_entry Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Switch to skb_dstref_steal in __xfrm_route_forward and add a comment on why it's safe to skip skb_dstref_restore. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:17 -07:00
Stanislav Fomichev	c3f0c02997	net: Add skb_dstref_steal and skb_dstref_restore Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set to prevent potential leaks. There are few places that still manually manage dst_entry not using the helpers. Convert them to the following new helpers: - skb_dstref_steal that resets dst_entry and returns previous dst_entry value - skb_dstref_restore that restores dst_entry previously reset via skb_dstref_steal Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250818154032.3173645-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:54:13 -07:00
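The steal/restore pattern introduced by the commit above can be sketched with toy types. This is a simplified model under stated assumptions: `struct dst`, `struct pkt`, and the `pkt_dst_*` functions are hypothetical stand-ins, the stored value is a plain pointer rather than the kernel's reference-tagged field, and the "warning" is modelled as a return value.

```c
#include <assert.h>
#include <stddef.h>

struct dst { int id; };			/* stand-in for struct dst_entry */
struct pkt { struct dst *dst; };	/* stand-in for struct sk_buff */

/* Clear the entry without touching any refcount and hand back the old one. */
static struct dst *pkt_dst_steal(struct pkt *p)
{
	struct dst *old = p->dst;

	p->dst = NULL;
	return old;
}

/* Put back an entry previously obtained from pkt_dst_steal(). */
static void pkt_dst_restore(struct pkt *p, struct dst *saved)
{
	p->dst = saved;
}

/* Returns 1 if a non-NULL entry was overwritten (the case that now warns). */
static int pkt_dst_set(struct pkt *p, struct dst *d)
{
	int overwrote = p->dst != NULL;

	p->dst = d;
	return overwrote;
}

/* Temporary lookup: steal, set a scratch entry, clear it, restore. */
void pkt_dst_demo(void)
{
	struct dst original = { 1 }, scratch = { 2 };
	struct pkt p = { &original };
	struct dst *saved = pkt_dst_steal(&p);

	assert(pkt_dst_set(&p, &scratch) == 0);	/* slot was empty: no warning */
	(void)pkt_dst_steal(&p);		/* discard the scratch entry */
	pkt_dst_restore(&p, saved);
	assert(p.dst == &original);
}
```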
Jakub Kicinski	0e041220ea	Merge branch 'net-speedup-some-nexthop-handling-when-having-a-lot-of-nexthops' Christoph Paasch says: ==================== net: Speedup some nexthop handling when having A LOT of nexthops Configuring a very large number of nexthops is fairly possible within a reasonable time-frame. But, certain netlink commands can become extremely slow. This series addresses some of these, namely dumping and removing nexthops. v1: https://lore.kernel.org/20250724-nexthop_dump-v1-1-6b43fffd5bac@openai.com ==================== Link: https://patch.msgid.link/20250816-nexthop_dump-v2-0-491da3462118@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:50:36 -07:00
Christoph Paasch	b0ac6d3b56	net: When removing nexthops, don't call synchronize_net if it is not necessary When removing a nexthop, commit `90f33bffa3` ("nexthops: don't modify published nexthop groups") added a call to synchronize_rcu() (later changed to _net()) to make sure everyone sees the new nexthop-group before the rtnl-lock is released. When one wants to delete a large number of groups and nexthops, it is fastest to first flush the groups (ip nexthop flush groups) and then flush the nexthops themselves (ip -6 nexthop flush). As that way the groups don't need to be rebalanced. However, `ip -6 nexthop flush` will still take a long time if there is a very large number of nexthops because of the call to synchronize_net(). Now, if there are no more groups, there is no point in calling synchronize_net(). So, let's skip that entirely by checking if nh->grp_list is empty. This gives us a nice speedup: BEFORE: ======= $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 2097152 nexthops real 1m45.345s user 0m0.001s sys 0m0.005s $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 4194304 nexthops real 3m10.430s user 0m0.002s sys 0m0.004s AFTER: ====== $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 2097152 nexthops real 0m17.545s user 0m0.003s sys 0m0.003s $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 4194304 nexthops real 0m35.823s user 0m0.002s sys 0m0.004s Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250816-nexthop_dump-v2-2-491da3462118@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:50:33 -07:00
Christoph Paasch	5236f57e7c	net: Make nexthop-dumps scale linearly with the number of nexthops When we have a (very) large number of nexthops, they do not fit within a single message. rtm_dump_walk_nexthops() thus will be called repeatedly and ctx->idx is used to avoid dumping the same nexthops again. The approach in which we avoid dumping the same nexthops is by basically walking the entire nexthop rb-tree from the left-most node until we find a node whose id is >= s_idx. That does not scale well. Instead of this inefficient approach, rather go directly through the tree to the nexthop that should be dumped (the one whose nh_id >= s_idx). This allows us to find the relevant node in O(log(n)). We have quite a nice improvement with this: Before: ======= --> ~1M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 1050624 real 0m21.080s user 0m0.666s sys 0m20.384s --> ~2M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 2101248 real 1m51.649s user 0m1.540s sys 1m49.908s After: ====== --> ~1M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 1050624 real 0m1.157s user 0m0.926s sys 0m0.259s --> ~2M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 2101248 real 0m2.763s user 0m2.042s sys 0m0.776s Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250816-nexthop_dump-v2-1-491da3462118@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:50:33 -07:00
Jakub Kicinski	51992f99f0	selftests: drv-net: ncdevmem: make configure_channels() support combined channels ncdevmem tests that the kernel correctly rejects attempts to deactivate queues with MPs bound. Make the configure_channels() test support combined channels. Currently it tries to set the queue counts to rx N tx N-1, which only makes sense for devices which have IRQs per ring type. Most modern devices used combined IRQs/channels with both Rx and Tx queues. Since the math is total Rx == combined+Rx setting Rx when combined is non-zero will be increasing the total queue count, not decreasing as the test intends. Note that the test would previously also try to set the Tx ring count to Rx - 1, for some reason. Which would be 0 if the device has only 2 queues configured. With this change (device with 2 queues): setting channel count rx:1 tx:1 YNL set channels: Kernel error: 'requested channel counts are too low for existing memory provider setting (2)' Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250815231513.381652-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:49:35 -07:00
Jakub Kicinski	eddc821f98	selftests: drv-net: tso: increase the retransmit threshold We see quite a few flakes during the TSO test against virtualized devices in NIPA. There's often 10-30 retransmissions during the test. Sometimes as many as 100. Set the retransmission threshold at 1/4th of the wire frame target. Link: https://patch.msgid.link/20250815224100.363438-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-19 17:49:21 -07:00
Chaoyi Chen	da114122b8	net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy For external phy, clk_phy should be optional, and some external phy need the clock input from clk_phy. This patch adds support for setting clk_phy for external phy. Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com> Link: https://patch.msgid.link/20250815023515.114-1-kernel@airkyi.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 15:57:28 +02:00
Jakub Kicinski	0283b8f134	selftests: drv-net: test the napi init state Test that threaded state (in the persistent NAPI config) gets updated even when NAPI with given ID is not allocated at the time. This test is validating commit `ccba9f6baa` ("net: update NAPI threaded config even for disabled NAPIs"). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250815013314.2237512-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 15:46:04 +02:00
Dipayaan Roy	730ff06d3f	net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency. This patch enhances RX buffer handling in the mana driver by allocating pages from a page pool and slicing them into MTU-sized fragments, rather than dedicating a full page per packet. This approach is especially beneficial on systems with large base page sizes like 64KB. Key improvements: - Proper integration of page pool for RX buffer allocations. - MTU-sized buffer slicing to improve memory utilization. - Reduce overall per Rx queue memory footprint. - Automatic fallback to full-page buffers when: * Jumbo frames are enabled (MTU > PAGE_SIZE / 2). * The XDP path is active, to avoid complexities with fragment reuse. Testing on VMs with 64KB pages shows around 200% throughput improvement. Memory efficiency is significantly improved due to reduced wastage in page allocations. Example: We are now able to fit 35 rx buffers in a single 64kb page for MTU size of 1500, instead of 1 rx buffer per page previously. Tested: - iperf3, iperf2, and nttcp benchmarks. - Jumbo frames with MTU 9000. - Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for testing the XDP path in driver. - Memory leak detection (kmemleak). - Driver load/unload, reboot, and stress scenarios. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 14:42:44 +02:00
Lorenzo Bianconi	a8bdd935d1	net: airoha: Add wlan flowtable TX offload Introduce support to offload the traffic received on the ethernet NIC and forwarded to the wireless one using HW Packet Processor Engine (PPE) capabilities. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250814-airoha-en7581-wlan-tx-offload-v1-1-72e0a312003e@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 12:22:24 +02:00
Paolo Abeni	244ada9cb7	Merge branch 'net-macb-add-taprio-traffic-scheduling-support' Vineeth Karumanchi says: ==================== net: macb: Add TAPRIO traffic scheduling support Implement Time-Aware Traffic Scheduling (TAPRIO) offload support for Cadence MACB/GEM ethernet controllers to enable IEEE 802.1Qbv compliant time-sensitive networking (TSN) capabilities. Key features implemented: - Complete TAPRIO qdisc offload infrastructure with TC_SETUP_QDISC_TAPRIO - Hardware-accelerated time-based gate control for multiple queues - Enhanced Scheduled Traffic (ENST) register configuration and management - Gate state scheduling with configurable start times, on/off intervals - Support for cycle-time based traffic scheduling with validation - Hardware capability detection via MACB_CAPS_QBV flag - Robust error handling and parameter validation - Queue-specific timing register programming (ENST_START_TIME, ENST_ON_TIME, ENST_OFF_TIME) Changes include: - Add enst_ns_to_hw_units(): Converts nanoseconds to hardware units - Add enst_max_hw_interval(): Returns max interval for given speed - Add macb_taprio_setup_replace() for TAPRIO configuration - Add macb_taprio_destroy() for cleanup and reset - Add macb_setup_tc() as TC offload entry point - Enable NETIF_F_HW_TC feature for QBV-capable hardware - Add ENST register offsets to queue configuration The implementation validates timing constraints against hardware limits, supports per-queue gate mask configuration, and provides comprehensive logging for debugging and monitoring. Hardware registers are programmed atomically with proper locking to ensure consistent state. Tested on Xilinx Versal platforms with QBV-capable MACB controllers. Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> ==================== Link: https://patch.msgid.link/20250814071058.3062453-1-vineeth.karumanchi@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 12:13:05 +02:00
Vineeth Karumanchi	d739ce4beb	net: macb: Add capability-based QBV detection and Versal support The 'exclude_qbv' bit in the designcfg_debug1 register varies across MACB/GEM IP revisions, making direct probing unreliable for detecting QBV support. This patch introduces a capability-based approach for consistent QBV feature identification across the IP family. Platform support updates: - Establish foundation for QBV detection in TAPRIO implementation - Enable MACB_CAPS_QBV for Xilinx Versal platform configuration - Fix capability line wrapping, ensuring code stays within 80 columns Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Link: https://patch.msgid.link/20250814071058.3062453-3-vineeth.karumanchi@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-19 12:13:03 +02:00
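
The commit "net: Make nexthop-dumps scale linearly with the number of nexthops" (5236f57e7c) in the log above describes resuming a dump by descending the ordered tree straight to the first node whose id is >= s_idx, rather than walking from the left-most node each time. The idea can be sketched in user-space C as below. This is an illustration only, not the kernel code: `struct nh_node`, `nh_lower_bound()` and `nh_build()` are hypothetical names, and a plain binary search tree built from a sorted array stands in for the kernel's rb-tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a nexthop kept in an ordered tree, keyed by id. */
struct nh_node {
	unsigned int id;
	struct nh_node *left;
	struct nh_node *right;
};

/*
 * Return the node with the smallest id >= s_idx, or NULL if there is none.
 * Each step discards one subtree, so the cost is proportional to the tree
 * height (O(log n) when the tree is balanced).
 */
struct nh_node *nh_lower_bound(struct nh_node *root, unsigned int s_idx)
{
	struct nh_node *best = NULL;

	while (root) {
		if (root->id >= s_idx) {
			/* Candidate found; a smaller one may exist on the left. */
			best = root;
			root = root->left;
		} else {
			/* Everything on the left is too small; go right. */
			root = root->right;
		}
	}
	return best;
}

/*
 * Helper for the sketch: link the entries nodes[lo..hi) of an array that is
 * already sorted by id into a balanced tree and return its root.
 */
struct nh_node *nh_build(struct nh_node *nodes, int lo, int hi)
{
	int mid;

	if (lo >= hi)
		return NULL;
	mid = lo + (hi - lo) / 2;
	nodes[mid].left = nh_build(nodes, lo, mid);
	nodes[mid].right = nh_build(nodes, mid + 1, hi);
	return &nodes[mid];
}
```

A caller resuming a dump would pass the last saved index as `s_idx` and continue in id order from the returned node. The old behaviour described in the commit message corresponds to an in-order walk from the left-most node on every call, which is what made repeated dump calls expensive for very large trees.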


1381993 Commits