linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-20 17:59:03 -04:00

Author	SHA1	Message	Date
Dimitri Daskalakis	19c3a2a81d	selftests: drv-net: rss: Generate unique ports for RSS context tests The RSS ctx tests rely on NFC rules with unique ports to steer packets to the correct ctx. This updates the test to use the new rand_ports() helper to guarantee the ports are unique. Manual testing shows that generating 32 ports with the existing method would result in at least one duplicate 4% of the time. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20260224224659.1507082-3-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:42:02 -08:00
Dimitri Daskalakis	b0249c0d41	selftests: net: py: Add rand_ports helper method Certain tests need a unique set of ports. Successive calls to the existing rand_port method may return a duplicate port, resulting in test flakiness. The new helper keeps sockets open while building a list of ephemeral ports, thus the kernel enforces their uniqueness. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20260224224659.1507082-2-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:42:02 -08:00
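The idea behind the rand_ports helper can be sketched as below. This is a minimal illustration under stated assumptions, not the selftest code itself: the name `rand_ports_sketch`, its signature, and the use of IPv4 loopback are all hypothetical. Each socket binds to port 0 so the kernel picks an ephemeral port, and every socket stays open until the list is complete, so the kernel cannot hand out the same port twice.

```python
import socket


def rand_ports_sketch(count, sock_type=socket.SOCK_STREAM):
    """Return `count` distinct ephemeral ports.

    Hypothetical helper for illustration; not the kernel selftest implementation.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, sock_type)
            # Port 0 asks the kernel to choose a free ephemeral port.
            s.bind(("127.0.0.1", 0))
            socks.append(s)
        # Every socket is still open here, so no port can appear twice.
        return [s.getsockname()[1] for s in socks]
    finally:
        # Release the ports only after the full list has been collected.
        for s in socks:
            s.close()
```

Once the sockets are closed another process could still claim one of the ports, so a sketch like this only guarantees uniqueness within the returned list.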
Jakub Kicinski	2cd63825c7	Merge branch 'netfilter-updates-for-net-next' Florian Westphal says: ==================== netfilter: updates for net-next including IPVS updates from and via Julian Anastasov. First updates for IPVS. From Julian's cover letter: * Convert the global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net, cowork by Jiejian Wu and Dust Li * Convert some code that walks the service lists to use RCU instead of the service_mutex * We used two tables for services (non-fwmark and fwmark), merge them into single svc_table * The list for unavailable destinations (dest_trash) holds dsts and thus dev references causing extra work for the ip_vs_dst_event() dev notifier handler. Change this by dropping the reference when dest is removed and saved into dest_trash. The dest_trash will need more changes to make it light for lookups. TODO. * On new connection we can do multiple lookups for services by trying different fallback options. Add more counters for service types, so that we can avoid unneeded lookups for services. * The no_cport and dropentry counters can be per-net and also we can avoid extra conn lookups Then, a few cleanups for nf_tables: * keep BH enabled during nft_set_rbtree inserts, this is possible because the root lock is now only taken from control plane. * toss a few EXPORT_SYMBOLs from nf_tables; these were historic leftovers from back in the day when e.g. set backends were still residing in their own modules. * remove the register tracking infra from nftables. It was disabled years ago in 5.18 and there are no plans to salvage this work; the idea was good (remove redundant register stores), but there are just too many pitfalls, and better rule structuring (verdict maps) largely avoids the scenarios where this would have helped. ==================== Link: https://patch.msgid.link/20260224205048.4718-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:29 -08:00
Florian Westphal	6b94d081f8	netfilter: nf_tables: remove register tracking infrastructure This facility was disabled in commit `9e539c5b6d` ("netfilter: nf_tables: disable expression reduction infra"), because not all nft_exprs guarantee they will update the destination register: some may set NFT_BREAK instead to cancel evaluation of the rule. This has been dead code ever since. There are no plans to salvage this at this time, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-10-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Florian Westphal	b6461103e0	netfilter: nf_tables: drop obsolete EXPORT_SYMBOLs These are no longer required, calling objects are nowadays baked into nf_tables.ko itself. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-9-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Florian Westphal	3aea466a43	netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock As of commit `7e43e0a114` ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") the lock is only taken from control plane, no need to disable BH anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-8-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	09b71fb459	ipvs: no_cport and dropentry counters can be per-net Change the no_cport counters to be per-net and address family. This should reduce the extra conn lookups done during present NO_CPORT connections. By changing from global to per-net dropentry counters, one net will not affect the drop rate of another net. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	c59bd9e62e	ipvs: use more counters to avoid service lookups When new connection is created we can lookup for services multiple times to support fallback options. We already have some counters to skip specific lookups because it costs CPU cycles for hash calculation, etc. Add more counters for fwmark/non-fwmark services (fwm_services and nonfwm_services) and make all counters per address family. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	40fb72209f	ipvs: do not keep dest_dst after dest is removed Before now dest->dest_dst is not released when server is moved into dest_trash list after removal. As result, we can keep dst/dev references for long time without actively using them. It is better to avoid walking the dest_trash list when ip_vs_dst_event() receives dev events. So, make sure we do not hold dev references in dest_trash list. As packets can be flying while server is being removed, check the IP_VS_DEST_F_AVAILABLE flag in slow path to ensure we do not save new dev references to removed servers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-5-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:26 -08:00
Julian Anastasov	b24ae1a387	ipvs: use single svc table fwmark based services and non-fwmark based services can be hashed in same service table. This reduces the burden of working with two tables. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:25 -08:00
Julian Anastasov	3de0ec2873	ipvs: some service readers can use RCU Some places walk the services under mutex but they can just use RCU: * ip_vs_dst_event() uses ip_vs_forget_dev() which uses its own lock to modify dest * ip_vs_genl_dump_services(): ip_vs_genl_fill_service() just fills skb * ip_vs_genl_parse_service(): move RCU lock to callers ip_vs_genl_set_cmd(), ip_vs_genl_dump_dests() and ip_vs_genl_get_cmd() * ip_vs_genl_dump_dests(): just fill skb Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-3-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:25 -08:00
Jiejian Wu	74455a5b43	ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns Current ipvs uses one global mutex "__ip_vs_mutex" to keep the global "ip_vs_svc_table" and "ip_vs_svc_fwm_table" safe. But when there are tens of thousands of services from different netns in the table, it takes a long time to look up the table, for example, using "ipvsadm -ln" from different netns simultaneously. We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns, and we add "service_mutex" per netns to keep these two tables safe instead of the global "__ip_vs_mutex" in current version. To this end, looking up services from different netns simultaneously will not get stuck, shortening the time consumption in large-scale deployment. It can be reproduced using the simple scripts below. init.sh: #!/bin/bash for((i=1;i<=4;i++));do ip netns add ns$i ip netns exec ns$i ip link set dev lo up ip netns exec ns$i sh add-services.sh done add-services.sh: #!/bin/bash for((i=0;i<30000;i++)); do ipvsadm -A -t 10.10.10.10:$((80+$i)) -s rr done runtest.sh: #!/bin/bash for((i=1;i<4;i++));do ip netns exec ns$i ipvsadm -ln > /dev/null & done ip netns exec ns4 ipvsadm -ln > /dev/null Run "sh init.sh" to initiate the network environment. Then run "time ./runtest.sh" to evaluate the time consumption. Our testbed is a 4-core Intel Xeon ECS. The result of the original version is around 8 seconds, while the result of the modified version is only 0.8 seconds. Signed-off-by: Jiejian Wu <jiejian@linux.alibaba.com> Co-developed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:36:25 -08:00
Eric Woudstra	7717fbb140	net: pppoe: avoid zero-length arrays in struct pppoe_hdr Jakub Kicinski reported following issue in upcoming patches: W=1 C=1 GCC build gives us: net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through ../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h, ../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h: 153:29: warning: array of flexible structures sparse doesn't like that hdr has a zero-length array which overlaps proto. The kernel code doesn't currently need those arrays. PPPoE connection is functional after applying this patch. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Link: https://patch.msgid.link/20260224155030.106918-1-ericwouds@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:25:08 -08:00
Abhilekh Deka	8debe7a223	net/ibmveth: fix comment typos in ibmveth.c Correct spelling mistakes in comments: - Fix misspelling of gro_receive - Fix misspelling of Partition Signed-off-by: Abhilekh Deka <abhindeka@gmail.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Link: https://patch.msgid.link/20260224153601.17534-1-abhindeka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:23:04 -08:00
Nicolai Buchwitz	45ce4b753a	net: cadence: macb: add ethtool nway_reset support Wire phy_ethtool_nway_reset() as the .nway_reset ethtool operation, allowing userspace to restart PHY autonegotiation via 'ethtool -r'. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260224145723.49450-1-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:22:32 -08:00
Jakub Kicinski	23a611b9b3	Merge branch 'net-stmmac-fix-interrupt-coalescing' Russell King says: ==================== net: stmmac: fix interrupt coalescing While cleaning up the descriptor handling, I noticed that the accounting of transmit "packets" for interrupt coalescing was buggy in that it takes the difference of the two indexes into the circular list of transmit descriptors and merely subtracts one from the other without regard for the indexes wrapping. This can result in a negative number or very large positive number which would have the effect of either reducing tx_q->tx_count_frames or making that very large. Either way, the result is numerically incorrect, and could trigger interrupts or not trigger interrupts when required. This series converts stmmac to use the circ_buf helpers, and then fixes this problem. ==================== Link: https://patch.msgid.link/aZ1o2dmfpeiubCik@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:12:36 -08:00
Russell King (Oracle)	dd53a0e859	net: stmmac: fix transmit interrupt coalescing The accounting for transmit frames does not count the descriptors correctly. It uses: tx_packets = (tx_q->cur_tx + 1) - first_tx; however, these are indexes into a circular buffer, so cur_tx can be less than first_tx, and when that happens, tx_packets becomes a very large unsigned integer. When this is added to tx_q->tx_count_frames, it has the effect of reducing the count of frames, possibly causing it to also wrap to a very large unsigned integer. Fix this by using CIRC_CNT() to calculate the number of descriptors used. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1vuoIl-0000000Aouz-0ttb@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:12:34 -08:00
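The wrap-around problem in this commit can be reproduced with plain integer arithmetic. The sketch below is an illustration only: Python integers are masked to 32 bits to emulate unsigned C arithmetic, `circ_cnt()` mirrors the kernel's CIRC_CNT() macro, and the ring size of 512 and the exact arguments passed to `circ_cnt()` are assumptions rather than the driver's actual code.

```python
U32_MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def naive_tx_packets(cur_tx, first_tx):
    """The subtraction quoted in the commit message, as an unsigned 32-bit value."""
    return ((cur_tx + 1) - first_tx) & U32_MASK


def circ_cnt(head, tail, size):
    """Mirror of CIRC_CNT(); `size` must be a power of two."""
    return (head - tail) & (size - 1)


RING_SIZE = 512  # assumed ring size, for illustration only

# No wrap: descriptors 10..13 were used, both methods agree on 4.
plain_naive = naive_tx_packets(cur_tx=13, first_tx=10)
plain_fixed = circ_cnt(13 + 1, 10, RING_SIZE)

# Wrap: descriptors 510, 511, 0, 1 were used, so cur_tx < first_tx.
wrapped_naive = naive_tx_packets(cur_tx=1, first_tx=510)
wrapped_fixed = circ_cnt(1 + 1, 510, RING_SIZE)
```

In the wrapped case the naive subtraction yields a value close to 2^32, while the masked count still reports the four descriptors that were actually used.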
Russell King (Oracle)	819101c3c1	net: stmmac: use circ_buf helpers for descriptors The stmmac descriptor queues are circular buffers, operated as far as the hardware is concerned as either a ring, or a chain that loops back on itself. From the software perspective, it forms a circular buffer. We have a few places which calculate the number of in-use and free entries in these circular buffers, for which we have macros for. Use CIRC_CNT() and CIRC_SPACE() as appropriate to calculate these values. Validating, for stmmac_tx_avail(), which uses CIRC_SPACE(): dirty_tx = 1, cur_tx = 0 -> 0 dirty_tx = 0, cur_tx = 0 -> dma_tx_size - 1 dirty_tx = 0, cur_tx = 1 -> dma_tx_size - 2 dirty_tx passed as end, reduced by one. cur_tx passed as start. Output on sane computers is identical. For stmmac_rx_dirty(), which uses CIRC_CNT(): dirty_rx = 1, cur_rx = 0 -> dma_rx_size - 1 dirty_rx = 0, cur_rx = 0 -> 0 dirty_rx = 0, cur_rx = 1 -> 1 dirty_rx passed as start, cur_rx passed as end. Output is identical. Same validation performed on the is_last_segment calculation, which also gets converted to CIRC_CNT(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1vuoIg-0000000Aout-0LyS@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:12:34 -08:00
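The validation values listed in this commit message can be checked mechanically. The sketch below mirrors the CIRC_CNT() and CIRC_SPACE() macros from include/linux/circ_buf.h in Python; the wrapper names `tx_avail()` and `rx_dirty()`, their argument order, and the ring size of 512 are assumptions chosen to reproduce the listed results, not the driver's code.

```python
def circ_cnt(head, tail, size):
    """Mirror of CIRC_CNT(): entries in use; `size` must be a power of two."""
    return (head - tail) & (size - 1)


def circ_space(head, tail, size):
    """Mirror of CIRC_SPACE(): free entries, always leaving one slot unused."""
    return circ_cnt(tail, head + 1, size)


def tx_avail(dirty_tx, cur_tx, size):
    """Hypothetical stand-in for stmmac_tx_avail()."""
    return circ_space(cur_tx, dirty_tx, size)


def rx_dirty(dirty_rx, cur_rx, size):
    """Hypothetical stand-in for stmmac_rx_dirty()."""
    return circ_cnt(cur_rx, dirty_rx, size)


SIZE = 512  # assumed ring size, for illustration only

tx_results = [tx_avail(1, 0, SIZE), tx_avail(0, 0, SIZE), tx_avail(0, 1, SIZE)]
rx_results = [rx_dirty(1, 0, SIZE), rx_dirty(0, 0, SIZE), rx_dirty(0, 1, SIZE)]
```

With these definitions, `tx_results` corresponds to the three stmmac_tx_avail() cases and `rx_results` to the three stmmac_rx_dirty() cases, in the order given in the message.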
kexinsun	51432958b5	rds: update outdated comment The function rds_send_reset() was subsumed by rds_send_path_reset() by commit `d769ef81d5` ("RDS: Update rds_conn_shutdown to work with rds_conn_path"). Update the comment accordingly. Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn> Link: https://patch.msgid.link/20260224020720.1174-1-kexinsun@smail.nju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:03:55 -08:00
Rosen Penev	dc2a1facbd	net: fs_enet: allow nvmem to override MAC address NVMEM typically loads after the ethernet driver and of_get_ethdev_address returns -EPROBE_DEFER. Return in such a case to allow NVMEM to work. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20260224014607.353378-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 19:01:24 -08:00
Jakub Kicinski	6698d6ce6a	selftests: hw-net: tso: set a TCP window clamp to avoid spurious drops The TSO test wants to make sure that there isn't a lot of retransmits, because that could indicate that device has a buggy TSO implementation. On debug kernels, however, we're likely to see significant packet loss because we simply overwhelm the receiver. In a QEMU loop with virtio devices we see ~10% false positive rate with occasional run hitting the threshold of 25% packet loss. Since we're only sending 4MB of data, set a TCP_WINDOW_CLAMP to 200k. This seems to make virtio happy while having little impact since we're primarily interested in testing the sender, and the test doesn't currently enable BIG TCP. Running socat over virtio loop for 2 sec on a debug kernel shows: TcpOutSegs 27327 0.0 TcpRetransSegs 83 0.0 TcpOutSegs 30012 0.0 TcpRetransSegs 80 0.0 TcpOutSegs 28767 0.0 TcpRetransSegs 77 0.0 But with the clamp the 3 attempts show no retransmit: TcpOutSegs 31537 0.0 TcpRetransSegs 0 0.0 TcpOutSegs 30323 0.0 TcpRetransSegs 0 0.0 TcpOutSegs 28700 0.0 TcpRetransSegs 0 0.0 Since we expect no receiver-related drops now we can significantly increase test's sensitivity to drops. All the testing we do in NIPA uses cubic. Link: https://patch.msgid.link/20260223204030.4142884-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 18:59:57 -08:00
Russell King (Oracle)	8215d7cbfb	net: stmmac: fix EEE supportable interfaces According to the dwmac v3.74a databook, only MII, GMII and RGMII dwmac interface modes are supported for EEE. Restrict EEE to these modes, or the modules supported by a PCS other than the GMAC's integrated PCS. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuUsD-0000000Afci-0XxO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-25 18:51:19 -08:00
Rosen Penev	d2adf01780	net: freescale: ucc_geth: call of_node_put once Move it up to avoid placing it in both the error and success paths. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260224014141.352642-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:26:30 -08:00
Jakub Kicinski	7235555e9a	Merge branch 'selftests-net-py-improve-bkg-error-reporting' Jakub Kicinski says: ==================== selftests: net: py: improve bkg() error reporting bkg() is a helper for running commands in the background. When init or body of a with() block fails check if the bkg() process already exited and report its status (including stdout/stderr). This significantly improves debuggability. ==================== Link: https://patch.msgid.link/20260223202633.4126087-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:25:32 -08:00
Jakub Kicinski	6e4dff2002	selftests: net: py: add cmd info for ksft_wait failure Gal recently complained: When [ksft_wait failure] happens, the test fails with a cryptic message: # Exception| Exception: Did not receive ready message Let's try to include the stdout/stderr of the command we tried to start. E.g. for cmd("false", ksft_wait=True): # Exception| lib.py.utils.CmdInitFailure: Did not receive ready message # Exception| CMD: false # Exception| EXIT: 1 We need to factor out _process_terminate() otherwise the exit path may try to write to already disconnected self.ksft_term_fd. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260223202633.4126087-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:25:29 -08:00
Jakub Kicinski	04abab18e1	selftests: net: py: use repr(cmd) for failure exceptions Reuse repr(cmd) instead of manually formatting a similar string. Before: # Exception| lib.py.utils.CmdExitFailure: Command failed: false # Exception| STDOUT: b'' # Exception| STDERR: b'' After: # Exception| lib.py.utils.CmdExitFailure: Command failed # Exception| CMD: false # Exception| EXIT: 1 Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260223202633.4126087-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:25:29 -08:00
Jakub Kicinski	d99aa5912c	selftests: net: py: avoid masking exceptions in bkg() failures bkg() failures are currently quite hard to debug and spot. Often we have code along the lines of: with bkg("./cmd_rx_something -p PORT"): wait_port_listen(PORT) cmd("./cmd_tx_something", host=remote) When wait_port_listen() fails we don't get to see the exit status of bkg(), even though very often it's a failure in the bkg() command that's actually to blame. Try not to interfere with the bkg() command error checking. With: with bkg("false", exit_wait=True): time.sleep(0.01) # let the 'false' cmd run raise Exception("bla") Before: .. stack trace .. # Exception| Exception: bla After: .. stack trace .. # Exception| Exception: bla # Exception| # Exception| During handling of the above exception, another exception occurred: .. stack trace .. # Exception| lib.py.utils.CmdExitFailure: Command failed: false # Exception| STDOUT: b'' # Exception| STDERR: b'' Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260223202633.4126087-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:25:29 -08:00
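The behaviour this commit aims for, where a failure of the background command is reported alongside the exception raised in the with block, can be sketched with an ordinary Python context manager. Everything here is hypothetical and simplified: `Bkg` and `BkgFailure` are stand-ins invented for illustration, not the selftest library's bkg() or CmdExitFailure, and the failing command is a Python one-liner instead of `false`.

```python
import subprocess
import sys


class BkgFailure(Exception):
    """Hypothetical stand-in for the library's command-failure exception."""


class Bkg:
    """Hypothetical, simplified background-command context manager."""

    def __init__(self, argv):
        self.argv = argv
        self.proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        stdout, stderr = self.proc.communicate()
        if self.proc.returncode != 0:
            # Raising here while `exc` is in flight chains the two exceptions:
            # Python records the body's exception as __context__.
            raise BkgFailure(
                f"Command failed: {self.argv} EXIT: {self.proc.returncode} "
                f"STDOUT: {stdout!r} STDERR: {stderr!r}"
            )
        return False  # never swallow the body's exception


def run_demo():
    """Run a failing background command and a failing body; return the error."""
    try:
        with Bkg([sys.executable, "-c", "import sys; sys.exit(1)"]):
            raise RuntimeError("bla")
    except BkgFailure as err:
        return err
```

Because the second exception is raised while the first is being handled, a traceback printed for the returned error shows both, joined by the "During handling of the above exception, another exception occurred" line.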
Jakub Kicinski	6c32b07650	eth: bnxt: rename ring_err_stats -> ring_drv_stats We recently added GRO stats to bnxt, which are maintained by the driver. Having "err" in the name of the struct for ring stats no longer makes sense (as pointed out by Michael, see Link). Rename them to "drv" stats, as these are all maintained by the driver (even if partially based on info from descriptors). Michael suggested calling these misc, happy to go back to that. IMHO "drv" is a bit more meaningful than "misc". Pure rename using sed, no functional changes. Link: https://lore.kernel.org/CACKFLimgibJ0qkM1AacZVh8MKKy-pE_AAc4KPKZ7GUqebmXW9A@mail.gmail.com Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260223203702.4137801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:24:46 -08:00
Kuniyuki Iwashima	fc1f97929a	bonding: Optimise is_netpoll_tx_blocked(). bond_start_xmit() spends some cycles in is_netpoll_tx_blocked(): if (unlikely(is_netpoll_tx_blocked(dev))) return NETDEV_TX_BUSY; because of the "pushf;pop reg" sequence (aka irqs_disabled()). Let's swap the conditions in is_netpoll_tx_blocked() and convert netpoll_block_tx to a static key. Before: 1.23 │ mov %gs:0x28,%rax 1.24 │ mov %rax,0x18(%rsp) 29.45 │ pushfq 0.50 │ pop %rax 0.47 │ test $0x200,%eax │ ↓ je 1b4 0.49 │ 32: lea 0x980(%rsi),%rbx After: 0.72 │ mov %gs:0x28,%rax 0.81 │ mov %rax,0x18(%rsp) 0.82 │ nop 2.77 │ 2a: lea 0x980(%rsi),%rbx Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260223230749.2376145-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 18:13:38 -08:00
Eric Dumazet	64db5933c7	icmp: increase net.ipv4.icmp_msgs_{per_sec,burst} These sysctls were added in `4cdf507d54` ("icmp: add a global rate limitation") and their default values might be too small. Some network tools send probes to closed UDP ports from many hosts to estimate proportion of packet drops on a particular target. This patch sets both sysctls to 10000. Note the per-peer rate-limit (as described in RFC 4443 2.4 (f)) intent is still enforced. This also increases security, see `b38e7819ca` ("icmp: randomize the global rate limiter") for reference. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223161742.929830-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:50:12 -08:00
Eric Dumazet	539a6cf084	tcp: move inet6_csk_update_pmtu() to tcp_ipv6.c This function is only called from tcp_v6_mtu_reduced() and can be (auto)inlined by the compiler. Note that inet6_csk_route_socket() is no longer (auto)inlined, which is a good thing as it is slow path. $ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1 add/remove: 0/2 grow/shrink: 2/0 up/down: 93/-129 (-36) Function old new delta tcp_v6_mtu_reduced 139 228 +89 inet6_csk_route_socket 486 490 +4 __pfx_inet6_csk_update_pmtu 16 - -16 inet6_csk_update_pmtu 113 - -113 Total: Before=25076512, After=25076476, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223153047.886683-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:47:27 -08:00
Eric Dumazet	fca59a2dd0	tcp: reduce calls to tcp_schedule_loss_probe() For RPC workloads, we alternate tcp_schedule_loss_probe() calls from output path and from input path, with tp->packets_out value oscillating between !zero and zero, leading to poor branch prediction. Move tp->packets_out check from tcp_schedule_loss_probe() to tcp_set_xmit_timer(). We avoid one call to tcp_schedule_loss_probe() from tcp_ack() path for typical RPC workloads, while improving branch prediction. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223113501.4070245-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:44:33 -08:00
Jakub Kicinski	a09eb622f3	Merge branch 'net-stmmac-qcom-ethqos-cleanups-and-re-organise-serdes-handling' Russell King says: ==================== net: stmmac: qcom-ethqos: cleanups and re-organise SerDes handling As the last series had issues with stability, I've changed the approach in this series to concentrate on keeping much of the SerDes related code within the qcom-ethqos driver rather than trying to move it out at this stage. This means it should be possible to bisect these patches and pinpoint exactly the code movement that causes any instability. This series starts with various cleanups to qcom-ethqos (the first four patches) before beginning to move code, passing phylink's phy interface (which will change) to the fix_mac_speed() method, and then using that to configure the serdes and inband setting before moving the SerDes code. This patch set has been tested. ==================== Link: https://patch.msgid.link/aZwfAFJQcp9f0niI@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:25 -08:00
Russell King (Oracle)	9192320a65	net: stmmac: qcom-ethqos: convert to set_clk_tx_rate() method Set the RGMII link clock using the set_clk_tx_rate() method rather than coding it into the .fix_mac_speed() method. This simplifies ethqos's ethqos_fix_mac_speed(). Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSLF-0000000ASci-42kh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:23 -08:00
Russell King (Oracle)	fb42f19e67	net: stmmac: qcom-ethqos: move SerDes speed configuration Move the SerDes speed configuration to phylink's .mac_finish() stage so that the SerDes is appropriately configured for the interface mode prior to the link coming up. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSLA-0000000AScc-3RFf@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:23 -08:00
Russell King (Oracle)	b8ab32315e	net: stmmac: qcom-ethqos: use phy interface mode for inband qcom-ethqos currently forces inband to be enabled for the Cisco SGMII speeds (1G, 100M and 10M) but not for 2500BASE-X (2.5G). Rather than using the speed to determine the forced inband state, use phylink's PHY interface mode which will switch between SGMII for the 10M, 100M and 1G speeds, and 2500BASE-X for 2.5G. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSL5-0000000AScX-2wuM@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:23 -08:00
Russell King (Oracle)	b560938163	net: stmmac: qcom-ethqos: pass phy interface mode to configs Pass the current phylink phy interface mode to the RGMII and "SGMII" configuration functions. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSL0-0000000AScM-2TN0@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
Russell King (Oracle)	cd0aa65153	net: stmmac: pass interface mode into fix_mac_speed() method Pass the current interface mode reported by phylink into the fix_mac_speed() method. This will be used by qcom-ethqos for its "SGMII" configuration. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKv-0000000AScG-1zv6@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
Russell King (Oracle)	834c72ca30	net: stmmac: qcom-ethqos: move loopback disable to .mac_finish() Loopback is enabled to allow the dwmac soft reset to succeed. This is enabled when clocks are enabled in ethqos_clks_config(), which happens at driver probe and runtime PM resume - e.g. when the network device is administratively brought up. Currently, the loopback is disabled when the link comes up (via .mac_link_up() calling this driver's .fix_mac_speed().) Move the qcom_ethqos_set_sgmii_loopback() call which disables loopback from ethqos_fix_mac_speed() into ethqos' SerDes specific .mac_finish() method so that loopback is disabled a little earlier after reset has completed, and dwmac setup has completed. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKq-0000000AScA-1Wh3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
Russell King (Oracle)	3baa791f19	net: stmmac: qcom-ethqos: move qcom_ethqos_set_sgmii_loopback() up ethqos_set_func_clk_en() configures both SGMII loopback and the RGMII functional clock setting. qcom_ethqos_set_sgmii_loopback() is only called from within ethqos_set_func_clk_en(), and checks for PHY_INTERFACE_MODE_2500BASEX. Move qcom_ethqos_set_sgmii_loopback() to the callers of ethqos_set_func_clk_en() except for ethqos_configure_rgmii() where we know that ethqos->phy_mode will not be PHY_INTERFACE_MODE_2500BASEX. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKl-0000000ASc1-18ka@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
Russell King (Oracle)	649a00c392	net: stmmac: qcom-ethqos: change ethqos_configure() to return void The ethqos_configure() family of functions always return zero, and the return value is never checked. Change the int return type to void. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKg-0000000ASbv-0iWL@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
Russell King (Oracle)	e6f43a41ba	net: stmmac: qcom-ethqos: remove register field value obfuscations Convert the register field values to something more human readable. For example, using (BIT(29) | BIT(27)) to update a register field that consists of bits 29:27 is an obfuscated way of writing decimal 5 for this field. The comment above needs to explain that this value is 5. Worse still is BIT(12) | GENMASK(9, 8), which is used to hide the decimal value 19 for the bitfield 16:8. Fix these, and a few others by using FIELD_PREP(). While it means we have bare numeric constants, this is more preferable than having the obfuscation. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKa-0000000ASbo-2zQg@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:22 -08:00
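The two equivalences claimed in this commit message can be verified with a few lines of arithmetic. The helpers below are Python re-implementations written for illustration; they follow the semantics of the kernel's BIT(), GENMASK() and FIELD_PREP() macros but are not the kernel code, and the lowercase names are assumptions.

```python
def bit(n):
    """Equivalent of BIT(n): a value with only bit n set."""
    return 1 << n


def genmask(high, low):
    """Equivalent of GENMASK(high, low): bits low..high set, inclusive."""
    return ((1 << (high - low + 1)) - 1) << low


def field_prep(mask, value):
    """Equivalent of FIELD_PREP(): shift `value` into the field described by `mask`."""
    shift = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    return (value << shift) & mask


# Field 29:27 holding decimal 5 is the same as BIT(29) | BIT(27).
obfuscated_a = bit(29) | bit(27)
readable_a = field_prep(genmask(29, 27), 5)

# Field 16:8 holding decimal 19 is the same as BIT(12) | GENMASK(9, 8).
obfuscated_b = bit(12) | genmask(9, 8)
readable_b = field_prep(genmask(16, 8), 19)
```

Both pairs compare equal, which is why the FIELD_PREP() form can replace the bit-pattern form without changing the register value.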
Russell King (Oracle)	ebfc2be12e	net: stmmac: qcom-ethqos: rename "por" members to "rgmii_por" Rename the "por" and "num_por" members to indicate that they are for RGMII mode only as ethqos_configure_rgmii() is the only place that the values are programmed into the registers. Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKV-0000000ASbg-28JK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:43:21 -08:00
Jakub Kicinski	583706230e	Merge branch 'net-ethernet-enic-add-vic-ids-and-link-modes' Satish Kharat says: ==================== eth: enic: add VIC ids and link modes Add VIC subsystem ids and their supported/advertised media types so ethtool reflects the hardware capabilities for the VIC variants. ==================== Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-0-63488194a974@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:20:13 -08:00
Satish Kharat	426f1f5b87	net:ethernet:enic: map ethtool link modes by VIC type Report supported media types based on the VIC subsystem ID so ethtool reflects the hardware capabilities. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-2-63488194a974@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:20:11 -08:00
Satish Kharat	472e079f8c	net:ethernet:enic: add VIC subsystem ids Add VIC subsystem id for 12xx, 13xx, 14xx and 15xxx series Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-1-63488194a974@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:20:11 -08:00
Gabriel Goller	3197cce4d4	docs: net: document neigh gc_stale_time sysctl Add missing documentation for a neighbor table garbage collector sysctl parameter in ip-sysctl.rst: neigh/default/gc_stale_time: controls how long an unused neighbor entry is kept before becoming eligible for garbage collection (default: 60 seconds) Signed-off-by: Gabriel Goller <g.goller@proxmox.com> Link: https://patch.msgid.link/20260223101257.47563-1-g.goller@proxmox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:17:06 -08:00
Jakub Kicinski	54ef3e6bbe	Merge branch 'tcp-rework-tcp_v-4-6-_send_check' Eric Dumazet says: ==================== tcp: rework tcp_v{4,6}_send_check() tcp_v{4,6}_send_check() are only called from __tcp_transmit_skb() They currently are in different files (tcp_ipv4.c and tcp_ipv6.c) thus out of line. This series move them close to their caller so that compiler can inline them. For all patches in the series: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux.3 add/remove: 0/2 grow/shrink: 1/3 up/down: 102/-178 (-76) Function old new delta __tcp_transmit_skb 3321 3423 +102 tcp_v4_send_check 136 132 -4 __tcp_v4_send_check 130 121 -9 mptcp_subflow_init 777 763 -14 __pfx_tcp_v6_send_check 16 - -16 tcp_v6_send_check 135 - -135 Total: Before=25143100, After=25143024, chg -0.00% ==================== Link: https://patch.msgid.link/20260223100729.3761597-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:19 -08:00
Eric Dumazet	fcd3d039fa	tcp: make tcp_v{4,6}_send_check() static tcp_v{4,6}_send_check() are only called from tcp_output.c and should be made static so that the compiler does not need to put an out of line copy of them. Remove (struct inet_connection_sock_af_ops) send_check field and use instead @net_header_len. Move @net_header_len close to @queue_xmit for data locality as both are used in TCP tx fast path. $ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3 add/remove: 0/2 grow/shrink: 0/3 up/down: 0/-172 (-172) Function old new delta __tcp_transmit_skb 3426 3423 -3 tcp_v4_send_check 136 132 -4 mptcp_subflow_init 777 763 -14 __pfx_tcp_v6_send_check 16 - -16 tcp_v6_send_check 135 - -135 Total: Before=25143196, After=25143024, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223100729.3761597-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:09 -08:00
Eric Dumazet	255688652b	tcp: move tcp_v6_send_check() to tcp_output.c Move tcp_v6_send_check() so that __tcp_transmit_skb() can inline it. $ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2 add/remove: 0/0 grow/shrink: 1/0 up/down: 105/0 (105) Function old new delta __tcp_transmit_skb 3321 3426 +105 Total: Before=25143091, After=25143196, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260223100729.3761597-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-24 17:16:09 -08:00

1426259 Commits