tcp_v{4,6}_send_check() are only called from tcp_output.c
and should be made static so that the compiler does not need
to put an out of line copy of them.
Remove (struct inet_connection_sock_af_ops) send_check field
and use instead @net_header_len.
Move @net_header_len close to @queue_xmit for data locality
as both are used in TCP tx fast path.
$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/2 grow/shrink: 0/3 up/down: 0/-172 (-172)
Function old new delta
__tcp_transmit_skb 3426 3423 -3
tcp_v4_send_check 136 132 -4
mptcp_subflow_init 777 763 -14
__pfx_tcp_v6_send_check 16 - -16
tcp_v6_send_check 135 - -135
Total: Before=25143196, After=25143024, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inline __tcp_v4_send_check(), like __tcp_v6_send_check().
Move tcp_v4_send_check() to tcp_output.c close to
its fast path caller (__tcp_transmit_skb()).
Note __tcp_v4_send_check() is still out-of-line for tcp4_gso_segment()
because it is called in an unlikely() section.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-9 (-9)
Function old new delta
__tcp_v4_send_check 130 121 -9
Total: Before=25143100, After=25143091, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This function has a single caller in net/ipv6/udp.c.
Move it there so that the compiler can decide to (auto)inline
it if he prefers to. IBT glue is removed anyway.
With clang, we can see it was able to inline it and also
inlined one other helper at the same time.
UDPLITE removal will also help.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/0 up/down: 840/-785 (55)
Function old new delta
__udp6_lib_rcv 1247 2087 +840
__pfx_udp6_csum_init 16 - -16
udp6_csum_init 769 - -769
Total: Before=25074399, After=25074454, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223093445.3696368-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 6511882cdd ("mptcp: allocate fwd memory separately
on the rx and tx path") __lock_sock() can be static again.
Make sure __lock_sock() is not inlined, so that lock_sock_nested()
no longer needs a stack canary.
Add a noinline attribute on lock_sock_nested() so that calls
to lock_sock() from net/core/sock.c are not inlined,
none of them are fast path to deserve that:
- sockopt_lock_sock()
- sock_set_reuseport()
- sock_set_reuseaddr()
- sock_set_mark()
- sock_set_keepalive()
- sock_no_linger()
- sock_bindtoindex()
- sk_wait_data()
- sock_set_rcvbuf()
$ scripts/bloat-o-meter -t vmlinux.old vmlinux
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-312 (-312)
Function old new delta
__lock_sock 192 188 -4
__lock_sock_fast 239 86 -153
lock_sock_nested 227 72 -155
Total: Before=24888707, After=24888395, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223092716.3673939-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
Pull bpf fixes from Alexei Starovoitov:
- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
Hung)
- Fix a potential use-after-free of BTF object (Anton Protopopov)
- Add feature detection to libbpf and avoid moving arena global
variables on older kernels (Emil Tsalapatis)
- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
(Ihor Solodrai)
- Fix truncated netlink dumps in bpftool (Jakub Kicinski)
- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
Dwivedi)
- Remove hexdump dependency while building bpf selftests (Matthieu
Baerts)
- Complete fsession support in BPF trampolines on riscv (Menglong Dong)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Remove hexdump dependency
libbpf: Remove extern declaration of bpf_stream_vprintk()
selftests/bpf: Use vmlinux.h in test_xdp_meta
bpftool: Fix truncated netlink dumps
libbpf: Delay feature gate check until object prepare time
libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
bpf: Add a map/btf from a fd array more consistently
selftests/bpf: Fix map_kptr grace period wait
selftests/bpf: enable fsession_test on riscv64
selftests/bpf: Adjust selftest due to function rename
bpf, riscv: add fsession support for trampolines
bpf: Fix a potential use-after-free of BTF object
bpf, riscv: introduce emit_store_stack_imm64() for trampoline
libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
libbpf: Add gating for arena globals relocation feature
The max number of channels is always an unsigned int, use the correct
type to fix compilation errors done with strict type checking, e.g.:
error: call to ‘__compiletime_assert_1110’ declared with attribute
error: min(mlx5e_get_devlink_param_num_doorbells(mdev),
mlx5e_get_max_num_channels(mdev)) signedness error
Fixes: 74a8dadac1 ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev_trylock to protect
against concurrent tear down of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.
The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.
But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover
The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout
Fix these by moving the netdev_trylock calls from the work handlers
lower in the call stack, in the respective recovery functions, where
they are actually necessary.
Fixes: 8f7b00307b ("net/mlx5e: Convert mlx5 netdevs to instance locking")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once
without a retry loop. If the CQE is not immediately available after
posting the WQE, the function fails unnecessarily.
Use read_poll_timeout() to poll 3-10 usecs for CQE, consistent with
other ASO polling code paths in the driver.
Fixes: 739cfa3451 ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The write combining completion poll loop uses usleep_range() which can
sleep much longer than requested due to scheduler latency. Under load,
we witnessed a 20ms+ delay until the process was rescheduled, causing
the jiffies based timeout to expire while the thread is sleeping.
The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.
Instead of the open-coded while loop, use the kernel's poll_timeout_us()
which always performs an additional check after the sleep expiration,
and is less error-prone.
Note: poll_timeout_us() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.
Fixes: d98995b4bf ("net/mlx5: Reimplement write combining test")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ASO completion poll loop uses usleep_range() which can sleep much
longer than requested due to scheduler latency. Under load, we witnessed
a 20ms+ delay until the process was rescheduled, causing the jiffies
based timeout to expire while the thread is sleeping.
The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.
Instead of the open-coded while loop, use the kernel's
read_poll_timeout() which always performs an additional check after the
sleep expiration, and is less error-prone.
Note: read_poll_timeout() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.
Fixes: 739cfa3451 ("net/mlx5: Make ASO poll CQ usable in atomic context")
Fixes: 7e3fce82d9 ("net/mlx5e: Overcome slow response for first macsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Driver is using num_vhca_ports capability to distinguish between
multiport master device and multiport slave device. num_vhca_ports is a
capability the driver sets according to the MAX num_vhca_ports
capability reported by FW. On the other hand, light SFs doesn't set the
above capbility.
This leads to wrong results whenever light SFs is checking whether he is
a multiport master or slave.
Therefore, use the MAX capability to distinguish between master and
slave devices.
Fixes: e71383fb9c ("net/mlx5: Light probe local SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Save a local pointer to new_sock->sk and hold a reference before
installing callbacks in rds_tcp_accept_one. After
rds_tcp_set_callbacks() or rds_tcp_reset_callbacks(), tc->t_sock is
set to new_sock which may race with the shutdown path. A concurrent
rds_tcp_conn_path_shutdown() may call sock_release(), which sets
new_sock->sk = NULL and may eventually free sk when the refcount
reaches zero.
Subsequent accesses to new_sock->sk->sk_state would dereference NULL,
causing the crash. The fix saves a local sk pointer before callbacks
are installed so that sk_state can be accessed safely even after
new_sock->sk is nulled, and uses sock_hold()/sock_put() to ensure
sk itself remains valid for the duration.
Fixes: 826c1004d4 ("net/rds: rds_tcp_conn_path_shutdown must not discard messages")
Reported-by: syzbot+96046021045ffe6d7709@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96046021045ffe6d7709
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260216222643.2391390-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull more non-MM updates from Andrew Morton:
- "two fixes in kho_populate()" fixes a couple of not-major issues in
the kexec handover code (Ran Xiaokai)
- misc singletons
* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/group_cpus: handle const qualifier from clusters allocation type
kho: remove unnecessary WARN_ON(err) in kho_populate()
kho: fix missing early_memunmap() call in kho_populate()
scripts/gdb: implement x86_page_ops in mm.py
objpool: fix the overestimation of object pooling metadata size
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
delayacct: fix build regression on accounting tool
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
Florian Westphal says:
====================
netfilter: updates for net
The following patchset contains Netfilter fixes for *net*:
1) Add missing __rcu annotations to NAT helper hook pointers in Amanda,
FTP, IRC, SNMP and TFTP helpers. From Sun Jian.
2-4):
- Add global spinlock to serialize nft_counter fetch+reset operations.
- Use atomic64_xchg() for nft_quota reset instead of read+subtract pattern.
Note AI review detects a race in this change but it isn't new. The
'racing' bit only exists to prevent constant stream of 'quota expired'
notifications.
- Revert commit_mutex usage in nf_tables reset path, it caused
circular lock dependency. All from Brian Witte.
5) Fix uninitialized l3num value in nf_conntrack_h323 helper.
6) Fix musl libc compatibility in netfilter_bridge.h UAPI header. This
change isn't nice (UAPI headers should not include libc headers), but
as-is musl builds may fail due to redefinition of struct ethhdr.
7) Fix protocol checksum validation in IPVS for IPv6 with extension headers,
from Julian Anastasov.
8) Fix device reference leak in IPVS when netdev goes down. Also from
Julian.
9) Remove WARN_ON_ONCE when accessing forward path array, this can
trigger with sufficiently long forward paths. From Pablo Neira Ayuso.
10) Fix use-after-free in nf_tables_addchain() error path, from Inseo An.
* tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix use-after-free in nf_tables_addchain()
net: remove WARN_ON_ONCE when accessing forward path array
ipvs: do not keep dest_dst if dev is going down
ipvs: skip ipv6 extension headers for csum checks
include: uapi: netfilter_bridge.h: Cover for musl libc
netfilter: nf_conntrack_h323: don't pass uninitialised l3num value
netfilter: nf_tables: revert commit_mutex usage in reset path
netfilter: nft_quota: use atomic64_xchg for reset
netfilter: nft_counter: serialize reset with spinlock
netfilter: annotate NAT helper hook pointers with __rcu
====================
Link: https://patch.msgid.link/20260217163233.31455-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.
The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.
XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.
Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.
This fixes multiple issues:
1. Illegal synchronize_rcu() in an RCU read- side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().
2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():
[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f7 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
icmp: better deal with DDOS
When dealing with death of big UDP servers, admins might want to
increase net.ipv4.icmp_msgs_per_sec and net.ipv4.icmp_msgs_burst
to big values (2,000,000 or more).
They also might need to tune the per-host ratelimit to 1ms or 0ms
in favor of the global rate limit.
This series fixes bugs showing up in all these needs.
====================
Link: https://patch.msgid.link/20260216142832.3834174-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Following part was needed before the blamed commit, because
inet_getpeer_v6() second argument was the prefix.
/* Give more bandwidth to wider prefixes. */
if (rt->rt6i_dst.plen < 128)
tmo >>= ((128 - rt->rt6i_dst.plen)>>5);
Now inet_getpeer_v6() retrieves hosts, we need to remove
@tmo adjustement or wider prefixes likes /24 allow 8x
more ICMP to be sent for a given ratelimit.
As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit
default value from 1000ms to 100ms to avoid potential regressions.
Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time.
Fixes: fd0273d793 ("ipv6: Remove external dependency on rt6i_dst and rt6i_src")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
icmp_global_credit was meant to be changed ~1000 times per second,
but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value,
icmp_global_credit changes can inflict false sharing to surrounding
fields that are read mostly.
Move icmp_global_credit and icmp_global_stamp to a separate
cacheline aligned group.
Fixes: b056b4cd91 ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").
hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:
/bin/sh: 2: hexdump: not found
Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.
Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull Kbuild fixes from Nathan Chancellor:
- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'
- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
made fatal by -Werror
- Drop explicit LZMA parallel compression in scripts/make_fit.py
- Several fixes for commit 62089b8048 ("kbuild: rpm-pkg: Generate
debuginfo package manually")
* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
kbuild: rpm-pkg: Restrict manual debug package creation
scripts/make_fit.py: Drop explicit LZMA parallel compression
kbuild: Fix CC_CAN_LINK detection
kbuild: Add objtool to top-level clean target
Ihor Solodrai says:
====================
libbpf: Remove extern declaration of bpf_stream_vprintk()
The first patch adjusts a selftest that has been using
bpf_stream_printk() macro. The second patch removes the declaration.
====================
Link: https://patch.msgid.link/20260218215651.2057673-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull thermal control fix from Rafael Wysocki:
"This fixes a sysfs group leak on DLVR registration failure in the
Intel int340x thermal driver (Kaushlendra Kumar)"
* tag 'thermal-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Fix sysfs group leak on DLVR registration failure
Pull more ACPI support updates from Rafael J. Wysocki:
"These are mostly fixes and cleanups on top of the ACPI support updates
merged recently, including two new quirks, an ACPI CPPC library fix,
and fixes and cleanups of a few core ACPI device drivers:
- Add an unused power resource handling quirk for THUNDEROBOT ZERO
(Zhai Can)
- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to
use online CPUs (Sean V Kelley)
- Drop redundant checks from the ACPI notify handler and the driver
remove callback in the ACPI battery driver (Rafael Wysocki)
- Move the creation of the wakeup source during the ACPI button
driver probe to an earlier point to avoid missing a wakeup event
due to a race and clean up system wakeup handling and remove
callback in that driver (Rafael Wysocki)
- Drop unnecessary driver_data pointer clearing from the ACPI EC and
SMBUS HC drivers and make the ACPI backlight (video) driver clear
the device's driver_data pointer on remove (Rafael Wysocki)
- Force enabling of PWM2 on the Yogabook YB1-X90 tablets (Yauhen
Kharuzhy)"
* tag 'acpi-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
ACPI: driver: Drop driver_data pointer clearing from two drivers
ACPI: video: Clear driver_data pointer on remove
ACPI: button: Tweak acpi_button_remove()
ACPI: button: Tweak system wakeup handling
ACPI: battery: Drop redundant checks from acpi_battery_remove()
ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
ACPI: x86: Force enabling of PWM2 on the Yogabook YB1-X90
ACPI: button: Call device_init_wakeup() earlier during probe
ACPI: battery: Drop redundant check from acpi_battery_notify()
Pull more power management updates from Rafael Wysocki:
"These are mostly fixes on top of the power management updates merged
recently in cpuidle governors, in the Intel RAPL power capping driver
and in the wake IRQ management code:
- Fix the handling of package-scope MSRs in the intel_rapl power
capping driver when called from the PMU subsystem and make it add
all package CPUs to the PMU cpumask to allow tools to read RAPL
events from any CPU in the package (Kuppuswamy Satharayananyan)
- Rework the invalid version check in the intel_rapl_tpmi power
capping driver to account for the fact that on partitioned systems,
multiple TPMI instances may exist per package, but RAPL registers
are only valid on one instance (Kuppuswamy Satharayananyan)
- Describe the new intel_idle.table command line option in the
admin-guide intel_idle documentation (Artem Bityutskiy)
- Fix a crash in the ladder cpuidle governor on systems with only one
(polling) idle state available by making the cpuidle core bypass
the governor in those cases and adjust the other existing governors
to that change (Aboorva Devarajan, Christian Loehle)
- Update kerneldoc comments for wake IRQ management functions that
have not been matching the code (Wang Jiayue)"
* tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Remove single state handling
cpuidle: teo: Remove single state handling
cpuidle: haltpoll: Remove single state handling
cpuidle: Skip governor when only one idle state is available
powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
PM: sleep: wakeirq: Update outdated documentation comments
Documentation: PM: Document intel_idle.table command line option
powercap: intel_rapl: Expose all package CPUs in PMU cpumask
powercap: intel_rapl: Remove incorrect CPU check in PMU context
Merge additional updates of multiple core ACPI device drivers (battery,
button, video, EC, SMBUS HC) for 7.0-rc1:
- Drop redundant checks from the ACPI notify handler and the driver
remove callback in the ACPI battery driver (Rafael Wysocki)
- Move the creation of the wakeup source during the ACPI button driver
probe to an earlier point to avoid missing a wakeup event due to a
race and clean up system wakeup handling and remove callback in that
driver (Rafael Wysocki)
- Drop unnecessary driver_data pointer clearing from the ACPI EC and
SMBUS HC drivers and make the ACPI backlight (video) driver clear the
device's driver_data pointer on remove (Rafael Wysocki)
* acpi-battery:
ACPI: battery: Drop redundant checks from acpi_battery_remove()
ACPI: battery: Drop redundant check from acpi_battery_notify()
* acpi-button:
ACPI: button: Tweak acpi_button_remove()
ACPI: button: Tweak system wakeup handling
ACPI: button: Call device_init_wakeup() earlier during probe
* acpi-driver:
ACPI: driver: Drop driver_data pointer clearing from two drivers
ACPI: video: Clear driver_data pointer on remove
Merge an ACPI power management update and an ACPI CPPC library update
for 7.0-rc1:
- Add an unused power resource handling quirk for THUNDEROBOT ZERO (Zhai
Can)
- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to use
online CPUs (Sean V Kelley)
* acpi-pm:
ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
* acpi-cppc:
ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
Merge additional power capping and cpuidle updates for 7.0-rc1:
- Fix the handling of package-scope MSRs in the intel_rapl power
capping driver when called from the PMU subsystem and make it add all
package CPUs to the PMU cpumask to allow tools to read RAPL events
from any CPU in the package (Kuppuswamy Sathyanarayanan)
- Rework the invalid version check in the intel_rapl_tpmi power capping
driver to account for the fact that on partitioned systems, multiple
TPMI instances may exist per package, but RAPL registers are only
valid on one instance (Kuppuswamy Satharayananyan)
- Describe the new intel_idle.table command line option in the
admin-guide intel_idle documentation (Artem Bityutskiy)
- Fix a crash in the ladder cpuidle governor on systems with only one
(polling) idle state available by making the cpuidle core bypass the
governor in those cases and adjust the other existing governors to
that change (Aboorva Devarajan, Christian Loehle)
* pm-powercap:
powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
powercap: intel_rapl: Expose all package CPUs in PMU cpumask
powercap: intel_rapl: Remove incorrect CPU check in PMU context
* pm-cpuidle:
cpuidle: menu: Remove single state handling
cpuidle: teo: Remove single state handling
cpuidle: haltpoll: Remove single state handling
cpuidle: Skip governor when only one idle state is available
Documentation: PM: Document intel_idle.table command line option
Pull sysctl updates from Joel Granados:
- Remove macros from proc handler converters
Replace the proc converter macros with "regular" functions. Though it
is more verbose than the macro version, it helps when debugging and
better aligns with coding-style.rst.
- General cleanup
Remove superfluous ctl_table forward declarations. Const qualify the
memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
Add missing kernel doc to proc_dointvec_conv.
- Testing
This series was run through sysctl selftests/kunit test suite in
x86_64. And went into linux-next after rc4, giving it a good 3 weeks
of testing
* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
sysctl: Replace unidirectional INT converter macros with functions
sysctl: Add kernel doc to proc_douintvec_conv
sysctl: Replace UINT converter macros with functions
sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
sysctl: clarify proc_douintvec_minmax doc
sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
sysctl: Remove unused ctl_table forward declarations
loadpin: Implement custom proc_handler for enforce
alloc_tag: move memory_allocation_profiling_sysctls into .rodata
sysctl: Add missing kernel-doc for proc_dointvec_conv
Pull turbostat fix from Len Brown:
"Fix a recent AMD regression due to errant code cleanup"
* tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: Fix AMD RAPL regression
Increasing the MTU beyond the HDS threshold causes the hardware to
fragment packets across multiple buffers. If a single-buffer XDP program
is attached, the driver will drop all multi-frag frames. While we can't
prevent a remote sender from sending non-TCP packets larger than the MTU,
this will prevent users from inadvertently breaking new TCP streams.
Traditionally, drivers supported XDP with MTU less than 4Kb
(packet per page). Fbnic currently prevents attaching XDP when MTU is too high.
But it does not prevent increasing MTU after XDP is attached.
Fixes: 1b0a3950db ("eth: fbnic: Add XDP pass, drop, abort support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>