Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN, netfilter and wireless.
Current release - new code bugs:
- sched: cake: fixup cake_mq rate adjustment for diffserv config
- wifi: fix missing ieee80211_eml_params member initialization
Previous releases - regressions:
- tcp: give up on stronger sk_rcvbuf checks (for now)
Previous releases - always broken:
- net: fix rcu_tasks stall in threaded busypoll
- sched:
- fq: clear q->band_pkt_count[] in fq_reset()
- only allow act_ct to bind to clsact/ingress qdiscs and shared
blocks
- bridge: check relevant per-VLAN options in VLAN range grouping
- xsk: fix fragment node deletion to prevent buffer leak
Misc:
- spring cleanup of inactive maintainers"
* tag 'net-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
xdp: produce a warning when calculated tailroom is negative
net: enetc: use truesize as XDP RxQ info frag_size
libeth, idpf: use truesize as XDP RxQ info frag_size
i40e: use xdp.frame_sz as XDP RxQ info frag_size
i40e: fix registering XDP RxQ info
ice: change XDP RxQ frag_size from DMA write length to xdp.frame_sz
ice: fix rxq info registering in mbuf packets
xsk: introduce helper to determine rxq->frag_size
xdp: use modulo operation to calculate XDP frag tailroom
selftests/tc-testing: Add tests exercising act_ife metalist replace behaviour
net/sched: act_ife: Fix metalist update behavior
selftests: net: add test for IPv4 route with loopback IPv6 nexthop
net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop
net: vxlan: fix nd_tbl NULL dereference when IPv6 is disabled
net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled
MAINTAINERS: remove Thomas Falcon from IBM ibmvnic
MAINTAINERS: remove Claudiu Manoil and Alexandre Belloni from Ocelot switch
MAINTAINERS: replace Taras Chornyi with Elad Nachman for Marvell Prestera
MAINTAINERS: remove Jonathan Lemon from OpenCompute PTP
MAINTAINERS: replace Clark Wang with Frank Li for Freescale FEC
...
Add a regression test for a kernel panic that occurs when an IPv4 route
references an IPv6 nexthop object created on the loopback device.
The test creates an IPv6 nexthop on lo, binds an IPv4 route to it, then
triggers a route lookup via ping to verify the kernel does not crash.
./fib_nexthops.sh
Tests passed: 249
Tests failed: 0
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260304113817.294966-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tun UDP tunnel GSO fixture contains XFAIL-marked variants intended to
exercise failure paths (e.g. EMSGSIZE / "Message too long").
Using ASSERT_EQ() in these tests aborts the subtest, which prevents the
harness from classifying them as XFAIL and can make the overall net: tun
test fail.
Switch the relevant ASSERT_EQ() checks to EXPECT_EQ() so the subtests
continue running and the failures are correctly reported and accounted
as XFAIL where applicable.
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://patch.msgid.link/20260225111451.347923-2-sun.jian.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TEST_F() allocates and registers its struct __test_metadata via mmap()
inside its constructor, and only then assigns the
_##fixture_##test##_object pointer.
XFAIL_ADD() runs in a constructor too and reads
_##fixture_##test##_object to initialize xfail->test. If XFAIL_ADD runs
first, xfail->test can be NULL and the expected failure will be reported
as FAIL.
Use constructor priorities to ensure TEST_F registration runs before
XFAIL_ADD, without adding extra state or runtime lookups.
Fixes: 2709473c93 ("selftests: kselftest_harness: support using xfail")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://patch.msgid.link/20260225111451.347923-1-sun.jian.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: endpoints with both the signal and
subflow flags should always be marked as used even if it was not
possible to create new subflows due to the MPTCP PM limits.
For this test, an extra endpoint is created with both the signal and the
subflow flags, and limits are set not to create extra subflows. In this
case, an ADD_ADDR is sent, but no subflows are created. Still, the local
endpoint is marked as used, and no warning is fired when removing the
endpoint, after having sent a RM_ADDR.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 85df533a78 ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-5-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: RM_ADDR were sent over the first
found active subflow which could be the same as the one being removed.
It is more likely to loose this notification.
For this check, RM_ADDR are explicitly dropped when trying to send them
over the initial subflow, when removing the endpoint attached to it. If
it is dropped, the test will complain because some RM_ADDR have not been
received.
Note that only the RM_ADDR are dropped, to allow the linked subflow to
be quickly and cleanly closed. To only drop those RM_ADDR, a cBPF byte
code is used. If the IPTables commands fail, that's OK, the tests will
continue to pass, but not validate this part. This can be ignored:
another subtest fully depends on such command, and will be marked as
skipped.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 8dd5efb1f9 ("mptcp: send ack for rm_addr")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-3-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default, the netem qdisc can keep up to 1000 packets under its belly
to deal with the configured rate and delay. The simult flows test-case
simulates very low speed links, to avoid problems due to slow CPUs and
the TCP stack tend to transmit at a slightly higher rate than the
(virtual) link constraints.
All the above causes a relatively large amount of packets being enqueued
in the netem qdiscs - the longer the transfer, the longer the queue -
producing increasingly high TCP RTT samples and consequently increasingly
larger receive buffer size due to DRS.
When the receive buffer size becomes considerably larger than the needed
size, the tests results can flake, i.e. because minimal inaccuracy in the
pacing rate can lead to a single subflow usage towards the end of the
connection for a considerable amount of data.
Address the issue explicitly setting netem limits suitable for the
configured link speeds and unflake all the affected tests.
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-1-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
list_categories() builds a set directly from the 'category'
field of each test case. Since 'category' is a list,
set(map(...)) attempts to insert lists into a set, which
raises:
TypeError: unhashable type: 'list'
Flatten category lists and collect unique category names
using set.update() instead.
Signed-off-by: Naveen Anandhan <mr.navi8680@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fixes from Tejun Heo:
- Fix circular locking dependency in cpuset partition code by
deferring housekeeping_update() calls to a workqueue instead
of calling them directly under cpus_read_lock
- Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
generate_sched_domains() returns NULL due to kmalloc failure
- Fix incorrect cpuset behavior for effective_xcpus in
partition_xcpus_del() and cpuset_update_tasks_cpumask()
in update_cpumasks_hier()
- Fix race between task migration and cgroup iteration
* tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked
cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
cgroup: fix race between task migration and iteration
Pull sched_ext fixes from Tejun Heo:
- Fix starvation of scx_enable() under fair-class saturation by
offloading the enable path to an RT kthread
- Fix out-of-bounds access in idle mask initialization on systems with
non-contiguous NUMA node IDs
- Fix a preemption window during scheduler exit and a refcount
underflow in cgroup init error path
- Fix SCX_EFLAG_INITIALIZED being a no-op flag
- Add READ_ONCE() annotations for KCSAN-clean lockless accesses and
replace naked scx_root dereferences with container_of() in kobject
callbacks
- Tooling and selftest fixes: compilation issues with clang 17,
strtoul() misuse, unused options cleanup, and Kconfig sync
* tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix starvation of scx_enable() under fair-class saturation
sched_ext: Remove redundant css_put() in scx_cgroup_init()
selftests/sched_ext: Fix peek_dsq.bpf.c compile error for clang 17
selftests/sched_ext: Add -fms-extensions to bpf build flags
tools/sched_ext: Add -fms-extensions to bpf build flags
sched_ext: Use READ_ONCE() for plain reads of scx_watchdog_timeout
sched_ext: Replace naked scx_root dereferences in kobject callbacks
sched_ext: Use READ_ONCE() for the read side of dsq->nr update
tools/sched_ext: fix strtoul() misuse in scx_hotplug_seq()
sched_ext: Fix SCX_EFLAG_INITIALIZED being a no-op flag
sched_ext: Fix out-of-bounds access in scx_idle_init_masks()
sched_ext: Disable preemption between scx_claim_exit() and kicking helper work
tools/sched_ext: Add Kconfig to sync with upstream
tools/sched_ext: Sync README.md Kconfig with upstream scx
selftests/sched_ext: Remove duplicated unistd.h include in rt_stall.c
tools/sched_ext: scx_sdt: Remove unused '-f' option
tools/sched_ext: scx_central: Remove unused '-p' option
selftests/sched_ext: Fix unused-result warning for read()
selftests/sched_ext: Abort test loop on signal
Add a selftest to verify that changing xmit_hash_policy to vlan+srcmac
is rejected when a native XDP program is loaded on a bond in 802.3ad
mode. Without the fix in bond_option_xmit_hash_policy_set(), the change
succeeds silently, creating an inconsistent state that triggers a kernel
WARNING in dev_xdp_uninstall() when the bond is torn down.
The test attaches native XDP to a bond0 (802.3ad, layer2+3), then
attempts to switch xmit_hash_policy to vlan+srcmac and asserts the
operation fails. It also verifies the change succeeds after XDP is
detached, confirming the rejection is specific to the XDP-loaded state.
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260226080306.98766-3-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When compiling sched_ext selftests using clang 17.0.6, it raised
compiler crash and build error:
Error at line 68: Unsupport signed division for DAG: 0x55b2f9a60240:
i64 = sdiv 0x55b2f9a609b0, Constant:i64<100>, peek_dsq.bpf.c:68:25 @[
peek_dsq.bpf.c:95:4 @[ peek_dsq.bpf.c:169:8 @[ peek
_dsq.bpf.c:140:6 ] ] ]Please convert to unsigned div/mod
After digging, it's not a compiler error, clang supported Signed division
only when using -mcpu=v4, while we use -mcpu=v3 currently, the better way
is to use unsigned div, see [1] for details.
[1] https://github.com/llvm/llvm-project/issues/70433
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Similar to commit 835a507535 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f4 ("bpftool: Fix build warnings
due to MS extensions")
Fix "declaration does not declare anything" warning by using
-fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs
that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 1cc93c48b5 ("selftests/net: packetdrill: remove tests for
tcp_rcv_*big") removed the test for the reverted commit 1d2fbaad7c
("tcp: stronger sk_rcvbuf checks") but also the one for commit
9ca48d616e ("tcp: do not accept packets beyond window").
Restore the test with the necessary adaptation: expect a delayed ACK
instead of an immediate one, since tcp_can_ingest() does not fail
anymore for the last data packet.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Link: https://patch.msgid.link/20260301-tcp_rcv_big_endseq-v1-1-86ab7415ab58@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull bpf fixes from Alexei Starovoitov:
- Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad
Tabba)
- Fix invariant violation for single value tnums in the verifier
(Harishankar Vishwanathan, Paul Chaignon)
- Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai)
- Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen)
- Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri
Olsa)
- Fix race in freeing special fields in BPF maps to prevent memory
leaks (Kumar Kartikeya Dwivedi)
- Fix OOB read in dmabuf_collector (T.J. Mercier)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits)
selftests/bpf: Avoid simplification of crafted bounds test
selftests/bpf: Test refinement of single-value tnum
bpf: Improve bounds when tnum has a single possible value
bpf: Introduce tnum_step to step through tnum's members
bpf: Fix race in devmap on PREEMPT_RT
bpf: Fix race in cpumap on PREEMPT_RT
selftests/bpf: Add tests for special fields races
bpf: Retire rcu_trace_implies_rcu_gp() from local storage
bpf: Delay freeing fields in local storage
bpf: Lose const-ness of map in map_check_btf()
bpf: Register dtor for freeing special fields
selftests/bpf: Fix OOB read in dmabuf_collector
selftests/bpf: Fix a memory leak in xdp_flowtable test
bpf: Fix stack-out-of-bounds write in devmap
bpf: Fix kprobe_multi cookies access in show_fdinfo callback
bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing
selftests/bpf: Don't override SIGSEGV handler with ASAN
selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()
selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN
selftests/bpf: Fix array bounds warning in jit_disasm_helpers
...
Jakub reports test flakes on debug kernels:
FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 23 and 17
This test assumes that the kernels nfnetlink_queue module sees N GSO
packets, segments them into M skbs and queues them to userspace for
reinjection.
Hence, if M >= N, no segmentation occurred.
However, its possible that this happens:
- nfnetlink_queue gets GSO packet
- segments that into n skbs
- userspace buffer is full, kernel drops the segmented skbs
-> "toqueue" counter incremented by 1, "fromqueue" is unchanged.
If this happens often enough in a single run, M >= N check triggers
incorrectly.
To solve this, allow the nf_queue.c test program to set the FAIL_OPEN
flag so that the segmented skbs bypass the queueing step in the kernel
if the receive buffer is full.
Also, reduce number of sending socat instances, decrease their priority
and increase nice value for the nf_queue program itself to reduce the
probability of overruns happening in the first place.
Fixes: 59ecffa399 ("selftests: netfilter: nft_queue.sh: add udp fraglist gro test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260218184114.0b405b72@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260226161920.1205-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The reg_bounds_crafted tests validate the verifier's range analysis
logic. They focus on the actual ranges and thus ignore the tnum. As a
consequence, they carry the assumption that the tested cases can be
reproduced in userspace without using the tnum information.
Unfortunately, the previous change the refinement logic breaks that
assumption for one test case:
(u64)2147483648 (u32)<op> [4294967294; 0x100000000]
The tested bytecode is shown below. Without our previous improvement, on
the false branch of the condition, R7 is only known to have u64 range
[0xfffffffe; 0x100000000]. With our improvement, and using the tnum
information, we can deduce that R7 equals 0x100000000.
19: (bc) w0 = w6 ; R6=0x80000000
20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff))
21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000
R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op
refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds
refines the s32 range to 0 using u32 and finally also sets u32=0.
From this, __reg_bound_offset improves the tnum to (0; 0x100000000).
Finally, our previous patch uses this new tnum to deduce that it only
intersect with u64=[0xfffffffe; 0x100000000] in a single value:
0x100000000.
Because the verifier uses the tnum to reach this constant value, the
selftest is unable to reproduce it by only simulating ranges. The
solution implemented in this patch is to change the test case such that
there is more than one overlap value between u64 and the tnum. The max.
u64 value is thus changed from 0x100000000 to 0x300000000.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch introduces selftests to cover the new bounds refinement
logic introduced in the previous patch. Without the previous patch,
the first two tests fail because of the invariant violation they
trigger. The last test fails because the R10 access is not detected as
dead code. In addition, all three tests fail because of R0 having a
non-constant value in the verifier logs.
In addition, the last two cases are covering the negative cases: when we
shouldn't refine the bounds because the u64 and tnum overlap in at least
two values.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/90d880c8cf587b9f7dc715d8961cd1b8111d01a8.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a couple of tests to ensure that the refcount drops to zero when we
exercise the race where creation of a special field succeeds the logical
bpf_obj_free_fields done when deleting an element. Prior to previous
changes, the fields would be freed eagerly and repopulate and end up
leaking, causing the reference to not drop down correctly. Running this
test on a kernel without fixes will cause a hang in delete_module, since
the module reference stays active due to the leaked kptr not dropping
it. After the fixes tests succeed as expected.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull arm64 fixes from Will Deacon:
"The diffstat is dominated by changes to our TLB invalidation errata
handling and the introduction of a new GCS selftest to catch one of
the issues that is fixed here relating to PROT_NONE mappings.
- Fix cpufreq warning due to attempting a cross-call with interrupts
masked when reading local AMU counters
- Fix DEBUG_PREEMPT warning from the delay loop when it tries to
access per-cpu errata workaround state for the virtual counter
- Re-jig and optimise our TLB invalidation errata workarounds in
preparation for more hardware brokenness
- Fix GCS mappings to interact properly with PROT_NONE and to avoid
corrupting the pte on CPUs with FEAT_LPA2
- Fix ioremap_prot() to extract only the memory attributes from the
user pte and ignore all the other 'prot' bits"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads
arm64: Fix sampling the "stable" virtual counter in preemptible section
arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI
arm64: tlb: Allow XZR argument to TLBI ops
kselftest: arm64: Check access to GCS after mprotect(PROT_NONE)
arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings
arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled
arm64: io: Extract user memory type in ioremap_prot()
arm64: io: Rename ioremap_prot() to __ioremap_prot()
Add a new test file bridge_vlan_dump.sh with four test cases that verify
VLANs with different per-VLAN options are not incorrectly grouped into
ranges in the dump output.
The tests verify the kernel's br_vlan_opts_eq_range() function correctly
prevents VLAN range grouping when neigh_suppress, mcast_max_groups,
mcast_n_groups, or mcast_enabled options differ.
Each test verifies that VLANs with different option values appear as
individual entries rather than ranges, and that VLANs with matching
values are properly grouped together.
Example output:
$ ./bridge_vlan_dump.sh
TEST: VLAN range grouping with neigh_suppress [ OK ]
TEST: VLAN range grouping with mcast_max_groups [ OK ]
TEST: VLAN range grouping with mcast_n_groups [ OK ]
TEST: VLAN range grouping with mcast_enabled [ OK ]
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260225143956.3995415-3-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters,
but bpf_probe_read_kernel always tries to read exactly that many bytes.
If a name is less than DMA_BUF_NAME_LEN characters,
bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str
stops at the first NUL terminator so use it instead, like
iter_dmabuf_for_each already does.
Fixes: ae5d2c59ec ("selftests/bpf: Add test for dmabuf_iter")
Reported-by: Jerome Lee <jaewookl@quicinc.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from IPsec, Bluetooth and netfilter
Current release - regressions:
- wifi: fix dev_alloc_name() return value check
- rds: fix recursive lock in rds_tcp_conn_slots_available
Current release - new code bugs:
- vsock: lock down child_ns_mode as write-once
Previous releases - regressions:
- core:
- do not pass flow_id to set_rps_cpu()
- consume xmit errors of GSO frames
- netconsole: avoid OOB reads, msg is not nul-terminated
- netfilter: h323: fix OOB read in decode_choice()
- tcp: re-enable acceptance of FIN packets when RWIN is 0
- udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().
- wifi: brcmfmac: fix potential kernel oops when probe fails
- phy: register phy led_triggers during probe to avoid AB-BA deadlock
- eth:
- bnxt_en: fix deleting of Ntuple filters
- wan: farsync: fix use-after-free bugs caused by unfinished tasklets
- xscale: check for PTP support properly
Previous releases - always broken:
- tcp: fix potential race in tcp_v6_syn_recv_sock()
- kcm: fix zero-frag skb in frag_list on partial sendmsg error
- xfrm:
- fix race condition in espintcp_close()
- always flush state and policy upon NETDEV_UNREGISTER event
- bluetooth:
- purge error queues in socket destructors
- fix response to L2CAP_ECRED_CONN_REQ
- eth:
- mlx5:
- fix circular locking dependency in dump
- fix "scheduling while atomic" in IPsec MAC address query
- gve: fix incorrect buffer cleanup for QPL
- team: avoid NETDEV_CHANGEMTU event when unregistering slave
- usb: validate USB endpoints"
* tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
dpaa2-switch: validate num_ifs to prevent out-of-bounds write
net: consume xmit errors of GSO frames
vsock: document write-once behavior of the child_ns_mode sysctl
vsock: lock down child_ns_mode as write-once
selftests/vsock: change tests to respect write-once child ns mode
net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
net/mlx5: Fix missing devlink lock in SRIOV enable error path
net/mlx5: E-switch, Clear legacy flag when moving to switchdev
net/mlx5: LAG, disable MPESW in lag_disable_change()
net/mlx5: DR, Fix circular locking dependency in dump
selftests: team: Add a reference count leak test
team: avoid NETDEV_CHANGEMTU event when unregistering slave
net: mana: Fix double destroy_workqueue on service rescan PCI path
MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER
dpll: zl3073x: Remove redundant cleanup in devm_dpll_init()
selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0
tcp: re-enable acceptance of FIN packets when RWIN is 0
vsock: Use container_of() to get net namespace in sysctl handlers
net: usb: kaweth: validate USB endpoints
...
The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.
Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-1-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a test for the issue that was fixed in "team: avoid NETDEV_CHANGEMTU
event when unregistering slave".
The test hangs due to a reference count leak without the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
[ 50.681299][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
[ 71.185325][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
And passes with the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
ok 1 selftests: drivers/net/team: refleak.sh
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpftool_maps_access and bpftool_metadata tests may fail on BPF CI
with "command not found", depending on a workflow.
This happens because detect_bpftool_path() only checks two hardcoded
relative paths:
- ./tools/sbin/bpftool
- ../tools/sbin/bpftool
Add support for a BPFTOOL environment variable that allows specifying
the exact path to the bpftool binary.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- kmem_cache_iter: remove unnecessary debug output
- lwt_seg6local: change the type of foobar to char[]
- the sizeof(foobar) returned the pointer size and not a string
length as intended
- verifier_log: increase prog_name buffer size in verif_log_subtest()
- compiler has a conservative estimate of fixed_log_sz value, making
ASAN complain on snprint() call
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct
ksyms internally and never frees it.
Move struct ksyms to trace_helpers.h and return it from the
bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and
filtered_cnt fields to the ksyms to hold the filtered array of
symbols, previously returned by bpf_get_ksyms().
Fixup the call sites: kprobe_multi_test and bench_trigger.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a denylist file for tests that should be skipped when built with
userspace ASAN:
$ make ... SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer"
Skip the following tests:
- *arena*: userspace ASAN does not understand BPF arena maps and gets
confused particularly when map_extra is non-zero
- non-zero map_extra leads to mmap with MAP_FIXED, and ASAN treats
this as an unknown memory region
- task_local_data: ASAN complains about "incorrect" aligned_alloc()
usage, but it's intentional in the test
- uprobe_multi_test: very slow with ASAN enabled
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
EXTRA_* and SAN_* build flags were not correctly propagated to bpftool
and resolve_btids when building selftests/bpf. This led to various
build errors on attempt to build with SAN_CFLAGS="-fsanitize=address",
for example.
Fix the makefiles to address this:
- Pass SAN_CFLAGS/SAN_LDFLAGS to bpftool and resolve_btfids build
- Propagate EXTRA_LDFLAGS to resolve_btfids link command
- Use pkg-config to detect zlib and zstd for resolve_btfids, similar
libelf handling
Also check for ASAN flag in selftests/bpf/Makefile for convenience.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>