Jakub reports test flakes on debug kernels:
FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 23 and 17
This test assumes that the kernels nfnetlink_queue module sees N GSO
packets, segments them into M skbs and queues them to userspace for
reinjection.
Hence, if M >= N, no segmentation occurred.
However, its possible that this happens:
- nfnetlink_queue gets GSO packet
- segments that into n skbs
- userspace buffer is full, kernel drops the segmented skbs
-> "toqueue" counter incremented by 1, "fromqueue" is unchanged.
If this happens often enough in a single run, M >= N check triggers
incorrectly.
To solve this, allow the nf_queue.c test program to set the FAIL_OPEN
flag so that the segmented skbs bypass the queueing step in the kernel
if the receive buffer is full.
Also, reduce number of sending socat instances, decrease their priority
and increase nice value for the nf_queue program itself to reduce the
probability of overruns happening in the first place.
Fixes: 59ecffa399 ("selftests: netfilter: nft_queue.sh: add udp fraglist gro test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260218184114.0b405b72@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260226161920.1205-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new test file bridge_vlan_dump.sh with four test cases that verify
VLANs with different per-VLAN options are not incorrectly grouped into
ranges in the dump output.
The tests verify the kernel's br_vlan_opts_eq_range() function correctly
prevents VLAN range grouping when neigh_suppress, mcast_max_groups,
mcast_n_groups, or mcast_enabled options differ.
Each test verifies that VLANs with different option values appear as
individual entries rather than ranges, and that VLANs with matching
values are properly grouped together.
Example output:
$ ./bridge_vlan_dump.sh
TEST: VLAN range grouping with neigh_suppress [ OK ]
TEST: VLAN range grouping with mcast_max_groups [ OK ]
TEST: VLAN range grouping with mcast_n_groups [ OK ]
TEST: VLAN range grouping with mcast_enabled [ OK ]
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260225143956.3995415-3-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from IPsec, Bluetooth and netfilter
Current release - regressions:
- wifi: fix dev_alloc_name() return value check
- rds: fix recursive lock in rds_tcp_conn_slots_available
Current release - new code bugs:
- vsock: lock down child_ns_mode as write-once
Previous releases - regressions:
- core:
- do not pass flow_id to set_rps_cpu()
- consume xmit errors of GSO frames
- netconsole: avoid OOB reads, msg is not nul-terminated
- netfilter: h323: fix OOB read in decode_choice()
- tcp: re-enable acceptance of FIN packets when RWIN is 0
- udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().
- wifi: brcmfmac: fix potential kernel oops when probe fails
- phy: register phy led_triggers during probe to avoid AB-BA deadlock
- eth:
- bnxt_en: fix deleting of Ntuple filters
- wan: farsync: fix use-after-free bugs caused by unfinished tasklets
- xscale: check for PTP support properly
Previous releases - always broken:
- tcp: fix potential race in tcp_v6_syn_recv_sock()
- kcm: fix zero-frag skb in frag_list on partial sendmsg error
- xfrm:
- fix race condition in espintcp_close()
- always flush state and policy upon NETDEV_UNREGISTER event
- bluetooth:
- purge error queues in socket destructors
- fix response to L2CAP_ECRED_CONN_REQ
- eth:
- mlx5:
- fix circular locking dependency in dump
- fix "scheduling while atomic" in IPsec MAC address query
- gve: fix incorrect buffer cleanup for QPL
- team: avoid NETDEV_CHANGEMTU event when unregistering slave
- usb: validate USB endpoints"
* tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
dpaa2-switch: validate num_ifs to prevent out-of-bounds write
net: consume xmit errors of GSO frames
vsock: document write-once behavior of the child_ns_mode sysctl
vsock: lock down child_ns_mode as write-once
selftests/vsock: change tests to respect write-once child ns mode
net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
net/mlx5: Fix missing devlink lock in SRIOV enable error path
net/mlx5: E-switch, Clear legacy flag when moving to switchdev
net/mlx5: LAG, disable MPESW in lag_disable_change()
net/mlx5: DR, Fix circular locking dependency in dump
selftests: team: Add a reference count leak test
team: avoid NETDEV_CHANGEMTU event when unregistering slave
net: mana: Fix double destroy_workqueue on service rescan PCI path
MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER
dpll: zl3073x: Remove redundant cleanup in devm_dpll_init()
selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0
tcp: re-enable acceptance of FIN packets when RWIN is 0
vsock: Use container_of() to get net namespace in sysctl handlers
net: usb: kaweth: validate USB endpoints
...
The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.
Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-1-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a test for the issue that was fixed in "team: avoid NETDEV_CHANGEMTU
event when unregistering slave".
The test hangs due to a reference count leak without the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
[ 50.681299][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
[ 71.185325][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
And passes with the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
ok 1 selftests: drivers/net/team: refleak.sh
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull sched_ext fixes from Tejun Heo:
- Various bug fixes for the example schedulers and selftests
* tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
tools/sched_ext: fix getopt not re-parsed on restart
tools/sched_ext: scx_userland: fix data races on shared counters
tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
tools/sched_ext: scx_userland: fix stale data on restart
tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
selftests/sched_ext: Fix rt_stall flaky failure
tools/sched_ext: scx_userland: fix restart and stats thread lifecycle bugs
tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size
tools/sched_ext: scx_flatcg: zero-initialize stats counter array
Add a test to verify that RSS contexts persist across interface
down/up along with their associated Ntuple filters. Another test
that creates contexts/rules keeping interface down and test their
persistence is also added.
Tested on bnxt_en:
TAP version 13
1..1
# timeout set to 0
# selftests: drivers/net/hw: rss_ctx.py
# TAP version 13
# 1..2
# ok 1 rss_ctx.test_rss_context_persist_create_and_ifdown
# ok 2 rss_ctx.test_rss_context_persist_ifdown_and_create # SKIP Create context not supported with interface down
# # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260219185313.2682148-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
Pull bpf fixes from Alexei Starovoitov:
- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
Hung)
- Fix a potential use-after-free of BTF object (Anton Protopopov)
- Add feature detection to libbpf and avoid moving arena global
variables on older kernels (Emil Tsalapatis)
- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
(Ihor Solodrai)
- Fix truncated netlink dumps in bpftool (Jakub Kicinski)
- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
Dwivedi)
- Remove hexdump dependency while building bpf selftests (Matthieu
Baerts)
- Complete fsession support in BPF trampolines on riscv (Menglong Dong)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Remove hexdump dependency
libbpf: Remove extern declaration of bpf_stream_vprintk()
selftests/bpf: Use vmlinux.h in test_xdp_meta
bpftool: Fix truncated netlink dumps
libbpf: Delay feature gate check until object prepare time
libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
bpf: Add a map/btf from a fd array more consistently
selftests/bpf: Fix map_kptr grace period wait
selftests/bpf: enable fsession_test on riscv64
selftests/bpf: Adjust selftest due to function rename
bpf, riscv: add fsession support for trampolines
bpf: Fix a potential use-after-free of BTF object
bpf, riscv: introduce emit_store_stack_imm64() for trampoline
libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
libbpf: Add gating for arena globals relocation feature
Pull more non-MM updates from Andrew Morton:
- "two fixes in kho_populate()" fixes a couple of not-major issues in
the kexec handover code (Ran Xiaokai)
- misc singletons
* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/group_cpus: handle const qualifier from clusters allocation type
kho: remove unnecessary WARN_ON(err) in kho_populate()
kho: fix missing early_memunmap() call in kho_populate()
scripts/gdb: implement x86_page_ops in mm.py
objpool: fix the overestimation of object pooling metadata size
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
delayacct: fix build regression on accounting tool
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").
hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:
/bin/sh: 2: hexdump: not found
Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.
Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.
I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:
dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
^^^^^^^^^
the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).
Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.
Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tests use the tc pedit action to modify the IPv4 source address
("pedit ex munge ip src set"), but the IP header checksum is not
recalculated after the modification. As a result, the modified packet
fails sanity checks in br_netfilter after bridging and is dropped,
which causes the test to fail.
Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0
during the test execution. This prevents the bridge from passing
L2 traffic to netfilter, bypassing the checksum validation that
causes the test failure.
Fixes: 92ad382894 ("selftests: forwarding: Add a test for pedit munge SIP and DIP")
Fixes: 226657ba23 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv6 packet has an incorrect payload length set in the IPv6 header.
After VXLAN decapsulation, such packets do not pass sanity checks in
br_netfilter and are dropped, which causes the test to fail.
Fix this by setting the correct IPv6 payload length for the encapsulated
packet generated by mausezahn, so that the packet is accepted
by br_netfilter.
tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh
lines 698-706
)"00:03:"$( : Payload length
)"3a:"$( : Next header
)"04:"$( : Hop limit
)"$saddr:"$( : IP saddr
)"$daddr:"$( : IP daddr
)"80:"$( : ICMPv6.type
)"00:"$( : ICMPv6.code
)"00:"$( : ICMPv6.checksum
)
Data after IPv6 header:
• 80: — 1 byte (ICMPv6 type)
• 00: — 1 byte (ICMPv6 code)
• 00: — 1 byte (ICMPv6 checksum, truncated)
Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match
the actual payload size.
Fixes: b07e9957f2 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv4 packet contains a zero IP header checksum. After VXLAN
decapsulation, such packets do not pass sanity checks in br_netfilter
and are dropped, which causes the test to fail.
Fix this by calculating and setting a valid IPv4 header checksum for the
encapsulated packet generated by mausezahn, so that the packet is accepted
by br_netfilter. Fixed by using the payload_template_calc_checksum() /
payload_template_expand_checksum() helpers that are only available
in v6.3 and newer kernels.
Fixes: a0b61f3d8e ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add check_rx_hds test that verifies header/data split works across
payload sizes. The test sweeps payload sizes from 1 byte to 8KB, if any
data propagates up to userspace as SCM_DEVMEM_LINEAR, then the test
fails. This shows that regardless of payload size, ncdevmem's
configuration of hds-thresh to 0 is respected.
Add -L (--fail-on-linear) flag to ncdevmem that causes the receiver to
fail if any SCM_DEVMEM_LINEAR cmsg is received.
Use socat option for fixed block sizing and tcp nodelay to disable
nagle's algo to avoid buffering.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull LoongArch updates from Huacai Chen:
- Select HAVE_CMPXCHG_{LOCAL,DOUBLE}
- Add 128-bit atomic cmpxchg support
- Add HOTPLUG_SMT implementation
- Wire up memfd_secret system call
- Fix boot errors and unwind errors for KASAN
- Use BPF prog pack allocator and add BPF arena support
- Update dts files to add nand controllers
- Some bug fixes and other small changes
* tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: dts: loongson-2k1000: Add nand controller support
LoongArch: dts: loongson-2k0500: Add nand controller support
LoongArch: BPF: Implement bpf_addr_space_cast instruction
LoongArch: BPF: Implement PROBE_MEM32 pseudo instructions
LoongArch: BPF: Use BPF prog pack allocator
LoongArch: Use IS_ERR_PCPU() macro for KGDB
LoongArch: Rework KASAN initialization for PTW-enabled systems
LoongArch: Disable instrumentation for setup_ptwalker()
LoongArch: Remove some extern variables in source files
LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RT
LoongArch: Handle percpu handler address for ORC unwinder
LoongArch: Use %px to print unmodified unwinding address
LoongArch: Prefer top-down allocation after arch_mem_init()
LoongArch: Add HOTPLUG_SMT implementation
LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODE
LoongArch: Wire up memfd_secret system call
LoongArch: Replace seq_printf() with seq_puts() for simple strings
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Add detection for SC.Q support
LoongArch: Select HAVE_CMPXCHG_LOCAL in Kconfig
Pull memblock updates from Mike Rapoport:
- update tools/include/linux/mm.h to fix memblock tests compilation
- drop redundant struct page* parameter from memblock_free_pages() and
get struct page from the pfn
- add underflow detection for size calculation in memtest and warn
about underflow when VM_DEBUG is enabled
* tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm/memtest: add underflow detection for size calculation
memblock: drop redundant 'struct page *' argument from memblock_free_pages()
memblock test: include <linux/sizes.h> from tools mm.h stub
Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
wait_for_port() can wait up to 2 seconds with the sleep and the polling
in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat
process could die before the test writes to the netconsole.
Increase the timeout to 3 seconds to make netcons_basic.sh pass
consistently.
Fixes: 3dc6c76391 ("selftests: net: Add IPv6 support to netconsole basic tests")
Signed-off-by: Pin-yen Lin <treapking@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull KVM updates from Paolo Bonzini:
"Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register
state
- Fix return value for memory reads from and writes to in-kernel
devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to
guests
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators
stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page
allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that
enlisted help from the architecture's page table management to
build hypervisor page tables, in particular enabling sharing the
last level of page tables. This however was a lot of code (~3K
lines) in order to support KVM, and also blocked several features.
The biggest advantages is that the page size of userspace is
completely independent of the page size used by the guest:
userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
and in fact transparent hugepages were not possible before. It's
also now possible to have nested guests and guests with huge pages
running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware
(contexted switched around the fastpath run loop) and allowing
direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce
event filtering and to prevent the guest from profiling sensitive
host state. This is more accurate, since it has no risk of
contention and thus dropped events, and also has significantly less
overhead.
For more information, see the commit message for merge commit
bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all
the same reasons KVM disallows change the model after the first
KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV
MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
even if those were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
where KVM would attempt to read CR3 configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
(for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
Only a few exports that are intended for external usage, and those
are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore
-EBUSY instead of WARNing. Userspace can sometimes put the vCPU
into what should be an impossible state, and spurious exit to
userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being
blocked when vCPU is in Wait-For-SIPI, which also resulted in
playing whack-a-mole with syzkaller stuffing architecturally
impossible states into KVM
- Add support for new Intel instructions that don't require anything
beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the
intended setup flow, as nested VMX in particular is sensitive to
unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI
broadcasts when the suppression feature is enabled by the guest
(currently limited to split IRQCHIP, i.e. userspace I/O APIC).
Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
option as some userspaces have come to rely on KVM's buggy behavior
(KVM advertises Supress EOI Broadcast irrespective of whether or
not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind in KVM's
unused ASSERT() macro, most of which could be trivially triggered
by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to
make it more obvious what the weird parameter is used for, and to
allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in
kvm_apic_update_apicv() to consolidate the updates, and to
co-locate SVI updates with the updates for KVM's own cache of ISR
information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to
temporarily switch to vmcs01 instead of deferring the update until
the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs,
was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page
faults, and instead always reflect them into the guest. Since KVM
doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup
handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcb12 fields that aren't fully supported,
mostly to avoid weirdness and complexity for FRED and other
features, where KVM wants enable VMCS shadowing for fields that
conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to
load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of
ERAPS relies on an upcoming, publicly announced change in the APM
to reduce the set of conditions where hardware (i.e. KVM) *must*
flush the RAP
- Ignore nSVM intercepts for instructions that are not supported
according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's
fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently
of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when
emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
support, and extend x86's infrastructure to support EPT and NPT
(for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression
test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
improve test coverage and to hopefully make the test easier to
understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked
hugepage handling. SEV/SNP was the only user of the tracking and it
can do it via the RMP
- Retroactively document and enforce (for SNP) that
KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
source page to be 4KiB aligned, to avoid non-trivial complexity for
something that no known VMM seems to be doing and to avoid an API
special case for in-place conversion, which simply can't support
unaligned sources
- When populating guest_memfd memory, GUP the source page in common
code and pass the refcounted page to the vendor callback, instead
of letting vendor code do the heavy lifting. Doing so avoids a
looming deadlock bug with in-place due an AB-BA conflict betwee
mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space
when creating a vCPU-specific mapping of guest memory. Actually
this bug could not be hit even on x86, the only architecture with
multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
KVM: s390: Increase permitted SE header size to 1 MiB
MAINTAINERS: Replace backup for s390 vfio-pci
KVM: s390: vsie: Fix race in acquire_gmap_shadow()
KVM: s390: vsie: Fix race in walk_guest_tables()
KVM: s390: Use guest address to mark guest page dirty
irqchip/riscv-imsic: Adjust the number of available guest irq files
RISC-V: KVM: Transparent huge page support
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add riscv vm satp modes
KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
RISC-V: KVM: Remove unnecessary 'ret' assignment
KVM: s390: Add explicit padding to struct kvm_s390_keyop
KVM: LoongArch: selftests: Add steal time test case
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
...
The rt_stall test measures the runtime ratio between an EXT and an RT
task pinned to the same CPU, verifying that the deadline server prevents
RT tasks from starving SCHED_EXT tasks. It expects the EXT task to get
at least 4% of CPU time.
The test is flaky because sched_stress_test() calls sleep(RUN_TIME)
immediately after fork(), without waiting for the RT child to complete
its setup (set_affinity + set_sched). If the RT child experiences
scheduling latency before completing setup, that delay eats into the
measurement window: the RT child runs for less than RUN_TIME seconds,
and the EXT task's measured ratio drops below the 4% threshold.
For example, in the failing CI run [1]:
EXT=0.140s RT=4.750s total=4.890s (expected ~5.0s)
ratio=2.86% < 4% → FAIL
The 110ms gap (5.0 - 4.89) corresponds to the RT child's setup time
being counted inside the measurement window, during which fewer
deadline server ticks fire for the EXT task.
Fix by using pipes to synchronize: each child signals the parent after
completing its setup, and the parent waits for both signals before
starting sleep(RUN_TIME). This ensures the measurement window only
counts time when both tasks are fully configured and competing.
[1] https://github.com/kernel-patches/bpf/actions/runs/21961895809/job/63442490449
Fixes: be621a7634 ("selftests/sched_ext: Add test for sched_ext dl_server")
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull RISC-V updates from Paul Walmsley:
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some
selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
EFI volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: lib: optimize strlen loop efficiency
selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
selftests: riscv: verify ptrace accepts valid vector csr values
selftests: riscv: verify ptrace rejects invalid vector csr inputs
selftests: riscv: verify syscalls discard vector context
selftests: riscv: verify initial vector state with ptrace
selftests: riscv: test ptrace vector interface
riscv: ptrace: validate input vector csr registers
riscv: csr: define vtype register elements
riscv: vector: init vector context with proper vlenb
riscv: ptrace: return ENODATA for inactive vector extension
kselftest/riscv: add kselftest for user mode CFI
riscv: add documentation for shadow stack
riscv: add documentation for landing pad / indirect branch tracking
riscv: create a Kconfig fragment for shadow stack and landing pad support
arch/riscv: add dual vdso creation logic and select vdso based on hw
arch/riscv: compile vdso with landing pad and shadow stack note
riscv: enable kernel access to shadow stack memory via the FWFT SBI call
riscv: add kernel command line option to opt out of user CFI
riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe
...
The _get_unused_cpus() function can return CPU numbers >= 16, which
exceeds RPS_MAX_CPUS in toeplitz.c. When this happens, the test fails
with a cryptic message:
# Exception| Traceback (most recent call last):
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
# Exception| func(*args)
# Exception| File "/tmp/cur/linux/tools/testing/selftests/drivers/net/hw/toeplitz.py", line 189, in test
# Exception| with bkg(" ".join(rx_cmd), ksft_ready=True, exit_wait=True) as rx_proc:
# Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 124, in __init__
# Exception| super().__init__(comm, background=True,
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 77, in __init__
# Exception| raise Exception("Did not receive ready message")
# Exception| Exception: Did not receive ready message
Rename _get_unused_cpus() to _get_unused_rps_cpus() and cap the CPU
search range to RPS_MAX_CPUS.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210093110.1935149-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The testcase failed as below:
$./vlan_bridge_binding.sh
...
+ adf_ip_link_set_up d1
+ local name=d1
+ shift
+ ip_link_is_up d1
+ ip_link_has_flag d1 UP
+ local name=d1
+ shift
+ local flag=UP
+ shift
++ ip -j link show d1
++ jq --arg flag UP 'any(.[].flags.[]; . == $flag)'
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START
(Unix shell quoting issues?) at <top-level>, line 1:
any(.[].flags.[]; . == $flag)
jq: 1 compile error
Remove the extra dot (.) after flags array to fix this.
Fixes: 4baa1d3a50 ("selftests: net: lib: Add ip_link_has_flag()")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260211022146.190948-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
People keep removing generated files from .gitignore files even when the
files stay around. Please don't do that: just because the file is no
longer being generated doesn't make it magically go away, and doesn't
make it suddenly be something that should now not be ignored any more.
Fixes: dd2c6ec24f ("selftests/mm: remove virtual_address_range test")
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull CXL updates from Dave Jiang:
- Introduce cxl_memdev_attach and pave way for soft reserved handling,
type2 accelerator enabling, and LSA 2.0 enabling. All these series
require the endpoint driver to settle before continuing the memdev
driver probe.
- Address CXL port error protocol handling and reporting.
The large patch series was split into three parts. The first two
parts are included here with the final part coming later.
The first part consists of a series of code refactoring to PCI AER
sub-system that addresses CXL and also CXL RAS code to prepare for
port error handling.
The second part refactors the CXL code to move management of
component registers to cxl_port objects to allow all CXL AER errors
to be handled through the cxl_port hierarchy.
- Provide AMD Zen5 platform address translation for CXL using ACPI
PRMT. This includes a conventions document to explain why this is
needed and how it's implemented.
- Misc CXL patches of fixes, cleanups, and updates. Including CXL
address translation for unaligned MOD3 regions.
[ TLA service: CXL is "Compute Express Link" ]
* tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (59 commits)
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
cxl/port: Unify endpoint and switch port lookup
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
...
Pull VFIO updates from Alex Williamson:
"A small cycle with the bulk in selftests and reintroducing poison
handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and
some dmabuf structure consolidation.
- Update outdated mdev comment referencing the renamed
mdev_type_add() function (Julia Lawall)
- Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex
Mastro)
- Relax selftest assertion relative to differences in huge page
handling between legacy (v1) TYPE1 IOMMU mapping behavior and the
compatibility mode supported by IOMMUFD (David Matlack)
- Reintroduce memory poison handling support for non-struct-page-
backed memory in the nvgrace-gpu variant driver (Ankit Agrawal)
- Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure
and semantics (Leon Romanovsky)
- Add missing upstream bridge locking across PCI function reset,
resolving an assertion failure when secondary bus reset is used to
provide that reset (Anthony Pighin)
- Fixes to hisi_acc vfio-pci variant driver to resolve corner case
issues related to resets, repeated migration, and error injection
scenarios (Longfang Liu, Weili Qian)
- Restrict vfio selftest builds to arm64 and x86_64, resolving
compiler warnings on 32-bit archs (Ted Logan)
- Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
stepped up (Ioana Ciornei)"
* tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio:
vfio/fsl-mc: add myself as maintainer
vfio: selftests: only build tests on arm64 and x86_64
hisi_acc_vfio_pci: fix the queue parameter anomaly issue
hisi_acc_vfio_pci: resolve duplicate migration states
hisi_acc_vfio_pci: update status after RAS error
hisi_acc_vfio_pci: fix VF reset timeout issue
vfio/pci: Lock upstream bridge for vfio_pci_core_disable()
types: reuse common phys_vec type instead of DMABUF open‑coded variant
vfio/nvgrace-gpu: register device memory for poison handling
mm: add stubs for PFNMAP memory failure registration functions
vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
vfio: selftests: Add vfio_dma_mapping_mmio_test
vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
vfio: selftests: Centralize IOMMU mode name definitions
vfio/mdev: update outdated comment
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
In order to synchronize new processes to test inheritance of memfd_noexec
sysctl, memfd_test sets up the sysctl with a value before creating the new
process. The new process then sends itself a SIGSTOP in order to wait for
the parent to flip the sysctl value and send a SIGCONT signal.
This would work as intended if it wasn't the fact that the new process is
being created with CLONE_NEWPID, which creates a new PID namespace and the
new process has PID 1 in this namespace. There're restrictions on sending
signals to PID 1 and, although it's relaxed for other than root PID
namespace, it's biting us here. In this specific case the SIGSTOP sent by
the new process is ignored (no error to kill() is returned) and it never
stops its execution. This is usually not noticiable as the parent usually
manages to set the new sysctl value before the child has a chance to run
and the test succeeds. But if you run the test in a loop, it eventually
reproduces:
while [ 1 ]; do ./memfd_test >log 2>&1 || break; done; cat log
So this patch replaces the SIGSTOP/SIGCONT synchronization with IPC
semaphore.
Link: https://lkml.kernel.org/r/a7776389-b3d6-4b18-b438-0b0e3ed1fd3b@work
Fixes: 6469b66e3f ("selftests: improve vm.memfd_noexec sysctl tests")
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: liuye <liuye@kylinos.cn>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: add memory failure selftests", v4.
Introduce selftests to validate the functionality of memory failure.
These tests help ensure that memory failure handling for anonymous pages,
pagecaches pages works correctly, including proper SIGBUS delivery to user
processes, page isolation, and recovery paths.
Currently madvise syscall is used to inject memory failures. And only
anonymous pages and pagecaches are tested. More test scenarios, e.g.
hugetlb, shmem, thp, will be added. Also more memory failure injecting
methods will be supported, e.g. APEI Error INJection, if required.
This patch (of 3):
This patch adds a new kselftest to validate memory failure handling for
anonymous pages. The test performs the following operations:
1. Allocates anonymous pages using mmap().
2. Injects memory failure via madvise syscall.
3. Verifies expected error handling behavior.
4. Unpoison memory.
This test helps ensure that memory failure handling for anonymous pages
works correctly, including proper SIGBUS delivery to user processes, page
isolation and recovery paths.
Link: https://lkml.kernel.org/r/20260206031639.2707102-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20260206031639.2707102-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we have the mk_vma_flags() macro helper which permits easy
specification of any number of VMA flags, add helper functions which
operate with vma_flags_t parameters.
This patch provides vma_flags_test[_mask](), vma_flags_set[_mask]() and
vma_flags_clear[_mask]() respectively testing, setting and clearing flags
with the _mask variants accepting vma_flag_t parameters, and the non-mask
variants implemented as macros which accept a list of flags.
This allows us to trivially test/set/clear aggregate VMA flag values as
necessary, for instance:
if (vma_flags_test(&flags, VMA_READ_BIT, VMA_WRITE_BIT))
goto readwrite;
vma_flags_set(&flags, VMA_READ_BIT, VMA_WRITE_BIT);
vma_flags_clear(&flags, VMA_READ_BIT, VMA_WRITE_BIT);
We also add a function for testing that ALL flags are set for convenience,
e.g.:
if (vma_flags_test_all(&flags, VMA_READ_BIT, VMA_MAYREAD_BIT)) {
/* Both READ and MAYREAD flags set */
...
}
The compiler generates optimal assembly for each such that they behave as
if the caller were setting the bitmap flags manually.
This is important for e.g. drivers which manipulate flag values rather
than a VMA's specific flag values.
We also add helpers for testing, setting and clearing flags for VMA's and
VMA descriptors to reduce boilerplate.
Also add the EMPTY_VMA_FLAGS define to aid initialisation of empty flags.
Finally, update the userland VMA tests to add the helpers there so they
can be utilised as part of userland testing.
Link: https://lkml.kernel.org/r/885d4897d67a6a57c0b07fa182a7055ad752df11.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull non-MM updates from Andrew Morton:
- "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
disk space by teaching ocfs2 to reclaim suballocator block group
space (Heming Zhao)
- "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
ARRAY_END() macro and uses it in various places (Alejandro Colomar)
- "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
page size (Pnina Feder)
- "kallsyms: Prevent invalid access when showing module buildid" cleans
up kallsyms code related to module buildid and fixes an invalid
access crash when printing backtraces (Petr Mladek)
- "Address page fault in ima_restore_measurement_list()" fixes a
kexec-related crash that can occur when booting the second-stage
kernel on x86 (Harshit Mogalapalli)
- "kho: ABI headers and Documentation updates" updates the kexec
handover ABI documentation (Mike Rapoport)
- "Align atomic storage" adds the __aligned attribute to atomic_t and
atomic64_t definitions to get natural alignment of both types on
csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
- "kho: clean up page initialization logic" simplifies the page
initialization logic in kho_restore_page() (Pratyush Yadav)
- "Unload linux/kernel.h" moves several things out of kernel.h and into
more appropriate places (Yury Norov)
- "don't abuse task_struct.group_leader" removes the usage of
->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
- "list private v2 & luo flb" adds some infrastructure improvements to
the live update orchestrator (Pasha Tatashin)
* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
procfs: fix missing RCU protection when reading real_parent in do_task_stat()
watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
kho: fix doc for kho_restore_pages()
tests/liveupdate: add in-kernel liveupdate test
liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
liveupdate: luo_file: Use private list
list: add kunit test for private list primitives
list: add primitives for private list manipulations
delayacct: fix uapi timespec64 definition
panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
netclassid: use thread_group_leader(p) in update_classid_task()
RDMA/umem: don't abuse current->group_leader
drm/pan*: don't abuse current->group_leader
drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
drm/amdgpu: don't abuse current->group_leader
android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
android/binder: don't abuse current->group_leader
kho: skip memoryless NUMA nodes when reserving scratch areas
...