Commit Graph

49385 Commits

Author SHA1 Message Date
Benjamin Berg
ca38943e83 selftests/nolibc: remove outdated comment about construct order
The constructor order is not (and should not) be tested. Remove the
comment.

Fixes: a782d45c86 ("selftests/nolibc: stop testing constructor order")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://lore.kernel.org/r/20250731201225.323254-3-benjamin@sipsolutions.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-09-01 20:48:41 +02:00
Benjamin Berg
6d33ce3634 selftests/nolibc: fix EXPECT_NZ macro
The expect non-zero macro was incorrect and never used. Fix its
definition.

Fixes: 362aecb2d8 ("selftests/nolibc: add basic infrastructure to ease creation of nolibc tests")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://lore.kernel.org/r/20250731201225.323254-2-benjamin@sipsolutions.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-09-01 20:48:41 +02:00
Thomas Weißschuh
4c2ef951cf tools/nolibc: drop wait4() support
Not all architectures implement the wait4() syscall. It can be
implemented in terms of the waitid() syscall, but that would require
some rework of the other wait-related functions in wait.h.

As wait4() is non-standard and deprecated, remove it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-7-4b63f2caaa89@weissschuh.net
2025-09-01 20:48:40 +02:00
Thomas Weißschuh
61a3cf7934 kselftest/arm64: tpidr2: Switch to waitpid() over wait4()
wait4() is deprecated, non-standard and about to be removed from nolibc.

Switch to the equivalent waitpid() call.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-6-4b63f2caaa89@weissschuh.net
2025-09-01 20:48:35 +02:00
Thomas Weißschuh
f11e156e0f tools/nolibc: fold llseek fallback into lseek()
Align the implementation of the fallback handling inside sys_lseek()
with the rest of nolibc.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-5-4b63f2caaa89@weissschuh.net
2025-09-01 20:47:54 +02:00
Thomas Weißschuh
fbd47de755 tools/nolibc: remove __nolibc_enosys() fallback from fork functions
All architectures have one of the real functions available.
The additional fallback to __nolibc_enosys() is superfluous.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-4-4b63f2caaa89@weissschuh.net
2025-09-01 20:47:54 +02:00
Thomas Weißschuh
09adec1f4b tools/nolibc: remove __nolibc_enosys() fallback from dup2()
All architectures have one of the real functions available.
The additional fallback to __nolibc_enosys() is superfluous.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-3-4b63f2caaa89@weissschuh.net
2025-09-01 20:47:53 +02:00
Thomas Weißschuh
4b6ffb2d87 tools/nolibc: remove __nolibc_enosys() fallback from *at() functions
All architectures have had one of the real functions available since
Linux 2.6.12. The additional fallback to __nolibc_enosys() is superfluous.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-2-4b63f2caaa89@weissschuh.net
2025-09-01 20:47:52 +02:00
Thomas Weißschuh
e6366101ce tools/nolibc: remove __nolibc_enosys() fallback from time64-related functions
These fallbacks where added when no explicit fallbacks for time64 was
implemented. Now that these fallbacks are in place, the additional
fallback to __nolibc_enosys() is superfluous.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-1-4b63f2caaa89@weissschuh.net
2025-09-01 20:47:51 +02:00
Thomas Weißschuh
b22d81ed31 tools/nolibc: use tabs instead of spaces for indentation
Some lines are using spaces for indentation instead of the standard tabs.

Fix them.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-09-01 20:47:50 +02:00
Thomas Weißschuh
d1ff0e2d13 tools/nolibc: avoid error in dup2() if old fd equals new fd
dup2() allows both 'old' and 'new' to have the same value, which dup3()
does not. If libc dup2() is implemented through the dup3() system call,
then it would incorrectly fail in this case.

Avoid the error by handling old == new explicitly.

Fixes: 30ca20517a ("tools headers: Move the nolibc header from rcutorture to tools/include/nolibc/")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250820-nolibc-dup2-einval-v2-1-807185a45c56@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-09-01 20:47:36 +02:00
Dan Carpenter
237bfb76c9 selftests/futex: Fix futex_wait() for 32bit ARM
On 32bit ARM systems gcc-12 will use 32bit timestamps while gcc-13 and later
will use 64bit timestamps.  The problem is that SYS_futex will continue
pointing at the 32bit system call.  This makes the futex_wait test fail like
this:

  waiter failed errno 110
  not ok 1 futex_wake private returned: 0 Success
  waiter failed errno 110
  not ok 2 futex_wake shared (page anon) returned: 0 Success
  waiter failed errno 110
  not ok 3 futex_wake shared (file backed) returned: 0 Success

Instead of compiling differently depending on the gcc version, use the
-D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 options to ensure that 64bit timestamps
are used.  Then use ifdefs to make SYS_futex point to the 64bit system call.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/20250827130011.677600-6-bigeasy@linutronix.de
2025-09-01 16:26:55 +02:00
Gopi Krishna Menon
5fdb877b49 selftests/futex: Fix typos and grammar in futex_priv_hash
Fix multiple typos and small grammar issues in help text, comments and test
messages in the futex_priv_hash test.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/20250827130011.677600-5-bigeasy@linutronix.de
2025-09-01 16:21:02 +02:00
Nai-Chen Cheng
f8ef9c2402 selftests/futex: Fix format-security warnings in futex_priv_hash
Fix format-security warnings by using proper format strings when passing
message variables to ksft_exit_fail_msg(), ksft_test_result_pass(), and
ksft_test_result_skip() function.

Thus prevent potential security issues and eliminate compiler warnings when
building with -Wformat-security.

Signed-off-by: Nai-Chen Cheng <bleach1827@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250827130011.677600-4-bigeasy@linutronix.de
2025-09-01 16:20:08 +02:00
Waiman Long
d8e2f91999 selftests/futex: Fix some futex_numa_mpol subtests
The "Memory out of range" subtest of futex_numa_mpol assumes that memory
access outside of the mmap'ed area is invalid. That may not be the case
depending on the actual memory layout of the test application. When that
subtest was run on an x86-64 system with latest upstream kernel, the test
passed as an error was returned from futex_wake(). On another PowerPC system,
the same subtest failed because futex_wake() returned 0.

  Bail out! futex2_wake(64, 0x86) should fail, but didn't

Looking further into the passed subtest on x86-64, it was found that an
-EINVAL was returned instead of -EFAULT. The -EINVAL error was returned
because the node value test with FLAGS_NUMA set failed with a node value
of 0x7f7f. IOW, the futex memory was accessible and futex_wake() failed
because the supposed node number wasn't valid. If that memory location
happens to have a very small value (e.g. 0), the test will pass and no
error will be returned.

Since this subtest is non-deterministic, drop it unless a guard page beyond
the mmap region is explicitly set.

The other problematic test is the "Memory too small" test. The futex_wake()
function returns the -EINVAL error code because the given futex address isn't
8-byte aligned, not because only 4 of the 8 bytes are valid and the other
4 bytes are not. So change the name of this subtest to "Mis-aligned futex" to
reflect the reality.

  [ bp: Massage commit message. ]

Fixes: 3163369407 ("selftests/futex: Add futex_numa_mpol")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/20250827130011.677600-3-bigeasy@linutronix.de
2025-09-01 16:12:54 +02:00
Thomas Huth
74db6cc331 powerpc: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembler code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This is bad since macros starting with two underscores are names
that are reserved by the C language. It can also be very confusing
for the developers when switching between userspace and kernelspace
coding, or when dealing with uapi headers that rather should use
__ASSEMBLER__  instead. So let's standardize now on the __ASSEMBLER__
macro that is provided by the compilers.

This is almost a completely mechanical patch (done with a simple
"sed -i" statement), apart from tweaking two comments manually in
arch/powerpc/include/asm/bug.h and arch/powerpc/include/asm/kasan.h
(which did not have proper underscores at the end) and fixing a
checkpatch error about spaces in arch/powerpc/include/asm/spu_csa.h.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250801082007.32904-3-thuth@redhat.com
2025-09-01 13:23:29 +05:30
Linus Torvalds
c8bc81a52d Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - CFI failure due to kpti_ng_pgd_alloc() signature mismatch

 - Underallocation bug in the SVE ptrace kselftest

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
  arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature
2025-08-30 10:43:53 -07:00
Mark Brown
d82aa5d350 kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
In fp-trace when allocating a buffer to write SVE register data we open
code the addition of the header size to the VL depeendent register data
size, which lead to an underallocation bug when we cut'n'pasted the code
for FPSIMD format writes. Use the SVE_PT_SIZE() macro that the kernel
UAPI provides for this.

Fixes: b84d2b2795 ("kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250812-arm64-fp-trace-macro-v1-1-317cfff986a5@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-08-30 11:31:11 +01:00
Takashi Iwai
14f628cb58 Merge branch 'for-linus' into for-next
Pull 6.17 devel branch for further auto-cleanup updates.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-30 09:42:45 +02:00
Liao Yuanhong
2a63607bfd vsock/test: Remove redundant semicolons
Remove unnecessary semicolons.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250828083938.400872-1-liaoyuanhong@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29 19:10:52 -07:00
Florian Westphal
d6a367ec6c netfilter: nft_flowtable.sh: re-run with random mtu sizes
Jakub says:
 nft_flowtable.sh is one of the most flake-atious test for netdev CI currently :(

The root cause is two-fold:
1. the failing part of the test is supposed to make sure that ip
   fragments are forwarded for offloaded flows.
   (flowtable has to pass them to classic forward path).
   path mtu discovery for these subtests is disabled.

2. nft_flowtable.sh has two passes.  One with fixed mtus/file size and
  one where link mtus and file sizes are random.

The CI failures all have same pattern:
  re-run with random mtus and file size: -o 27663 -l 4117 -r 10089 -s 54384840
  [..]
  PASS: dscp_egress: dscp packet counters match
  FAIL: file mismatch for ns1 -> ns2

In some cases this error triggers a bit ealier, sometimes in a later
subtest:
  re-run with random mtus and file size: -o 20201 -l 4555 -r 12657 -s 9405856
  [..]
  PASS: dscp_egress: dscp packet counters match
  PASS: dscp_fwd: dscp packet counters match
  2025/08/17 20:37:52 socat[18954] E write(7, 0x560716b96000, 8192): Broken pipe
  FAIL: file mismatch for ns1 -> ns2
  -rw------- 1 root root 9405856 Aug 17 20:36 /tmp/tmp.2n63vlTrQe

But all logs I saw show same scenario:
1. Failing tests have pmtu discovery off (i.e., ip fragmentation)
2. The test file is much larger than first-pass default (2M Byte)
3. peers have much larger MTUs compared to the 'network'.

These errors are very reproducible when re-running the test with
the same commandline arguments.

The timeout became much more prominent with
1d2fbaad7c ("tcp: stronger sk_rcvbuf checks"): reassembled packets
typically have a skb->truesize more than double the skb length.

As that commit is intentional and pmtud-off with
large-tcp-packets-as-fragments is not normal adjust the test to use a
smaller file for the pmtu-off subtests.

While at it, add more information to pass/fail messages and
also run the dscp alteration subtest with pmtu discovery enabled.

Link: https://netdev.bots.linux.dev/contest.html?test=nft-flowtable-sh
Fixes: f84ab63490 ("selftests: netfilter: nft_flowtable.sh: re-run with random mtu sizes")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250822071330.4168f0db@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250828214918.3385-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29 18:53:53 -07:00
Mauro Carvalho Chehab
242cfe3f77 tools: docs: parse_data_structs.py: add methods to return output
When running it from command line, we want to write an output
file, but when used as a class, one may just want the output
content returned as a string.

Split write_output() on two methods to allow both usecases.

Also add an extra method to produce a TOC.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/c98bdec3380aad54178baf2751a2f1fcd128576b.1755872208.git.mchehab+huawei@kernel.org
2025-08-29 15:54:42 -06:00
Mauro Carvalho Chehab
cde494660f tools: docs: parse-headers.py: move it from sphinx dir
As suggested by Jon, we should start having a tools/docs
directory, instead of placing everything under scripts.

In the specific case of parse-headers.py, the previous
location is where we're placing Sphinx extensions, which is
not the right place for execs.

Move it to tools/docs/parse-headers.py.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/0f5ac2d704cffe9834e589b39549d2393e1237ef.1755872208.git.mchehab+huawei@kernel.org
2025-08-29 15:54:42 -06:00
Linus Torvalds
11e7861d68 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Correctly handle 'invariant' system registers for protected VMs

   - Improved handling of VNCR data aborts, including external aborts

   - Fixes for handling of FEAT_RAS for NV guests, providing a sane
     fault context during SEA injection and preventing the use of
     RASv1p1 fault injection hardware

   - Ensure that page table destruction when a VM is destroyed gives an
     opportunity to reschedule

   - Large fix to KVM's infrastructure for managing guest context loaded
     on the CPU, addressing issues where the output of AT emulation
     doesn't get reflected to the guest

   - Fix AT S12 emulation to actually perform stage-2 translation when
     necessary

   - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
     disabled for a VM

   - Minor KVM + selftest fixes

  RISC-V:

   - Fix pte settings within kvm_riscv_gstage_ioremap()

   - Fix comments in kvm_riscv_check_vcpu_requests()

   - Fix stack overrun when setting vlenb via ONE_REG

  x86:

   - Use array_index_nospec() to sanitize the target vCPU ID when
     handling PV IPIs and yields as the ID is guest-controlled.

   - Drop a superfluous cpumask_empty() check when reclaiming SEV
     memory, as the common case, by far, is that at least one CPU will
     have entered the VM, and wbnoinvd_on_cpus_mask() will naturally
     handle the rare case where the set of have_run_cpus is empty.

  Selftests (not KVM):

   - Rename the is_signed_type() macro in kselftest_harness.h to
     is_signed_var() to fix a collision with linux/overflow.h. The
     collision generates compiler warnings due to the two macros having
     different meaning"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
  KVM: arm64: nv: Fix ATS12 handling of single-stage translation
  KVM: arm64: Remove __vcpu_{read,write}_sys_reg_{from,to}_cpu()
  KVM: arm64: Fix vcpu_{read,write}_sys_reg() accessors
  KVM: arm64: Simplify sysreg access on exception delivery
  KVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit state
  RISC-V: KVM: fix stack overrun when loading vlenb
  RISC-V: KVM: Correct kvm_riscv_check_vcpu_requests() comment
  RISC-V: KVM: Fix pte settings within kvm_riscv_gstage_ioremap()
  KVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regs
  KVM: arm64: Get rid of ARM64_FEATURE_MASK()
  KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable
  KVM: arm64: Make ID_AA64PFR0_EL1.RAS writable
  KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2
  KVM: arm64: Handle RASv1p1 registers
  arm64: Add capability denoting FEAT_RASv1p1
  KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
  KVM: arm64: Split kvm_pgtable_stage2_destroy()
  selftests: harness: Rename is_signed_type() to avoid collision with overflow.h
  KVM: SEV: don't check have_run_cpus in sev_writeback_caches()
  KVM: arm64: Correctly populate FAR_EL2 on nested SEA injection
  ...
2025-08-29 13:54:26 -07:00
Jakub Kicinski
d23ad54de7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc4).

No conflicts.

Adjacent changes:

drivers/net/ethernet/intel/idpf/idpf_txrx.c
  02614eee26 ("idpf: do not linearize big TSO packets")
  6c4e684802 ("idpf: remove obsolete stashing code")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29 11:48:01 -07:00
Paolo Bonzini
42a0305ab1 Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2

 - Correctly handle 'invariant' system registers for protected VMs

 - Improved handling of VNCR data aborts, including external aborts

 - Fixes for handling of FEAT_RAS for NV guests, providing a sane
   fault context during SEA injection and preventing the use of
   RASv1p1 fault injection hardware

 - Ensure that page table destruction when a VM is destroyed gives an
   opportunity to reschedule

 - Large fix to KVM's infrastructure for managing guest context loaded
   on the CPU, addressing issues where the output of AT emulation
   doesn't get reflected to the guest

 - Fix AT S12 emulation to actually perform stage-2 translation when
   necessary

 - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
   disabled for a VM

 - Minor KVM + selftest fixes
2025-08-29 12:57:31 -04:00
Sebastian Andrzej Siewior
2e62688d58 selftests/futex: Remove the -g parameter from futex_priv_hash
The -g parameter was meant to the test the immutable global hash instead of the
private hash which has been made immutable. The global hash is tested as part
at the end of the regular test. The immutable private hash been removed.

Remove last traces of the immutable private hash.

Fixes: 16adc7f136 ("selftests/futex: Remove support for IMMUTABLE")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/20250827130011.677600-2-bigeasy@linutronix.de
2025-08-29 12:01:17 +02:00
Linus Torvalds
d1cf752d58 Merge tag 'block-6.17-20250828' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Fix a lockdep spotted issue on recursive locking for zoned writes, in
   case of errors

 - Update bcache MAINTAINERS entry address for Coly

 - Fix for a ublk release issue, with selftests

 - Fix for a regression introduced in this cycle, where it assumed
   q->rq_qos was always set if the bio flag indicated that

 - Fix for a regression introduced in this cycle, where loop retrieving
   block device sizes got broken

* tag 'block-6.17-20250828' of git://git.kernel.dk/linux:
  bcache: change maintainer's email address
  ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device
  ublk: avoid ublk_io_release() called after ublk char dev is closed
  block: validate QoS before calling __rq_qos_done_bio()
  blk-zoned: Fix a lockdep complaint about recursive locking
  loop: fix zero sized loop for block special file
2025-08-28 18:51:28 -07:00
Nathan Chancellor
aa943a280e Merge patch series "Bump minimum supported version of LLVM for building the kernel to 15.0.0"
s390 and x86 have required LLVM 15 since

  30d17fac6a ("scripts/min-tool-version.sh: raise minimum clang version to 15.0.0 for s390")
  7861640aac ("x86/build: Raise the minimum LLVM version to 15.0.0")

respectively. This series bumps the rest of the kernel to 15.0.0 to
match, which allows for a decent number of clean ups.

On the distros front, we will only leave behind Debian Bookworm and
Ubuntu Jammy. In both of those cases, builders / developers can either
use the kernel.org toolchains or https://apt.llvm.org to get newer
versions that will run on those distributions, if they cannot upgrade.

  archlinux:latest              clang version 20.1.8
  debian:oldoldstable-slim      Debian clang version 11.0.1-2
  debian:oldstable-slim         Debian clang version 14.0.6
  debian:stable-slim            Debian clang version 19.1.7 (3+b1)
  debian:testing-slim           Debian clang version 19.1.7 (3+b1)
  debian:unstable-slim          Debian clang version 19.1.7 (3+b2)
  fedora:41                     clang version 19.1.7 (Fedora 19.1.7-4.fc41)
  fedora:latest                 clang version 20.1.8 (Fedora 20.1.8-3.fc42)
  fedora:rawhide                clang version 20.1.8 (Fedora 20.1.8-3.fc43)
  opensuse/leap:latest          clang version 17.0.6
  opensuse/tumbleweed:latest    clang version 20.1.8
  ubuntu:focal                  clang version 10.0.0-4ubuntu1
  ubuntu:jammy                  Ubuntu clang version 14.0.0-1ubuntu1.1
  ubuntu:noble                  Ubuntu clang version 18.1.3 (1ubuntu1)
  ubuntu:latest                 Ubuntu clang version 18.1.3 (1ubuntu1)
  ubuntu:rolling                Ubuntu clang version 20.1.2 (0ubuntu1)
  ubuntu:devel                  Ubuntu clang version 20.1.8 (0ubuntu1)

Link: https://lore.kernel.org/r/20250821-bump-min-llvm-ver-15-v2-0-635f3294e5f0@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-08-28 16:58:51 -07:00
Nathan Chancellor
573ad421cc objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY
Now that the minimum supported version of LLVM for building the kernel
has been bumped to 15.0.0, __no_kcsan will always ensure that the thread
sanitizer functions are not generated, so remove the check for tsan
functions in is_profiling_func() and the always true depends and
unnecessary select lines in KCSAN_WEAK_MEMORY.

Acked-by: Marco Elver <elver@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250821-bump-min-llvm-ver-15-v2-11-635f3294e5f0@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-08-28 16:58:46 -07:00
Jakub Kicinski
c158b5a570 selftests: drv-net: rss_ctx: fix the queue count check
Commit 0d6ccfe6b3 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if NIC has fewer than 3 queues enabled,
but it's just constructing the object, it's not actually rising
this exception.

Before:

  # Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
  not ok 1 rss_ctx.test_rss_key_indir

After:

  ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827173558.3259072-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28 16:09:14 -07:00
David Matlack
03e073bc4d vfio: selftests: Fix .gitignore for already tracked files
Fix the rules in tools/testing/selftests/vfio/.gitignore to not ignore
some already tracked files (.gitignore, Makefile, lib/libvfio.mk).

This change should be a no-op, since these files are already tracked by git
and thus git will not ignore updates to them even though they match the
ignore rules in the VFIO selftests .gitignore file.

However, they do generate warnings with W=1, as reported by the kernel test
robot.

  $ KBUILD_EXTRA_WARN=1 scripts/misc-check
  tools/testing/selftests/vfio/.gitignore: warning: ignored by one of the .gitignore files
  tools/testing/selftests/vfio/Makefile: warning: ignored by one of the .gitignore files
  tools/testing/selftests/vfio/lib/libvfio.mk: warning: ignored by one of the .gitignore files

Fix this by explicitly un-ignoring the tracked files.

Fixes: 292e9ee22b ("selftests: Create tools/testing/selftests/vfio")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508280918.rFRyiLEU-lkp@intel.com/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250828185815.382215-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-28 13:40:51 -06:00
Ricardo B. Marlière
98857d111c selftests/bpf: Fix bpf_prog_detach2 usage in test_lirc_mode2
Commit e9fc3ce99b ("libbpf: Streamline error reporting for high-level
APIs") redefined the way that bpf_prog_detach2() returns. Therefore, adapt
the usage in test_lirc_mode2_user.c.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-v1-1-c7811cd8b98c@suse.com
2025-08-28 09:49:46 -07:00
Namhyung Kim
53a61a6ca2 perf annotate: Add dso__debuginfo() helper
It'd be great if it can get the correct debug information using DSO
build-Id not just the path name.  Instead of adding new callsites of
debuginfo__new(), let's add dso__debuginfo() which can hide the access
using the pathname and help the future conversion.

Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:35:42 -03:00
Namhyung Kim
d69f56545e perf annotate: Hide data-type for stack operation and canary
It's mostly unnecessary to print when it has no actual type information
like in the stack operations and canary.  Let's have them if -v option
is given.

Before:

  $ perf annotate --code-with-type
  ...
         : 0    0xd640 <_dl_relocate_object>:
    0.00 :      0:       endbr64
    0.00 :      4:       pushq   %rbp           # data-type: (stack operation)
    0.00 :      5:       movq    %rsp, %rbp
    0.00 :      8:       pushq   %r15           # data-type: (stack operation)
    0.00 :      a:       pushq   %r14           # data-type: (stack operation)
    0.00 :      c:       pushq   %r13           # data-type: (stack operation)
    0.00 :      e:       pushq   %r12           # data-type: (stack operation)
    0.00 :     10:       pushq   %rbx           # data-type: (stack operation)
    0.00 :     11:       subq    $0xf8, %rsp
    ...
    0.00 :     d4:       testl   %eax, %eax
    0.00 :     d6:       jne     0xf424
    0.00 :     dc:       movq    0xf0(%r14), %rbx               # data-type: struct link_map +0xf0
    0.00 :     e3:       testq   %rbx, %rbx
    0.00 :     e6:       jne     0xf2dd
    0.00 :     ec:       cmpq    $0, 0xf8(%r14)         # data-type: struct link_map +0xf8
    ...

After:

         : 0    0xd640 <_dl_relocate_object>:
    0.00 :      0:       endbr64
    0.00 :      4:       pushq   %rbp
    0.00 :      5:       movq    %rsp, %rbp
    0.00 :      8:       pushq   %r15
    0.00 :      a:       pushq   %r14
    0.00 :      c:       pushq   %r13
    0.00 :      e:       pushq   %r12
    0.00 :     10:       pushq   %rbx
    0.00 :     11:       subq    $0xf8, %rsp
    ...
    0.00 :     d4:       testl   %eax, %eax
    0.00 :     d6:       jne     0xf424
    0.00 :     dc:       movq    0xf0(%r14), %rbx               # data-type: struct link_map +0xf0
    0.00 :     e3:       testq   %rbx, %rbx
    0.00 :     e6:       jne     0xf2dd
    0.00 :     ec:       cmpq    $0, 0xf8(%r14)         # data-type: struct link_map +0xf8
    ...

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:35:30 -03:00
Namhyung Kim
644bbe59af perf annotate: Show warning when debuginfo is not available
When user requests data-type annotation but no DWARF info is available,
show a warning message about it.

  Warning:
  DWARF debuginfo not found.

  Data-type in this DSO will not be displayed.
  Please make sure to have debug information.

  Press any key...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:35:12 -03:00
Namhyung Kim
1d4374afd0 perf annotate: Add 'T' hot key to toggle data type display
Support data type display with a key press so that users can toggle the
output dynamically on TUI.  Also display "[Type]" in the title line if
it's enabled.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:33:18 -03:00
Namhyung Kim
7dbe89ca3d perf annotate: Add --code-with-type support for TUI
Until now, the --code-with-type option is available only on stdio.
But it was an artifical limitation because of an implemention issue.

Implement the same logic in annotation_line__write() for stdio2/TUI
and remove the limitation and update the man page.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:33:08 -03:00
Namhyung Kim
e38ea8c41e perf annotate: Return printed number from disasm_line__write()
Like other print functions, make disasm_line__write() return the number
of printed characters.  It'll be used to skip unnecessary operations
when the screen is full.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:32:58 -03:00
Namhyung Kim
7736a6fba0 perf annotate: Simplify width calculation in annotation_line__write()
The width is updated after each part is printed.  It can skip the output
processing if the total printed size is bigger than the width.

No function changes intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:32:47 -03:00
Namhyung Kim
d94d5eb54f perf annotate: Pass annotation_print_data to annotation_line__write()
It will be used for data type display later.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250816031635.25318-5-namhyung@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:32:32 -03:00
Namhyung Kim
05a706b157 perf annotate: Remove __annotation_line__write()
Get rid of the internal function and convert function arguments into
local variables if they are used more than once.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:32:19 -03:00
Namhyung Kim
4e3c9dc8b8 perf annotate: Remove annotation_print_data.start
The start field is to control whether the output shows full address or
offset from the function start.  But actually it can be changed
dynamically in annotation__toggle_full_addr().  The informaiton should
be available through struct annotation.  Let's use it directly.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:32:08 -03:00
Namhyung Kim
f06ba25ec5 perf annotate: Rename to __hist_entry__tui_annotate()
There are three different but similar functions for annotation on TUI.
Rename it to __hist_entry__tui_annotate() and make sure it passes 'he'.
It's not used for now but it'll be needed for later use.

Also remove map_symbol__tui_annotate() which was a simple wrapper.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-08-28 12:30:57 -03:00
Ming Lei
9b2785ea85 ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device
Add a new command line option --no_ublk_fixed_fd that excludes the ublk
control device (/dev/ublkcN) from io_uring's registered files array.
When this option is used, only backing files are registered starting
from index 1, while the ublk control device is accessed using its raw
file descriptor.

Add ublk_get_registered_fd() helper function that returns the appropriate
file descriptor for use with io_uring operations.

Key optimizations implemented:
- Cache UBLKS_Q_NO_UBLK_FIXED_FD flag in ublk_queue.flags to avoid
  reading dev->no_ublk_fixed_fd in fast path
- Cache ublk char device fd in ublk_queue.ublk_fd for fast access
- Update ublk_get_registered_fd() to use ublk_queue * parameter
- Update io_uring_prep_buf_register/unregister() to use ublk_queue *
- Replace ublk_device * access with ublk_queue * access in fast paths

Also pass --no_ublk_fixed_fd to test_stress_04.sh for covering
plain ublk char device mode.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250827121602.2619736-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-28 07:56:57 -06:00
Eric Dumazet
51132b99f0 udp: add drop_counters to udp socket
When a packet flood hits one or more UDP sockets, many cpus
have to update sk->sk_drops.

This slows down other cpus, because currently
sk_drops is in sock_write_rx group.

Add a socket_drop_counters structure to udp sockets.

Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.

This adds 128 bytes per UDP socket.

Tested with the following stress test, sending about 11 Mpps
to a dual socket AMD EPYC 7B13 64-Core.

super_netperf 20 -t UDP_STREAM -H DUT -l10 -- -n -P,1000 -m 120
Note: due to socket lookup, only one UDP socket is receiving
packets on DUT.

Then measure receiver (DUT) behavior. We can see both
consumer and BH handlers can process more packets per second.

Before:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 615091             0.0
Udp6InErrors                    3904277            0.0
Udp6RcvbufErrors                3904277            0.0

After:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 816281             0.0
Udp6InErrors                    7497093            0.0
Udp6RcvbufErrors                7497093            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-28 13:14:50 +02:00
Zi Yan
5bbc2b785e selftests/mm: fix FORCE_READ to read input value correctly
FORCE_READ() converts input value x to its pointer type then reads from
address x.  This is wrong.  If x is a non-pointer, it would be caught it
easily.  But all FORCE_READ() callers are trying to read from a pointer
and FORCE_READ() basically reads a pointer to a pointer instead of the
original typed pointer.  Almost no access violation was found, except the
one from split_huge_page_test.

Fix it by implementing a simplified READ_ONCE() instead.

Link: https://lkml.kernel.org/r/20250805175140.241656-1-ziy@nvidia.com
Fixes: 3f6bfd4789 ("selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));"")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-27 22:45:42 -07:00
Linus Torvalds
07d9df8008 Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf-tools fixes from Namhyung Kim:
 "A number of kernel header sync changes and two build-id fixes"

* tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf symbol: Add blocking argument to filename__read_build_id
  perf symbol-minimal: Fix ehdr reading in filename__read_build_id
  tools headers: Sync uapi/linux/vhost.h with the kernel source
  tools headers: Sync uapi/linux/prctl.h with the kernel source
  tools headers: Sync uapi/linux/fs.h with the kernel source
  tools headers: Sync uapi/linux/fcntl.h with the kernel source
  tools headers: Sync syscall tables with the kernel source
  tools headers: Sync powerpc headers with the kernel source
  tools headers: Sync arm64 headers with the kernel source
  tools headers: Sync x86 headers with the kernel source
  tools headers: Sync linux/cfi_types.h with the kernel source
  tools headers: Sync linux/bits.h with the kernel source
  tools headers: Sync KVM headers with the kernel source
  perf test: Fix a build error in x86 topdown test
2025-08-27 19:18:51 -07:00
Matt Fleming
737433c6a5 selftests/bpf: Add LPM trie microbenchmarks
Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.

Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.

LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.

Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.

For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.

By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.

All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da56 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.

Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.

  $ ./bench lpm-trie-$op \
  	--prefix_len=32  \
	--producers=1     \
	--nr_entries=10000

     noop: throughput   74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency   13.438 ns/op
 baseline: throughput   70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency   14.264 ns/op
   lookup: throughput    8.467 ± 0.047 M ops/s (  8.467M ops/prod), latency  118.109 ns/op
   insert: throughput    2.440 ± 0.015 M ops/s (  2.440M ops/prod), latency  409.290 ns/op
   update: throughput    2.806 ± 0.042 M ops/s (  2.806M ops/prod), latency  356.322 ns/op
   delete: throughput    4.625 ± 0.011 M ops/s (  4.625M ops/prod), latency  215.613 ns/op
     free: throughput    0.578 ± 0.006 K ops/s (  0.578K ops/prod), latency    1.730 ms/op

And the same benchmarks using random keys:

  $ ./bench lpm-trie-$op \
  	--prefix_len=32  \
	--producers=1     \
	--nr_entries=10000 \
	--random

     noop: throughput   74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency   13.466 ns/op
 baseline: throughput   35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency   28.450 ns/op
   lookup: throughput    7.119 ± 0.048 M ops/s (  7.119M ops/prod), latency  140.469 ns/op
   insert: N/A
   update: throughput    2.736 ± 0.012 M ops/s (  2.736M ops/prod), latency  365.523 ns/op
   delete: N/A
     free: N/A

Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-27 17:28:14 -07:00
Puranjay Mohan
22b22bf9ee selftests/bpf: Enable timed may_goto tests for arm64
As arm64 JIT now supports timed may_goto instruction, make sure all
relevant tests run on this architecture. Some tests were enabled and
other required modifications to work properly on arm64.

 $ ./test_progs -a "stream*","*may_goto*",verifier_bpf_fastcall

 #404     stream_errors:OK
 [...]
 #406/2   stream_success/stream_cond_break:OK
 [...]
 #494/23  verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP
 #494/24  verifier_bpf_fastcall/may_goto_interaction_arm64:OK
 [...]
 #539/1   verifier_may_goto_1/may_goto 0:OK
 #539/2   verifier_may_goto_1/batch 2 of may_goto 0:OK
 #539/3   verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK
 #539/4   verifier_may_goto_1/may_goto batch with offsets 2/0:OK
 #539     verifier_may_goto_1:OK
 #540/1   verifier_may_goto_2/C code with may_goto 0:OK
 #540     verifier_may_goto_2:OK
 Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20250827113245.52629-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-27 17:16:22 -07:00