Commit Graph

49430 Commits

Author SHA1 Message Date
Jakub Sitnicki
29960e635b selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
354d020c29 selftests/bpf: Cover skb metadata access after vlan push/pop helper
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
1e1357fde8 selftests/bpf: Expect unclone to preserve skb metadata
Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
9ef9ac15a5 selftests/bpf: Dump skb metadata on verification failure
Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
967534e57c selftests/bpf: Verify skb metadata in BPF instead of userspace
Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Linus Torvalds
4ea7c1717f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Arm:

   - Fix trapping regression when no in-kernel irqchip is present

   - Check host-provided, untrusted ranges and offsets in pKVM

   - Fix regression restoring the ID_PFR1_EL1 register

   - Fix vgic ITS locking issues when LPIs are not directly injected

  Arm selftests:

   - Correct target CPU programming in vgic_lpi_stress selftest

   - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest

  RISC-V:

   - Fix check for local interrupts on riscv32

   - Read HGEIP CSR on the correct cpu when checking for IMSIC
     interrupts

   - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()

  x86:

   - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
     KVM doesn't support virtualization the instructions, but the
     instructions are gated only by VMXON. That is, they will VM-Exit
     instead of taking a #UD and until now this resulted in KVM exiting
     to userspace with an emulation error.

   - Unload the "FPU" when emulating INIT of XSTATE features if and only
     if the FPU is actually loaded, instead of trying to predict when
     KVM will emulate an INIT (CET support missed the MP_STATE path).
     Add sanity checks to detect and harden against similar bugs in the
     future.

   - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
     unloaded.

   - Use a raw spinlock for svm->ir_list_lock as the lock is taken
     during schedule(), and "normal" spinlocks are sleepable locks when
     PREEMPT_RT=y.

   - Remove guest_memfd bindings on memslot deletion when a gmem file is
     dying to fix a use-after-free race found by syzkaller.

   - Fix a goof in the EPT Violation handler where KVM checks the wrong
     variable when determining if the reported GVA is valid.

   - Fix and simplify the handling of LBR virtualization on AMD, which
     was made buggy and unnecessarily complicated by nested VM support

  Misc:

   - Update Oliver's email address"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: nSVM: Fix and simplify LBR virtualization handling with nested
  KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
  KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
  MAINTAINERS: Switch myself to using kernel.org address
  KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
  KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
  KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
  KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
  KVM: arm64: Make all 32bit ID registers fully writable
  KVM: VMX: Fix check for valid GVA on an EPT violation
  KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
  KVM: SVM: switch to raw spinlock for svm->ir_list_lock
  KVM: SVM: Make avic_ga_log_notifier() local to avic.c
  KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
  KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
  KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
  KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
  KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
  KVM: arm64: Check the untrusted offset in FF-A memory share
  KVM: arm64: Check range args for pKVM mem transitions
  ...
2025-11-10 08:54:36 -08:00
Paolo Bonzini
ca00c3af8e Merge tag 'kvmarm-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm654 fixes for 6.18, take #2

* Core fixes

  - Fix trapping regression when no in-kernel irqchip is present
    (20251021094358.1963807-1-sascha.bischoff@arm.com)

  - Check host-provided, untrusted ranges and offsets in pKVM
    (20251016164541.3771235-1-vdonnefort@google.com)
    (20251017075710.2605118-1-sebastianene@google.com)

  - Fix regression restoring the ID_PFR1_EL1 register
    (20251030122707.2033690-1-maz@kernel.org

  - Fix vgic ITS locking issues when LPIs are not directly injected
    (20251107184847.1784820-1-oupton@kernel.org)

* Test fixes

  - Correct target CPU programming in vgic_lpi_stress selftest
    (20251020145946.48288-1-mdittgen@amazon.de)

  - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
    (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org)
    (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org)

* Misc

  - Update Oliver's email address
    (20251107012830.1708225-1-oupton@kernel.org)
2025-11-09 08:07:55 +01:00
Daniel Zahka
2098cec328 selftests: drv-net: psp: add assertions on core-tracked psp dev stats
Add assertions to existing test cases to cover key rotations and
'stale-events'.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:53:56 -08:00
Alexander Sverdlin
57531b3416 selftests: net: local_termination: Wait for interfaces to come up
It seems that most of the tests prepare the interfaces once before the test
run (setup_prepare()), rely on setup_wait() to wait for link and only then
run the test(s).

local_termination brings the physical interfaces down and up during test
run but never wait for them to come up. If the auto-negotiation takes
some seconds, first test packets are being lost, which leads to
false-negative test results.

Use setup_wait() in run_test() to make sure auto-negotiation has been
completed after all simple_if_init() calls on physical interfaces and test
packets will not be lost because of the race against link establishment.

Fixes: 90b9566aa5 ("selftests: forwarding: add a test for local_termination.sh")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20251106161213.459501-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:46:36 -08:00
Kuniyuki Iwashima
ffc56c9081 selftest: packetdrill: Add max RTO test for SYN+ACK.
This script sets net.ipv4.tcp_rto_max_ms to 1000 and checks
if SYN+ACK RTO is capped at 1s for TFO and non-TFO.

Without the previous patch, the max RTO is applied to TFO
SYN+ACK only, and non-TFO SYN+ACK RTO increases exponentially.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.091936 sec but happened at 6.107826 sec; tolerance 0.127974 sec
  # script packet:  5.091936 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.107826 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 1 ipv4
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.075901 sec but happened at 6.091841 sec; tolerance 0.127976 sec
  # script packet:  5.075901 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.091841 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 2 ipv6
  # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
  not ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt # exit=1

With the previous patch, all SYN+ACKs are retransmitted
after 1s.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # ok 1 ipv4
  # ok 2 ipv6
  # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
  ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:05:26 -08:00
Linus Torvalds
a2e33fb926 Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:

 - Syzkaller found a case where maths overflows can cause divide by 0

 - Typo in a compiler bug warning fix in the selftests broke the
   selftests

 - type1 compatability had a mismatch when unmapping an already unmapped
   range, it should succeed

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Make vfio_compat's unmap succeed if the range is already empty
  iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
  iommufd: Don't overflow during division for dirty tracking
2025-11-07 13:13:09 -08:00
Linus Torvalds
5b95a50001 Merge tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Check for reader catching up in ring_buffer_map_get_reader()

   If the reader catches up to the writer in the memory mapped ring
   buffer then calling rb_get_reader_page() will return NULL as there's
   no pages left. But this isn't checked for before calling
   rb_get_reader_page() and the return of NULL causes a warning.

   If it is detected that the reader caught up to the writer, then
   simply exit the routine

 - Fix memory leak in histogram create_field_var()

   The couple of the error paths in create_field_var() did not properly
   clean up what was allocated. Make sure everything is freed properly
   on error

 - Fix help message of tools latency_collector

   The help message incorrectly stated that "-t" was the same as
   "--threads" whereas "--threads" is actually represented by "-e"

* tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/tools: Fix incorrcet short option in usage text for --threads
  tracing: Fix memory leaks in create_field_var()
  ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up
2025-11-07 08:07:11 -08:00
Zhang Chujun
53afec2c8f tracing/tools: Fix incorrcet short option in usage text for --threads
The help message incorrectly listed '-t' as the short option for
--threads, but the actual getopt_long configuration uses '-e'.
This mismatch can confuse users and lead to incorrect command-line
usage. This patch updates the usage string to correctly show:
	"-e, --threads NRTHR"
to match the implementation.

Note: checkpatch.pl reports a false-positive spelling warning on
'Run', which is intentional.

Link: https://patch.msgid.link/20251106031040.1869-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-07 07:59:37 -05:00
Linus Torvalds
f5f2e20b1c Merge tag 'perf-tools-fixes-for-v6.18-1-2025-11-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Add James Clark as a perf tools reviewer

 - Handle '1' type symbols in /proc/kallsyms, related to anonymous
   Rust closures in the DRM panic QR encoder, caught by 'perf test'

 - Sync kernel header copies: MSRs, uprobe syscall,
   DRM_IOCTL_GEM_CHANGE_HANDLE, KVM exit reasons, etc

* tag 'perf-tools-fixes-for-v6.18-1-2025-11-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf symbols: Handle '1' symbols in /proc/kallsyms
  tools headers asm: Sync fls headers header with the kernel sources
  tools headers UAPI: Sync KVM's vmx.h header with the kernel sources to handle new exit reasons
  tools headers svm: Sync svm headers with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  MAINTAINERS: Add James Clark as a perf tools reviewer
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h to pick DRM_IOCTL_GEM_CHANGE_HANDLE
  tools headers x86 cpufeatures: Sync with the kernel sources
  tools headers x86: Sync table due to introducion of uprobe syscall
  tools headers: Sync uapi/linux/fcntl.h with the kernel sources
  tools headers: Sync uapi/linux/prctl.h with the kernel source
  tools headers uapi: Update fs.h with the kernel sources
  tools arch x86: Sync msr-index.h to pick AMD64_{PERF_CNTR_GLOBAL_STATUS_SET,SAVIC_CONTROL}, IA32_L3_QOS_{ABMC,EXT}_CFG
2025-11-06 16:05:33 -08:00
Jakub Kicinski
1ec9871fbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc5).

Conflicts:

drivers/net/wireless/ath/ath12k/mac.c
  9222582ec5 ("Revert "wifi: ath12k: Fix missing station power save configuration"")
  6917e268c4 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon")
https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net

Adjacent changes:

drivers/net/ethernet/intel/Kconfig
  b1d16f7c00 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG")
  93f53db9f9 ("ice: switch to Page Pool")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06 09:27:40 -08:00
Linus Torvalds
c2c2ccfd4b Merge tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
  Including fixes from bluetooth and wireless.

  Current release - new code bugs:

   - ptp: expose raw cycles only for clocks with free-running counter

   - bonding: fix null-deref in actor_port_prio setting

   - mdio: ERR_PTR-check regmap pointer returned by
     device_node_to_regmap()

   - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG

  Previous releases - regressions:

   - virtio_net: fix perf regression due to bad alignment of
     virtio_net_hdr_v1_hash

   - Revert "wifi: ath10k: avoid unnecessary wait for service ready
     message" caused regressions for QCA988x and QCA9984

   - Revert "wifi: ath12k: Fix missing station power save configuration"
     caused regressions for WCN7850

   - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
     corruptions after kexec

  Previous releases - always broken:

   - virtio-net: fix received packet length check for big packets

   - sctp: fix races in socket diag handling

   - wifi: add an hrtimer-based delayed work item to avoid low
     granularity of timers set relatively far in the future, and use it
     where it matters (e.g. when performing AP-scheduled channel switch)

   - eth: mlx5e:
       - correctly propagate error in case of module EEPROM read failure
       - fix HW-GRO on systems with PAGE_SIZE == 64kB

   - dsa: b53: fixes for tagging, link configuration / RMII, FDB,
     multicast

   - phy: lan8842: implement latest errata"

* tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  selftests/vsock: avoid false-positives when checking dmesg
  net: bridge: fix MST static key usage
  net: bridge: fix use-after-free due to MST port state bypass
  lan966x: Fix sleeping in atomic context
  bonding: fix NULL pointer dereference in actor_port_prio setting
  net: dsa: microchip: Fix reserved multicast address table programming
  net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
  net: libwx: fix device bus LAN ID
  net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
  net/mlx5e: SHAMPO, Fix skb size check for 64K pages
  net/mlx5e: SHAMPO, Fix header mapping for 64K pages
  net: ti: icssg-prueth: Fix fdb hash size configuration
  net/mlx5e: Fix return value in case of module EEPROM read error
  net: gro_cells: Reduce lock scope in gro_cell_poll
  libie: depend on DEBUG_FS when building LIBIE_FWLOG
  wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
  netpoll: Fix deadlock in memory allocation under spinlock
  net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
  virtio-net: fix received length check in big packets
  bnxt_en: Fix warning in bnxt_dl_reload_down()
  ...
2025-11-06 08:52:30 -08:00
Bobby Eshleman
3534e03e0e selftests/vsock: avoid false-positives when checking dmesg
Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.

Fixes: a4a65c6fe0 ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06 07:34:50 -08:00
Jason Gunthorpe
afb47765f9 iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd returns ENOENT when attempting to unmap a range that is already
empty, while vfio type1 returns success. Fix vfio_compat to match.

Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Reported-by: Alex Mastro <amastro@fb.com>
Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05 15:11:26 -04:00
Kees Cook
2b5e9f9b7e net: Convert struct sockaddr to fixed-size "sa_data[14]"
Revert struct sockaddr from flexible array to fixed 14-byte "sa_data",
to solve over 36,000 -Wflex-array-member-not-at-end warnings, since
struct sockaddr is embedded within many network structs.

With socket/proto sockaddr-based internal APIs switched to use struct
sockaddr_unsized, there should be no more uses of struct sockaddr that
depend on reading beyond the end of struct sockaddr::sa_data that might
trigger bounds checking.

Comparing an x86_64 "allyesconfig" vmlinux build before and after this
patch showed no new "ud1" instructions from CONFIG_UBSAN_BOUNDS nor any
new "field-spanning" memcpy CONFIG_FORTIFY_SOURCE instrumentations.

Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-8-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:33 -08:00
Kees Cook
85cb0757d7 net: Convert proto_ops connect() callbacks to use sockaddr_unsized
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Kees Cook
0e50474fa5 net: Convert proto_ops bind() callbacks to use sockaddr_unsized
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Matthieu Baerts (NGI0)
5c59df126b selftests: mptcp: join: validate extra bind cases
By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.

In other words, it means that if the 'server' listens on a specific
address / device, it cannot accept MP_JOIN sent to a different address /
device. Except if there is another MPTCP listening socket accepting
them.

This is what the new tests are validating:

 - Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
   to announced addresses are not accepted.

 - Also forcing a bind on the main v4/v6 address, but before, another
   listening socket is created to accept additional subflows. Note that
   'mptcpize run nc -l' -- or something else only doing: socket(MPTCP),
   bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is
   reused not to depend on another tool just for that.

 - Same as the previous one, but using v6 link-local addresses: this is
   a bit particular because it is required to specify the outgoing
   network interface when connecting to a link-local address announced
   by the other peer. When using the routing rules, this doesn't work
   (the outgoing interface is not known) ; but it does work with a
   'laminar' endpoint having a specified interface.

Note that extra small modifications are needed for these tests to work:

 - mptcp_connect's check_getpeername_connect() check should strip the
   specified interface when comparing addresses.

 - With IPv6 link-local addresses, it is required to wait for them to
   be ready (no longer in 'tentative' mode) before using them, otherwise
   the bind() will not be allowed.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:07 -08:00
Matthieu Baerts (NGI0)
4a6220a453 selftests: mptcp: join: do_transfer: reduce code dup
The same extra long commands are present twice, with small differences:
the variable for the stdin file is different.

Use new dedicated variables in one command to avoid this code
duplication.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-3-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:06 -08:00
Matthieu Baerts (NGI0)
e461e8a799 mptcp: pm: in kernel: only use fullmesh endp if any
Our documentation is saying that the in-kernel PM is only using fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.

This is confusing, and it seems clearer not to have differences
depending on the address family.

So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.

One selftest needs to be adapted for this behaviour change.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:06 -08:00
Samiullah Khawaja
add3c1324a selftests: Add napi threaded busy poll test in busy_poller
Add testcase to run busy poll test with threaded napi busy poll enabled.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03 18:11:40 -08:00
Samiullah Khawaja
c18d4b190a net: Extend NAPI threaded polling to allow kthread based busy polling
Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to
enable and disable threaded busy polling.

When threaded busy polling is enabled for a NAPI, enable
NAPI_STATE_THREADED also.

When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to
signal napi_complete_done not to rearm interrupts.

Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the
NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the
NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread
go to sleep.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03 18:11:40 -08:00
Arnaldo Carvalho de Melo
7f17ef0d47 perf symbols: Handle '1' symbols in /proc/kallsyms
I started seeing this in recent Fedora 42 kernels:

  root@x1:~# uname -a
  Linux x1 6.17.4-200.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 19 18:47:49 UTC 2025 x86_64 GNU/Linux
  root@x1:~#

  root@x1:~# perf test 1
    1: vmlinux symtab matches kallsyms     : FAILED!
  root@x1:~#

Related to:

  root@x1:~# grep ' 1 ' /proc/kallsyms
  ffffffffb098bc00 1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
  ffffffffb098bc10 1 _RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
  root@x1:~#

That is found in:

  root@x1:~# pahole --running_kernel_vmlinux
  /usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux
  root@x1:~#

  root@x1:~# readelf -sW /usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux | grep __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
  150649: ffffffff81f8bc00    16 FUNC    LOCAL  DEFAULT    1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
  root@x1:~#

But was being filtered out when reading /proc/kallsyms, as the '1'
symbol type was not being handled, do it, there are just two of them at
this point.

Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Benno Lossin <lossin@kernel.org>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Trevor Gross <tmgross@umich.edu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 14:54:31 -03:00
Arnaldo Carvalho de Melo
549042f167 tools headers asm: Sync fls headers header with the kernel sources
To pick the changes in:

  6606c8c7e8 ("bitops: Add __attribute_const__ to generic ffs()-family implementations")

This addresses these tools build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/asm-generic/bitops/__fls.h include/asm-generic/bitops/__fls.h
    diff -u tools/include/asm-generic/bitops/fls.h include/asm-generic/bitops/fls.h
    diff -u tools/include/asm-generic/bitops/fls64.h include/asm-generic/bitops/fls64.h

Please see tools/include/uapi/README for further details.

Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 13:35:06 -03:00
Arnaldo Carvalho de Melo
fc9ef9118d tools headers UAPI: Sync KVM's vmx.h header with the kernel sources to handle new exit reasons
To pick the changes in:

  885df2d210 ("KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel")
  c42856af8f ("KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)")

That makes 'perf kvm-stat' aware of these new TDCALL and
MSR_{READ,WRITE}_IMM exit reasons, thus addressing the following perf
build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin@zytor.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 13:29:53 -03:00
Arnaldo Carvalho de Melo
649a0cc96e tools headers svm: Sync svm headers with the kernel sources
To pick the changes in:

  b8c3c9f5d0 ("x86/apic: Initialize Secure AVIC APIC backing page")

That triggers:

  CC      /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
  LD      /tmp/build/perf-tools/arch/x86/util/perf-util-in.o
  LD      /tmp/build/perf-tools/arch/x86/perf-util-in.o
  LD      /tmp/build/perf-tools/arch/perf-util-in.o
  LD      /tmp/build/perf-tools/perf-util-in.o
  AR      /tmp/build/perf-tools/libperf-util.a
  LINK    /tmp/build/perf-tools/perf

But this time causes no changes in tooling results, as the introduced
SVM_VMGEXIT_SAVIC exit reason wasn't added to SVM_EXIT_REASONS, that is
used in kvm-stat.c.

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h

Please see tools/include/uapi/README for further details.

Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 13:29:53 -03:00
Arnaldo Carvalho de Melo
b1d46bc10f tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in:

  fddd07626b ("KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors")
  f2f5519aa4 ("KVM: x86: Define Control Protection Exception (#CP) vector")
  9d6812d415 ("KVM: x86: Enable guest SSP read/write interface with new uAPIs")
  06f2969c6a ("KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support")

That just rebuilds kvm-stat.c on x86, no change in functionality.

This silences these perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 13:29:53 -03:00
Arnaldo Carvalho de Melo
29e4d12a29 tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  fe2bf6234e ("KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set")
  d2042d8f96 ("KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS")
  3d3a04fad2 ("KVM: Allow and advertise support for host mmap() on guest_memfd files")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the 'perf trace' ioctl syscall argument beautifiers.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 13:10:19 -03:00
Arnaldo Carvalho de Melo
f8950b47db tools headers UAPI: Update tools's copy of drm.h to pick DRM_IOCTL_GEM_CHANGE_HANDLE
Picking the changes from:

  0864197382 ("drm: Move drm_gem ioctl kerneldoc to uapi file")
  53096728b8 ("drm: Add DRM prime interface to reassign GEM handle")

Addressing these perf build warnings:

  Warning: Kernel ABI header differences:

Now 'perf trace' and other code that might use the tools/perf/trace/beauty
autogenerated tables will be able to translate this new ioctl command into
a string:

  $ tools/perf/trace/beauty/drm_ioctl.sh > before
  $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
  $ tools/perf/trace/beauty/drm_ioctl.sh > after
  $ diff -u before after
  --- before	2025-11-03 09:57:34.832553174 -0300
  +++ after	2025-11-03 09:57:47.969409428 -0300
  @@ -111,6 +111,7 @@
   	[0xCF] = "SYNCOBJ_EVENTFD",
   	[0xD0] = "MODE_CLOSEFB",
   	[0xD1] = "SET_CLIENT_NAME",
  +	[0xD2] = "GEM_CHANGE_HANDLE",
   	[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
   	[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
   	[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
  $

Please see tools/include/uapi/README for further details.

Cc: Christian König <christian.koenig@amd.com>
Cc: David Francis <David.Francis@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-03 12:14:18 -03:00
Arnaldo Carvalho de Melo
ccaba800e7 tools headers x86 cpufeatures: Sync with the kernel sources
To pick the changes from:

  e19c062199 ("x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)")
  7b59c73fd6 ("x86/cpufeatures: Add SNP Secure TSC")
  3c7cb84145 ("x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions")
  2f8f173413 ("x86/vmscape: Add conditional IBPB mitigation")
  a508cec6e5 ("x86/vmscape: Enumerate VMSCAPE bug")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that may be used by:

      CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
      CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Please see tools/include/uapi/README for further details.

Cc: Babu Moger <babu.moger@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin@zytor.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 13:16:47 -03:00
Arnaldo Carvalho de Melo
e0acec3369 tools headers x86: Sync table due to introducion of uprobe syscall
To pick the changes in this cset:

  56101b69c9 ("uprobes/x86: Add uprobe syscall to speed up uprobe")

That add support for this new 'uprobe' syscall in tools such as 'perf trace'.

Now it is possible to do a system wide 'perf trace' to look if this new
syscall is being used:

  root@number:~# perf trace -v -e uprobe
  <SNIP>
  event qualifier tracepoint filter: (common_pid != 33989) && (id == 336)
  ^C
  root@number#

  $ grep -w uprobe tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  336	common	uprobe			sys_uprobe
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Please see tools/include/uapi/README for further details.

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 13:04:20 -03:00
Arnaldo Carvalho de Melo
5466858a70 tools headers: Sync uapi/linux/fcntl.h with the kernel sources
To pick up the changes in this cset:

  e83f0b5d10 ("nsfs: support exhaustive file handles")

That doesn't introduce anything of interest for tools/, just addresses
these perf build warnings:

Warning: Kernel ABI header differences:
  diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Please see tools/include/uapi/README for further details.

Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 13:01:13 -03:00
Arnaldo Carvalho de Melo
5be93389f3 tools headers: Sync uapi/linux/prctl.h with the kernel source
To pick up the changes in these csets:

  8cdc4d2701 ("mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED")
  9dc21bbd62 ("prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE")

That don't introduce anything of interest for the tools/, just
addressing these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Please see tools/include/uapi/README for further details.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 12:53:56 -03:00
Arnaldo Carvalho de Melo
76977baa83 tools headers uapi: Update fs.h with the kernel sources
To pick up changes from:

  db2ab24a34 ("Add RWF_NOSIGNAL flag for pwritev2")

These are used to beautify fs syscall arguments, albeit the changes in
this update are not affecting those beautifiers.

This addresses these tools/ build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lauri Vasama <git@vasama.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 12:24:27 -03:00
Arnaldo Carvalho de Melo
3f67355979 tools arch x86: Sync msr-index.h to pick AMD64_{PERF_CNTR_GLOBAL_STATUS_SET,SAVIC_CONTROL}, IA32_L3_QOS_{ABMC,EXT}_CFG
To pick up the changes in:

  cdfed9370b ("KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header")
  bc6397cf0b ("x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR")
  84ecefb766 ("x86/resctrl: Add data structures and definitions for ABMC assignment")
  faebbc58cd ("x86/resctrl: Add support to enable/disable AMD ABMC feature")
  c4074ab87f ("x86/apic: Enable Secure AVIC in the control MSR")
  869e36b966 ("x86/apic: Allow NMI to be injected from hypervisor for Secure AVIC")
  30c2b98aa8 ("x86/apic: Add new driver for Secure AVIC")
  0c5caea762 ("perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag")
  68e61f6fd6 ("KVM: SVM: Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2")
  a3c4f3396b ("x86/msr-index: Add AMD workload classification MSRs")
  65f55a3017 ("x86/CPU/AMD: Add CPUID faulting support")
  17ec2f9653 ("KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported")

Addressing this tools/perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2025-10-30 09:34:49.283533597 -0300
  +++ after	2025-10-30 09:35:00.971426811 -0300
  @@ -272,6 +272,9 @@
   	[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
   	[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
   	[0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
  +	[0xc0000303 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_SET",
  +	[0xc00003fd - x86_64_specific_MSRs_offset] = "IA32_L3_QOS_ABMC_CFG",
  +	[0xc00003ff - x86_64_specific_MSRs_offset] = "IA32_L3_QOS_EXT_CFG",
   	[0xc0000400 - x86_64_specific_MSRs_offset] = "IA32_EVT_CFG_BASE",
   	[0xc0000500 - x86_64_specific_MSRs_offset] = "AMD_WORKLOAD_CLASS_CONFIG",
   	[0xc0000501 - x86_64_specific_MSRs_offset] = "AMD_WORKLOAD_CLASS_ID",
  @@ -319,6 +322,7 @@
   	[0xc0010133 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_RMP_END",
   	[0xc0010134 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_GUEST_TSC_FREQ",
   	[0xc0010136 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_RMP_CFG",
  +	[0xc0010138 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SAVIC_CONTROL",
   	[0xc0010140 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_ID_LENGTH",
   	[0xc0010141 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_STATUS",
   	[0xc0010200 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PERF_CTL",
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written:

  root@x1:~# perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_L3_QOS_ABMC_CFG"
  ^Croot@x1:~#

If we use -v (verbose mode) we can see what it does behind the scenes:

  root@x1:~# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_L3_QOS_ABMC_CFG"
  0xc00003fd
  New filter for msr:write_msr: (msr==0xc00003fd) && (common_pid != 449842 && common_pid != 433756)
  0xc00003fd
  New filter for msr:read_msr: (msr==0xc00003fd) && (common_pid != 449842 && common_pid != 433756)
  mmap size 528384B
  ^Croot@x1:~#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
   0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                   do_trace_write_msr ([kernel.kallsyms])
                                   do_trace_write_msr ([kernel.kallsyms])
                                   __switch_to_xtra ([kernel.kallsyms])
                                   __switch_to ([kernel.kallsyms])
                                   __schedule ([kernel.kallsyms])
                                   schedule ([kernel.kallsyms])
                                   futex_wait_queue_me ([kernel.kallsyms])
                                   futex_wait ([kernel.kallsyms])
                                   do_futex ([kernel.kallsyms])
                                   __x64_sys_futex ([kernel.kallsyms])
                                   do_syscall_64 ([kernel.kallsyms])
                                   entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                   __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
   0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                   do_trace_write_msr ([kernel.kallsyms])
                                   do_trace_write_msr ([kernel.kallsyms])
                                   __switch_to_xtra ([kernel.kallsyms])
                                   __switch_to ([kernel.kallsyms])
                                   __schedule ([kernel.kallsyms])
                                   schedule_idle ([kernel.kallsyms])
                                   do_idle ([kernel.kallsyms])
                                   cpu_startup_entry ([kernel.kallsyms])
                                   secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Please see tools/include/uapi/README for further details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Perry Yuan <perry.yuan@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-01 12:24:27 -03:00
Josh Poimboeuf
c44b4b9eeb objtool: Fix skip_alt_group() for non-alternative STAC/CLAC
If an insn->alt points to a STAC/CLAC instruction, skip_alt_group()
assumes it's part of an alternative ("alt group") as opposed to some
other kind of "alt" such as an exception fixup.

While that assumption may hold true in the current code base, Linus has
an out-of-tree patch which breaks that assumption by replacing the
STAC/CLAC alternatives with raw STAC/CLAC instructions.

Make skip_alt_group() more robust by making sure it's actually an alt
group before continuing.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 2d12c6fb78 ("objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC")
Closes: https://lore.kernel.org/CAHk-=wi6goUT36sR8GE47_P-aVrd5g38=VTRHpktWARbyE-0ow@mail.gmail.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/3d22415f7b8e06a64e0873b21f48389290eeaa49.1761767616.git.jpoimboe@kernel.org
2025-11-01 07:43:20 +01:00
Linus Torvalds
ba36dd5ee6 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Mark migrate_disable/enable() as always_inline to avoid issues with
   partial inlining (Yonghong Song)

 - Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii
   Nakryiko)

 - Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)

 - Conditionally include dynptr copy kfuncs (Malin Jonsson)

 - Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)

 - Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)

 - Fix arm64 JIT of BPF_ST insn when it writes into arena memory
   (Puranjay Mohan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf/arm64: Fix BPF_ST into arena memory
  bpf: Make migrate_disable always inline to avoid partial inlining
  bpf: Reject negative head_room in __bpf_skb_change_head
  bpf: Conditionally include dynptr copy kfuncs
  libbpf: Fix powerpc's stack register definition in bpf_tracing.h
  bpf: Do not audit capability check in do_jit()
  bpf: Sync pending IRQ work before freeing ring buffer
2025-10-31 18:22:26 -07:00
Wang Liang
d01f8136d4 selftests: netdevsim: Fix ethtool-coalesce.sh fail by installing ethtool-common.sh
The script "ethtool-common.sh" is not installed in INSTALL_PATH, and
triggers some errors when I try to run the test
'drivers/net/netdevsim/ethtool-coalesce.sh':

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
  # ./ethtool-coalesce.sh: line 25: make_netdev: command not found
  # ethtool: bad command line argument(s)
  # ./ethtool-coalesce.sh: line 124: check: command not found
  # ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
  # FAILED /0 checks
  not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1

Install this file to avoid this error. After this patch:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # PASSED all 22 checks
  ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh

Fixes: fbb8531e58 ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251030040340.3258110-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:41:54 -07:00
Anubhav Singh
f8e8486702 selftests/net: use destination options instead of hop-by-hop
The GRO self-test, gro.c, currently constructs IPv6 packets containing a
Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path
correctly handles IPv6 extension headers.

However, network elements may be configured to drop packets with the
Hop-by-Hop Options header (HBH). This causes the self-test to fail
in environments where such network elements are present.

To improve the robustness and reliability of this test in diverse
network environments, switch from using IPPROTO_HOPOPTS to
IPPROTO_DSTOPTS (Destination Options).

The Destination Options header is less likely to be dropped by
intermediate routers and still serves the core purpose of the test:
validating GRO's handling of an IPv6 extension header. This change
ensures the test can execute successfully without being incorrectly
failed by network policies outside the kernel's control.

Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030060436.1556664-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:33:17 -07:00
Anubhav Singh
02d064de05 selftests/net: fix out-of-order delivery of FIN in gro:tcp test
Due to the gro_sender sending data packets and FIN packets
in very quick succession, these are received almost simultaneously
by the gro_receiver. FIN packets are sometimes processed before the
data packets leading to intermittent (~1/100) test failures.

This change adds a delay of 100ms before sending FIN packets
in gro:tcp test to avoid the out-of-order delivery. The same
mitigation already exists for the gro:ip test.

Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030062818.1562228-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:32:24 -07:00
Linus Torvalds
39bcf0f7d4 Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:

 - Fix overflows in vfio type1 backend for mappings at the end of the
   64-bit address space, resulting in leaked pinned memory.

   New selftest support included to avoid such issues in the future
   (Alex Mastro)

* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: selftests: add end of address space DMA map/unmap tests
  vfio: selftests: update DMA map/unmap helpers to support more test kinds
  vfio/type1: handle DMA map/unmap up to the addressable limit
  vfio/type1: move iova increment to unmap_unpin_*() caller
  vfio/type1: sanitize for overflow using check_*_overflow()
2025-10-31 14:20:09 -07:00
Jakub Kicinski
1a2352ad82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc4).

No conflicts, adjacent changes:

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
  ded9813d17 ("net: stmmac: Consider Tx VLAN offload tag length for maxSDU")
  26ab9830be ("net: stmmac: replace has_xxxx with core_type")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 06:46:03 -07:00
Linus Torvalds
d127176862 Merge tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
 "Fix build warning in cachestat found during clang build and add
  tmpshmcstat to .gitignore"

* tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: cachestat: Fix warning on declaration under label
  selftests/cachestat: add tmpshmcstat file to .gitignore
2025-10-30 19:48:13 -07:00
Linus Torvalds
e576349123 Merge tag 'net-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless, Bluetooth and netfilter.

  Current release - regressions:

    - tcp: fix too slow tcp_rcvbuf_grow() action

    - bluetooth: fix corruption in h4_recv_buf() after cleanup

  Previous releases - regressions:

    - mptcp: restore window probe

    - bluetooth:
       - fix connection cleanup with BIG with 2 or more BIS
       - fix crash in set_mesh_sync and set_mesh_complete

    - batman-adv: release references to inactive interfaces

    - nic:
       - ice: fix usage of logical PF id
       - sfc: fix potential memory leak in efx_mae_process_mport()

  Previous releases - always broken:

    - devmem: refresh devmem TX dst in case of route invalidation

    - netfilter: add seqadj extension for natted connections

    - wifi:
       - iwlwifi: fix potential use after free in iwl_mld_remove_link()
       - brcmfmac: fix crash while sending action frames in standalone AP Mode

    - eth:
       - mlx5e: cancel tls RX async resync request in error flows
       - ixgbe: fix memory leak and use-after-free in ixgbe_recovery_probe()
       - hibmcge: fix rx buf avl irq is not re-enabled in irq_handle issue
       - cxgb4: fix potential use-after-free in ipsec callback
       - nfp: fix memory leak in nfp_net_alloc()"

* tag 'net-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
  net: sctp: fix KMSAN uninit-value in sctp_inq_pop
  net: devmem: refresh devmem TX dst in case of route invalidation
  net: stmmac: est: Fix GCL bounds checks
  net: stmmac: Consider Tx VLAN offload tag length for maxSDU
  net: stmmac: vlan: Disable 802.1AD tag insertion offload
  net/mlx5e: kTLS, Cancel RX async resync request in error flows
  net: tls: Cancel RX async resync request on rcd_delta overflow
  net: tls: Change async resync helpers argument
  net: phy: dp83869: fix STRAP_OPMODE bitmask
  selftests: net: use BASH for bareudp testing
  net: mctp: Fix tx queue stall
  net/mlx5: Don't zero user_count when destroying FDB tables
  net: usb: asix_devices: Check return value of usbnet_get_endpoints
  mptcp: zero window probe mib
  mptcp: restore window probe
  mptcp: fix MSG_PEEK stream corruption
  mptcp: drop bogus optimization in __mptcp_check_push()
  netconsole: Fix race condition in between reader and writer of userdata
  Documentation: netconsole: Remove obsolete contact people
  nfp: xsk: fix memory leak in nfp_net_alloc()
  ...
2025-10-30 18:35:35 -07:00
Jakub Kicinski
ecca75ae5a selftests: drv-net: replace the nsim ring test with a drv-net one
We are trying to move away from netdevsim-only tests and towards
tests which can be run both against netdevsim and real drivers.

Replace the simple bash script we have for checking ethtool -g/-G
on netdevsim with a Python test tweaking those params as well
as channel count.

The new test is not exactly equivalent to the netdevsim one,
but real drivers don't often support random ring sizes,
let alone modifying max values via debugfs.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251029164930.2923448-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-30 17:32:18 -07:00
Mark Brown
a186fbcfd8 KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-list
get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks
for when NV is enabled but does not have any feature gate for this register,
meaning that testing any combination of features that includes EL2 but does
not include SVE will result in a test failure due to a missing register
being reported:

| The following lines are missing registers:
|
|	ARM64_SYS_REG(3, 4, 1, 2, 0),

Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it
without SVE being enabled.

Fixes: 3a90b6f279 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:13:27 +00:00