Commit Graph

48247 Commits

Author SHA1 Message Date
Sean Christopherson
1f2bbbbbda KVM: x86: Merge 'selftests' into 'cet' to pick up ex_str()
Merge the queue of KVM selftests changes for 6.18 to pick up the ex_str()
helper so that it can be used to pretty print expected versus actual
exceptions in a new MSR selftest.  CET virtualization will add support for
several MSRs with non-trivial semantics, along with new uAPI for accessing
the guest's Shadow Stack Pointer (SSP) from userspace.
2025-09-23 09:00:18 -07:00
Sean Christopherson
df1f294013 KVM: selftests: Add ex_str() to print human friendly name of exception vectors
Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.

Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-45-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:39:02 -07:00
Sukrut Heroorkar
ff86b48d4c selftests/kvm: remove stale TODO in xapic_state_test
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.

Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Link: https://lore.kernel.org/r/20250908210547.12748-1-hsukrut3@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:39:01 -07:00
dongsheng
c435978e4f KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
Add a PMU errata framework and use it to relax precise event counts on
Atom platforms that overcount "Instruction Retired" and "Branch Instruction
Retired" events, as the overcount issues on VM-Exit/VM-Entry are impossible
to prevent from userspace, e.g. the test can't prevent host IRQs.

Setup errata during early initialization and automatically sync the mask
to VMs so that tests can check for errata without having to manually
manage host=>guest variables.

For Intel Atom CPUs, the PMU events "Instruction Retired" or
"Branch Instruction Retired" may be overcounted for some certain
instructions, like FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD
and complex SGX/SMX/CSTATE instructions/flows.

The detailed information can be found in the errata (section SRF7):
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/

For the Atom platforms before Sierra Forest (including Sierra Forest),
Both 2 events "Instruction Retired" and "Branch Instruction Retired" would
be overcounted on these certain instructions, but for Clearwater Forest
only "Instruction Retired" event is overcounted on these instructions.

Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Dapeng Mi
2922b59588 KVM: selftests: Validate more arch-events in pmu_counters_test
Add support for 5 new architectural events (4 topdown level 1 metrics
events and LBR inserts event) that will first show up in Intel's
Clearwater Forest CPUs.  Detailed info about the new events can be found
in SDM section 21.2.7 "Pre-defined Architectural  Performance Events".

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: drop "unavailable_mask" changes]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Sean Christopherson
1fcd3053aa KVM: selftests: Reduce number of "unavailable PMU events" combos tested
Reduce the number of combinations of unavailable PMU events masks that are
testing by the PMU counters test.  In reality, testing every possible
combination isn't all that interesting, and certainly not worth the tens
of seconds (or worse, minutes) of runtime.  Fully testing the N^2 space
will be especially problematic in the near future, as 5! new arch events
are on their way.

Use alternating bit patterns (and 0 and -1u) in the hopes that _if_ there
is ever a KVM bug, it's not something horribly convoluted that shows up
only with a super specific pattern/value.

Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Sean Christopherson
571fc2833e KVM: selftests: Track unavailable_mask for PMU events as 32-bit value
Track the mask of "unavailable" PMU events as a 32-bit value.  While bits
31:9 are currently reserved, silently truncating those bits is unnecessary
and asking for missed coverage.  To avoid running afoul of the sanity check
in vcpu_set_cpuid_property(), explicitly adjust the mask based on the
non-reserved bits as reported by KVM's supported CPUID.

Opportunistically update the "all ones" testcase to pass -1u instead of
0xff.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Dapeng Mi
210b09fa42 KVM: selftests: Add timing_info bit support in vmx_pmu_caps_test
A new bit PERF_CAPABILITIES[17] called "PEBS_TIMING_INFO" bit is added
to indicated if PEBS supports to record timing information in a new
"Retried Latency" field.

Since KVM requires user can only set host consistent PEBS capabilities,
otherwise the PERF_CAPABILITIES setting would fail, add pebs_timing_info
into the "immutable_caps" to block host inconsistent PEBS configuration
and cause errors.

Opportunistically drop the anythread_deprecated bit.  It isn't and likely
never was a PERF_CAPABILITIES flag, the test's definition snuck in when
the union was copy+pasted from the kernel's definition.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: call out anythread_deprecated change]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Sean Christopherson
e8f85d7884 KVM: x86: Don't treat ENTER and LEAVE as branches, because they aren't
Remove the IsBranch flag from ENTER and LEAVE in KVM's emulator, as ENTER
and LEAVE are stack operations, not branches.  Add forced emulation of
said instructions to the PMU counters test to prove that KVM diverges from
hardware, and to guard against regressions.

Opportunistically add a missing "1 MOV" to the selftest comment regarding
the number of instructions per loop, which commit 7803339fa9 ("KVM:
selftests: Use data load to trigger LLC references/misses in Intel PMU")
forgot to add.

Fixes: 018d70ffcf ("KVM: x86: Update vPMCs when retiring branch instructions")
Cc: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919004639.1360453-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-22 07:14:05 -07:00
Sean Christopherson
aebc62b3de KVM: selftests: Add support for DIV and IDIV in the fastops test
Extend the fastops test coverage to DIV and IDIV, specifically to provide
coverage for #DE (divide error) exceptions, as #DE is the only exception
that can occur in KVM's fastops path, i.e. that requires exception fixup.

Link: https://lore.kernel.org/r/20250909202835.333554-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-11 08:55:44 -07:00
Sean Christopherson
fe08478b1d KVM: selftests: Dedup the gnarly constraints of the fastops tests (more macros!)
Add a fastop() macro along with macros to define its required constraints,
and use the macros to dedup the innermost guts of the fastop testcases.

No functional change intended.

Link: https://lore.kernel.org/r/20250909202835.333554-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-11 08:55:44 -07:00
Sean Christopherson
9bf5da1ca4 KVM: selftests: Add coverage for 'b' (byte) sized fastops emulation
Extend the fastops test to cover instructions that operate on 8-bit data.
Support for 8-bit instructions was omitted from the original commit purely
due to complications with BT not having a r/m8 variant.  To keep the
RFLAGS.CF behavior deterministic and not heavily biased to '0' or '1',
continue using BT, but cast and load the to-be-tested value into a
dedicated 32-bit constraint.

Supporting 8-bit operations will allow using guest_test_fastops() as-is to
provide full coverage for DIV and IDIV.  For divide operations, covering
all operand sizes _is_ interesting, because KVM needs provide exception
fixup for each size (failure to handle a #DE could panic the host).

Link: https://lore.kernel.org/all/aIF7ZhWZxlkcpm4y@google.com
Link: https://lore.kernel.org/r/20250909202835.333554-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-11 08:55:44 -07:00
Sean Christopherson
7b39b6c769 KVM: selftests: Add support for #DE exception fixup
Add support for handling #DE (divide error) exceptions in KVM selftests
so that the fastops test can verify KVM correctly handles #DE when
emulating DIV or IDIV on behalf of the guest.  Morph #DE to 0xff (i.e.
to -1) as a mostly-arbitrary vector to indicate #DE, so that '0' (the
real #DE vector) can still be used to indicate "no exception".

Link: https://lore.kernel.org/r/20250909202835.333554-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-11 08:55:44 -07:00
Alok Tiwari
665071186c KVM: selftests: Fix typo in hyperv cpuid test message
Fix a typo in hyperv_cpuid.c test assertion log:
replace "our of supported range" -> "out of supported range".

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://lore.kernel.org/r/20250824181642.629297-1-alok.a.tiwari@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-08 13:03:22 -07:00
Gopi Krishna Menon
05f297c3e3 KVM: selftests: fix minor typo in cpumodel_subfuncs
Specifically, fix spelling of "available" in main function.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Link: https://lore.kernel.org/r/20250813154751.5725-1-krishnagopi487@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19 10:50:59 -07:00
Sean Christopherson
e2bcf62a2e KVM: selftests: Move Intel and AMD module param helpers to x86/processor.h
Move the x86 specific helpers for getting kvm_{amd,intel} module params to
x86 where they belong.  Expose the module-agnostic helpers globally, there
is nothing secret about the logic.

Link: https://lore.kernel.org/r/20250806225159.1687326-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19 08:00:29 -07:00
James Houghton
a585b87614 KVM: selftests: Fix signedness issue with vCPU mmap size check
Check that the return value of KVM_GET_VCPU_MMAP_SIZE is non-negative
before comparing with sizeof(kvm_run). If KVM_GET_VCPU_MMAP_SIZE fails,
it will return -1, and `-1 > sizeof(kvm_run)` is true, so the ASSERT
passes.

There are no other locations in tools/testing/selftests/kvm that make
the same mistake.

Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250711001742.1965347-1-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19 07:43:56 -07:00
Linus Torvalds
63467137ec Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from Netfilter and IPsec.

  Current release - regressions:

   - netfilter: nft_set_pipapo:
      - don't return bogus extension pointer
      - fix null deref for empty set

  Current release - new code bugs:

   - core: prevent deadlocks when enabling NAPIs with mixed kthread
     config

   - eth: netdevsim: Fix wild pointer access in nsim_queue_free().

  Previous releases - regressions:

   - page_pool: allow enabling recycling late, fix false positive
     warning

   - sched: ets: use old 'nbands' while purging unused classes

   - xfrm:
      - restore GSO for SW crypto
      - bring back device check in validate_xmit_xfrm

   - tls: handle data disappearing from under the TLS ULP

   - ptp: prevent possible ABBA deadlock in ptp_clock_freerun()

   - eth:
      - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
      - hv_netvsc: fix panic during namespace deletion with VF

  Previous releases - always broken:

   - netfilter: fix refcount leak on table dump

   - vsock: do not allow binding to VMADDR_PORT_ANY

   - sctp: linearize cloned gso packets in sctp_rcv

   - eth:
      - hibmcge: fix the division by zero issue
      - microchip: fix KSZ8863 reset problem"

* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  net: usb: asix_devices: add phy_mask for ax88772 mdio bus
  net: kcm: Fix race condition in kcm_unattach()
  selftests: net/forwarding: test purge of active DWRR classes
  net/sched: ets: use old 'nbands' while purging unused classes
  bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
  netdevsim: Fix wild pointer access in nsim_queue_free().
  net: mctp: Fix bad kfree_skb in bind lookup test
  netfilter: nf_tables: reject duplicate device on updates
  ipvs: Fix estimator kthreads preferred affinity
  netfilter: nft_set_pipapo: fix null deref for empty set
  selftests: tls: test TCP stealing data from under the TLS socket
  tls: handle data disappearing from under the TLS ULP
  ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
  ixgbe: prevent from unwanted interface name changes
  devlink: let driver opt out of automatic phys_port_name generation
  net: prevent deadlocks when enabling NAPIs with mixed kthread config
  net: update NAPI threaded config even for disabled NAPIs
  selftests: drv-net: don't assume device has only 2 queues
  docs: Fix name for net.ipv4.udp_child_hash_entries
  riscv: dts: thead: Add APB clocks for TH1520 GMACs
  ...
2025-08-14 07:14:30 -07:00
Davide Caratti
774a2ae661 selftests: net/forwarding: test purge of active DWRR classes
Extend sch_ets.sh to add a reproducer for problematic list deletions when
active DWRR class are purged by ets_qdisc_change() [1] [2].

[1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/f3b9bacc73145f265c19ab80785933da5b7cbdec.1754581577.git.dcaratti@redhat.com/

Suggested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/489497cb781af7389011ca1591fb702a7391f5e7.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13 18:11:48 -07:00
Linus Torvalds
91325f31af Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "12 hotfixes. 5 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels.

  10 of these fixes are for MM"

* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: proc_maps_open allow proc_mem_open to return NULL
  mm/mremap: avoid expensive folio lookup on mremap folio pte batch
  userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
  mm: pass page directly instead of using folio_page
  selftests/proc: fix string literal warning in proc-maps-race.c
  fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
  mm/smaps: fix race between smaps_hugetlb_range and migration
  mm: fix the race between collapse and PT_RECLAIM under per-vma lock
  mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
  MAINTAINERS: add Masami as a reviewer of hung task detector
  mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
  kasan/test: fix protection against compiler elision
2025-08-13 08:28:33 -07:00
Jakub Kicinski
d7e82594a4 selftests: tls: test TCP stealing data from under the TLS socket
Check a race where data disappears from the TCP socket after
TLS signaled that its ready to receive.

  ok 6 global.data_steal
  #  RUN           tls_basic.base_base ...
  #            OK  tls_basic.base_base

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-12 18:59:06 -07:00
Jakub Kicinski
bda053d644 selftests: drv-net: don't assume device has only 2 queues
The test is implicitly assuming the device only has 2 queues.
A real device will likely have more. The exact problem is that
because NAPIs get added to the list from the head, the netlink
dump reports them in reverse order. So the naive napis[0] will
actually likely give us the _last_ NAPI, not the first one.
Re-enable all the NAPIs instead of hard-coding 2 in the test.
This way the NAPIs we operated on will always reappear,
doesn't matter where they were in the registration order.

Fixes: e6d7626881 ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-12 14:43:05 +02:00
Sukrut Heroorkar
ab5ac789ef selftests/proc: fix string literal warning in proc-maps-race.c
This change resolves non literal string format warning invoked for
proc-maps-race.c while compiling.

proc-maps-race.c:205:17: warning: format not a string literal and no format arguments [-Wformat-security]
 205 |                 printf(text);
     |                 ^~~~~~
proc-maps-race.c:209:17: warning: format not a string literal and no format arguments [-Wformat-security]
 209 |                 printf(text);
     |                 ^~~~~~
proc-maps-race.c: In function `print_last_lines':
proc-maps-race.c:224:9: warning: format not a string literal and no format arguments [-Wformat-security]
 224 |         printf(start);
     |         ^~~~~~

Add string format specifier %s for the printf calls in both
print_first_lines() and print_last_lines() thus resolving the warnings.

The test executes fine after this change thus causing no effect to the
functional behavior of the test.

Link: https://lkml.kernel.org/r/20250804225633.841777-1-hsukrut3@gmail.com
Fixes: aadc099c48 ("selftests/proc: add verbose mode for /proc/pid/maps tearing tests")
Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-11 23:00:59 -07:00
Linus Torvalds
2b38afce25 Merge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
 "tools/power turbostat: version 2025.09.09

   - Probe and display L3 Cache topology

   - Add ability to average an added counter (useful for pre-integrated
     "counters", such as Watts)

   - Break the limit of 64 built-in counters

   - Assorted bug fixes and minor feature tweaks"

* tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.09.09
  tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
  tools/power turbostat: standardize PER_THREAD_PARAMS
  tools/power turbostat: Fix DMR support
  tools/power turbostat: add format "average" for external attributes
  tools/power turbostat: delete GET_PKG()
  tools/power turbostat: probe and display L3 cache topology
  tools/power turbostat: Support more than 64 built-in-counters
  tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
  tools/power turbostat: Fix bogus SysWatt for forked program
  tools/power turbostat: Handle cap_get_proc() ENOSYS
  tools/power turbostat: Fix build with musl
  tools/power turbostat: verify arguments to params --show and --hide
  tools/power turbostat: regression fix: --show C1E%
2025-08-10 09:02:36 +03:00
Len Brown
5e98a5e73e tools/power turbostat: version 2025.09.09
Probe and display L3 Cache topology
Add ability to average an added counter
	(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks

Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Len Brown
e60a13bcef tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.

Non-root turbostat users see complaints in this scenario.

Fail probe of the interface if we can't read current_freq_khz.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Len Brown
378e901160 tools/power turbostat: standardize PER_THREAD_PARAMS
use a macro for PER_THREAD_PARAMS to make adding one later more clear.

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Zhang Rui
3a088b07c4 tools/power turbostat: Fix DMR support
Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from TPMI based Intel Speed Select Technology feature.

Remove the access of these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.

Fixes: 83075bd59d ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Michael Hebenstreit
dcd1c379b0 tools/power turbostat: add format "average" for external attributes
External atributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data

Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Len Brown
a5015d945d tools/power turbostat: delete GET_PKG()
pkg_base[pkg_id] is a simple array of structure pointers,
let the compiler treat it that way.

Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Len Brown
5f961fb2a7 tools/power turbostat: probe and display L3 cache topology
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:24:46 -04:00
Len Brown
8d14a098b4 tools/power turbostat: Support more than 64 built-in-counters
We have out-grown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the the CPU_SET(3) macros to break this barrier.

Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...

Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 21:23:45 -04:00
Len Brown
d240b441b5 tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.

Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 11:14:30 -04:00
Linus Torvalds
c30a13538d Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Fix memory leak of bpf_scc_info objects (Eduard Zingerman)

 - Fix a regression in the 'perf' tool caused by moving UID filtering to
   BPF (Ilya Leoshkevich)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  perf bpf-filter: Enable events manually
  libbpf: Add the ability to suppress perf event enablement
  bpf: Fix memory leak of bpf_scc_info objects
2025-08-09 09:03:21 +03:00
Zhang Rui
44207567fa tools/power turbostat: Fix bogus SysWatt for forked program
Similar to delta_cpu(), delta_platform() is called in turbostat main
loop. This ensures accurate SysWatt readings in periodic monitoring mode
$ sudo turbostat -S -q --show power -i 1
CoreTmp	PkgTmp	PkgWatt	CorWatt	GFXWatt	RAMWatt	PKG_%	RAM_%	SysWatt
60	61	6.21	1.13	0.16	0.00	0.00	0.00	13.07
58	61	6.00	1.07	0.18	0.00	0.00	0.00	12.75
58	61	5.74	1.05	0.17	0.00	0.00	0.00	12.22
58	60	6.27	1.11	0.24	0.00	0.00	0.00	13.55

However, delta_platform() is missing for forked program and causes bogus
SysWatt reporting,
$ sudo turbostat -S -q --show power sleep 1
1.004736 sec
CoreTmp	PkgTmp	PkgWatt	CorWatt	GFXWatt	RAMWatt	PKG_%	RAM_%	SysWatt
57	58	6.05	1.02	0.16	0.00	0.00	0.00	0.03

Add missing delta_platform() for forked program.

Fixes: e5f687b89b ("tools/power turbostat: Add RAPL psys as a built-in counter")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 16:05:54 -04:00
Calvin Owens
d34fe509f5 tools/power turbostat: Handle cap_get_proc() ENOSYS
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc().
Check for ENOSYS to recognize this case, and continue on to
attempt to access the requested MSRs (such as temperature).

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 16:05:53 -04:00
Calvin Owens
6ea0ec1b95 tools/power turbostat: Fix build with musl
turbostat.c: In function 'parse_int_file':
    turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function)
     5567 |         char path[PATH_MAX];
          |                   ^~~~~~~~

    turbostat.c: In function 'probe_graphics':
    turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function)
     6787 |         char path[PATH_MAX];
          |                   ^~~~~~~~

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 16:05:53 -04:00
Len Brown
d44c40e4e3 tools/power turbostat: verify arguments to params --show and --hide
$ sudo turbostat --quiet --show junk
turbostat: Counter 'junk' can not be added.

Previously, invalid arguments to --show and --hide were silently ignored

Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 16:05:27 -04:00
Linus Torvalds
3781648824 Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
  Previous releases - regressions:

   - netlink: avoid infinite retry looping in netlink_unicast()

  Previous releases - always broken:

   - packet: fix a race in packet_set_ring() and packet_notifier()

   - ipv6: reject malicious packets in ipv6_gso_segment()

   - sched: mqprio: fix stack out-of-bounds write in tc entry parsing

   - net: drop UFO packets (injected via virtio) in udp_rcv_segment()

   - eth: mlx5: correctly set gso_segs when LRO is used, avoid false
     positive checksum validation errors

   - netpoll: prevent hanging NAPI when netcons gets enabled

   - phy: mscc: fix parsing of unicast frames for PTP timestamping

   - a number of device tree / OF reference leak fixes"

* tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
  pptp: fix pptp_xmit() error path
  net: ti: icssg-prueth: Fix skb handling for XDP_PASS
  net: Update threaded state in napi config in netif_set_threaded
  selftests: netdevsim: Xfail nexthop test on slow machines
  eth: fbnic: Lock the tx_dropped update
  eth: fbnic: Fix tx_dropped reporting
  eth: fbnic: remove the debugging trick of super high page bias
  net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect
  dt-bindings: net: Replace bouncing Alexandru Tachici emails
  dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET
  net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing
  Revert "net: mdio_bus: Use devm for getting reset GPIO"
  selftests: net: packetdrill: xfail all problems on slow machines
  net/packet: fix a race in packet_set_ring() and packet_notifier()
  benet: fix BUG when creating VFs
  net: airoha: npu: Add missing MODULE_FIRMWARE macros
  net: devmem: fix DMA direction on unmapping
  ipa: fix compile-testing with qcom-mdt=m
  eth: fbnic: unlink NAPIs from queues on error to open
  net: Add locking to protect skb->dev access in ip_output
  ...
2025-08-08 07:03:25 +03:00
Ilya Leoshkevich
5e2ac8e857 perf bpf-filter: Enable events manually
On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file. On x86 the command:

  # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this
behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
	   +-> bpf_program.attach_perf_event()
	       +-> bpf_program.attach_perf_event_opts()
	           +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However the event's
   ring-buffer to store the samples generated by hardware is not
   allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
	   +-> evlist__mmap_ex()
	       +-> perf_evlist__mmap_ops()
	           +-> mmap_per_cpu()
	               +-> mmap_per_evsel()
	                   +-> mmap__mmap()
	                       +-> perf_mmap__mmap()
	                           +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap()
   the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():

   Hardware sampling is stopped. No samples are generated and saved
   anymore.

3. After the event 'cycles' has been mapped, the event is enabled a
   second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
	   +-> evsel__enable_cpu()
	       +-> perf_evsel__enable_cpu()
	           +-> perf_evsel__run_ioctl()
	               +-> perf_evsel__ioctl()
	                   +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second

      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

   is just a NOP in this case. The first invocation in (1.) sets the
   event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to
   PERF_EVENT_STATE_ACTIVE.

This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.

To remedy this, remove the first invocation of

   ioctl(..., PERF_EVENT_IOC_ENABLE, ...).

in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.

Output after:

 # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
 [ perf record: Woken up 3 times to write data ]
 [ perf record: Captured and wrote 0.876 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      16200  (99.5%)
              SAMPLE events:      16200
 #

The software event succeeded both before and after the patch:

 # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
					  ./perf test -w thloop 2
 [ perf record: Woken up 7 times to write data ]
 [ perf record: Captured and wrote 2.870 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      53506  (99.8%)
              SAMPLE events:      53506
 #

Fixes: b4c658d4d6 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-07 09:03:44 -07:00
Ilya Leoshkevich
9474e27a24 libbpf: Add the ability to suppress perf event enablement
Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.

Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-07 09:01:41 -07:00
Linus Torvalds
adf12a394c Merge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Pull perf fixes from Thomas Gleixner:
 "Perf fixes for perf_mmap() reference counting to prevent potential
  reference count leaks which are caused by:

   - VMA splits, which change the offset or size of a mapping, which
     causes perf_mmap_close() to ignore the unmap or unmap the wrong
     buffer.

   - Several internal issues of perf_mmap(), which can cause reference
     count leaks in the perf mmap, corrupt accounting or cause leaks in
     perf drivers.

  The main fix is to prevent VMA splits by implementing the
  [may_]split() callback for vm operations.

  The other issues are addressed by rearranging code, early returns on
  failure and invocation of cleanups.

  Also provide a selftest to validate the fixes.

  The reference counting should be converted to refcount_t, but that
  requires larger refactoring of the code and will be done once these
  fixes are upstream"

* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
  selftests/perf_events: Add a mmap() correctness test
  perf/core: Prevent VMA split of buffer mappings
  perf/core: Handle buffer mapping fail correctly in perf_mmap()
  perf/core: Exit early on perf_mmap() fail
  perf/core: Don't leak AUX buffer refcount on allocation failure
  perf/core: Preserve AUX buffer allocation failure result
2025-08-06 04:41:21 +03:00
Samiullah Khawaja
e6d7626881 net: Update threaded state in napi config in netif_set_threaded
Commit 2677010e77 ("Add support to set NAPI threaded for individual
NAPI") added support to enable/disable threaded napi using netlink. This
also extended the napi config save/restore functionality to set the napi
threaded state. This breaks netdev reset for drivers that use napi
threaded at device level and also use napi config save/restore on
napi_disable/napi_enable. Basically on netdev with napi threaded enabled
at device level, a napi_enable call will get stuck trying to stop the
napi kthread. This is because the napi->config->threaded is set to
disabled when threaded is enabled at device level.

The issue can be reproduced on virtio-net device using qemu. To
reproduce the issue run following,

  echo 1 > /sys/class/net/threaded
  ethtool -L eth0 combined 1

Update the threaded state in napi config in netif_set_threaded and add a
new test that verifies this scenario.

Tested on qemu with virtio-net:
 NETIF=eth0 ./tools/testing/selftests/drivers/net/napi_threaded.py
 TAP version 13
 1..2
 ok 1 napi_threaded.change_num_queues
 ok 2 napi_threaded.enable_dev_threaded_disable_napi_threaded
 # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Fixes: 2677010e77 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250804164457.2494390-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-05 17:46:15 -07:00
Ido Schimmel
8d22aea8af selftests: netdevsim: Xfail nexthop test on slow machines
A lot of test cases in the file are related to the idle and unbalanced
timers of resilient nexthop groups and these tests are reported to be
flaky on slow machines running debug kernels.

Rather than marking a lot of individual tests with xfail_on_slow(),
simply mark all the tests. Note that the test is stable on non-debug
machines and that with debug kernels we are mainly interested in the
output of various sanitizers in order to determine pass / fail.

Before:

 # make -C tools/testing/selftests KSFT_MACHINE_SLOW=yes \
 	TARGETS=drivers/net/netdevsim TEST_PROGS=nexthop.sh \
 	TEST_GEN_PROGS="" run_tests
 [...]
 # TEST: Bucket migration after idle timer (with delete)               [FAIL]
 #       Group expected to still be unbalanced
 [...]
 not ok 1 selftests: drivers/net/netdevsim: nexthop.sh # exit=1

After:

 # make -C tools/testing/selftests KSFT_MACHINE_SLOW=yes \
 	TARGETS=drivers/net/netdevsim TEST_PROGS=nexthop.sh \
 	TEST_GEN_PROGS="" run_tests
 [...]
 # TEST: Bucket migration after idle timer (with delete)               [XFAIL]
 #       Group expected to still be unbalanced
 [...]
 ok 1 selftests: drivers/net/netdevsim: nexthop.sh

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250729160609.02e0f157@kernel.org/
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250804114320.193203-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-05 16:11:03 -07:00
Lorenzo Stoakes
084d2ac403 selftests/perf_events: Add a mmap() correctness test
Exercise various mmap(), munmap() and mremap() invocations, which might
cause a perf buffer mapping to be split or truncated.

To avoid hard coding the perf event and having dependencies on
architectures and configuration options, scan through event types in sysfs
and try to open them. On success, try to mmap() and if that succeeds try to
mmap() the AUX buffer.

In case that no AUX buffer supporting event is found, only test the base
buffer mapping. If no mappable event is found or permissions are not
sufficient, skip the tests.

Reserve a PROT_NONE region for both rb and aux tests to allow testing the
case where mremap unmaps beyond the end of a mapped VMA to prevent it from
unmapping unrelated mappings.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>            
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
2025-08-05 21:55:29 +02:00
Linus Torvalds
da23ea194d Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "mseal cleanups" (Lorenzo Stoakes)

     Some mseal cleaning with no intended functional change.

   - "Optimizations for khugepaged" (David Hildenbrand)

     Improve khugepaged throughput by batching PTE operations for large
     folios. This gain is mainly for arm64.

   - "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)

     A bugfix, additional debug code and cleanups to the execmem code.

   - "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)

     Bugfixes, cleanups and performance improvememnts to the mTHP swapin
     code"

* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
  mm: mempool: fix crash in mempool_free() for zero-minimum pools
  mm: correct type for vmalloc vm_flags fields
  mm/shmem, swap: fix major fault counting
  mm/shmem, swap: rework swap entry and index calculation for large swapin
  mm/shmem, swap: simplify swapin path and result handling
  mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
  mm/shmem, swap: tidy up swap entry splitting
  mm/shmem, swap: tidy up THP swapin checks
  mm/shmem, swap: avoid redundant Xarray lookup during swapin
  x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
  execmem: drop writable parameter from execmem_fill_trapping_insns()
  execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
  execmem: move execmem_force_rw() and execmem_restore_rox() before use
  execmem: rework execmem_cache_free()
  execmem: introduce execmem_alloc_rw()
  execmem: drop unused execmem_update_copy()
  mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
  mm/rmap: add anon_vma lifetime debug check
  mm: remove mm/io-mapping.c
  ...
2025-08-05 16:02:07 +03:00
Jakub Kicinski
5ef7fdf52c selftests: net: packetdrill: xfail all problems on slow machines
We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250801181638.2483531-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-04 17:21:44 -07:00
Linus Torvalds
806381e1a2 Merge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:

 - Fixes for several issues in the powernv PCI hotplug path

 - Fix htmldoc generation for htm.rst in toctree

 - Add jit support for load_acquire and store_release in ppc64 bpf jit

Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar
Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar

* tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc64/bpf: Add jit support for load_acquire and store_release
  docs: powerpc: add htm.rst to toctree
  PCI: pnv_php: Enable third attention indicator state
  PCI: pnv_php: Fix surprise plug detection and recovery
  powerpc/eeh: Make EEH driver device hotplug safe
  powerpc/eeh: Export eeh_unfreeze_pe()
  PCI: pnv_php: Work around switches with broken presence detection
  PCI: pnv_php: Clean up allocated IRQs on unplug
2025-08-03 19:15:04 -07:00
Linus Torvalds
e991acf1bc Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Lorenzo Stoakes
f225b34f1e mm/mseal: always define VM_SEALED
Patch series "mseal cleanups", v4.

Perform a number of cleanups to the mseal logic.  Firstly, VM_SEALED is
treated differently from every other VMA flag, it really doesn't make
sense to do this, so we start by making this consistent with everything
else.

Next we place the madvise logic where it belongs - in mm/madvise.c.  It
really makes no sense to abstract this elsewhere.  In doing so, we go to
great lengths to explain very clearly the previously very confusing logic
as to what sealed mappings are impacted here.

In doing so, we retain existing logic regarding treatment of madvise()
discard operations for a sealed, read-only MAP_PRIVATE file-backed
mapping.  This is something we likely need to revisit.

We then abstract out and explain the 'are there are any gaps in this range
in the mm?' check being performed as a prerequisite to mseal being
performed.

Finally, we simplify the actual mseal logic which is really quite
straightforward.

No functional change is intended.


This patch (of 4):

There is no reason to treat VM_SEALED in a special way, in each other case
in which a VMA flag is unavailable due to configuration, we simply assign
that flag to VM_NONE, so make VM_SEALED consistent with all other VMA
flags in this respect.

Additionally, use the next available bit for VM_SEALED, 42, rather than
arbitrarily putting it at 63 and update the declaration to match all other
VMA flags.

No functional change intended.

Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:06:09 -07:00