Commit Graph

1401 Commits

Author SHA1 Message Date
Linus Torvalds
9591fdb061 Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
2025-10-11 11:19:16 -07:00
Linus Torvalds
256e341706 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
 "Generic:

   - Rework almost all of KVM's exports to expose symbols only to KVM's
     x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko

  x86:

   - Rework almost all of KVM x86's exports to expose symbols only to
     KVM's vendor modules, i.e. to kvm-{amd,intel}.ko

   - Add support for virtualizing Control-flow Enforcement Technology
     (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
     (Shadow Stacks).

     It is worth noting that while SHSTK and IBT can be enabled
     separately in CPUID, it is not really possible to virtualize them
     separately. Therefore, Intel processors will really allow both
     SHSTK and IBT under the hood if either is made visible in the
     guest's CPUID. The alternative would be to intercept
     XSAVES/XRSTORS, which is not feasible for performance reasons

   - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
     when completing userspace I/O. KVM has already committed to
     allowing L2 to to perform I/O at that point

   - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
     MSR is supposed to exist for v2 PMUs

   - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs

   - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
     emulator support (KVM should never need to emulate the MSRs outside
     of forced emulation and other contrived testing scenarios)

   - Clean up the MSR APIs in preparation for CET and FRED
     virtualization, as well as mediated vPMU support

   - Clean up a pile of PMU code in anticipation of adding support for
     mediated vPMUs

   - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
     vmexits needed to faithfully emulate an I/O APIC for such guests

   - Many cleanups and minor fixes

   - Recover possible NX huge pages within the TDP MMU under read lock
     to reduce guest jitter when restoring NX huge pages

   - Return -EAGAIN during prefault if userspace concurrently
     deletes/moves the relevant memslot, to fix an issue where
     prefaulting could deadlock with the memslot update

  x86 (AMD):

   - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
     supported

   - Require a minimum GHCB version of 2 when starting SEV-SNP guests
     via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
     errors instead of latent guest failures

   - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
     prevents unauthorized CPU accesses from reading the ciphertext of
     SNP guest private memory, e.g. to attempt an offline attack. This
     feature splits the shared SEV-ES/SEV-SNP ASID space into separate
     ranges for SEV-ES and SEV-SNP guests, therefore a new module
     parameter is needed to control the number of ASIDs that can be used
     for VMs with CipherText Hiding vs. how many can be used to run
     SEV-ES guests

   - Add support for Secure TSC for SEV-SNP guests, which prevents the
     untrusted host from tampering with the guest's TSC frequency, while
     still allowing the the VMM to configure the guest's TSC frequency
     prior to launch

   - Validate the XCR0 provided by the guest (via the GHCB) to avoid
     bugs resulting from bogus XCR0 values

   - Save an SEV guest's policy if and only if LAUNCH_START fully
     succeeds to avoid leaving behind stale state (thankfully not
     consumed in KVM)

   - Explicitly reject non-positive effective lengths during SNP's
     LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
     them

   - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
     host's desired TSC_AUX, to fix a bug where KVM was keeping a
     different vCPU's TSC_AUX in the host MSR until return to userspace

  KVM (Intel):

   - Preparation for FRED support

   - Don't retry in TDX's anti-zero-step mitigation if the target
     memslot is invalid, i.e. is being deleted or moved, to fix a
     deadlock scenario similar to the aforementioned prefaulting case

   - Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: x86: Export KVM-internal symbols for sub-modules only
  KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
  KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
  KVM: Export KVM-internal symbols for sub-modules only
  KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
  KVM: VMX: Make CR4.CET a guest owned bit
  KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: SVM: Enable shadow stack virtualization for SVM
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  ...
2025-10-06 12:37:34 -07:00
Linus Torvalds
ee2fe81cdc Merge tag 'docs-6.18' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "It has been a relatively busy cycle in docsland, with changes all
  over:

   - Bring the kernel memory-model docs into the Sphinx build in the
     "literal include" mode.

   - Lots of build-infrastructure work, further cleaning up long-term
     kernel-doc technical debt. The sphinx-pre-install tool has been
     converted to Python and updated for current systems.

   - A new tool to detect when documents have been moved and generate
     HTML redirects; this can be used on kernel.org (or any other site
     hosting the rendered docs) to avoid breaking links.

   - Automated processing of the YAML files describing the netlink
     protocol.

   - A significant update of the maintainer's PGP guide.

  ... and a seemingly endless series of typo fixes, build-problem fixes,
  etc"

* tag 'docs-6.18' of git://git.lwn.net/linux: (193 commits)
  Documentation/features: Update feature lists for 6.17-rc7
  docs: remove cdomain.py
  Documentation/process: submitting-patches: fix typo in "were do"
  docs: dev-tools/lkmm: Fix typo of missing file extension
  Documentation: trace: histogram: Convert ftrace docs cross-reference
  Documentation: trace: histogram-design: Wrap introductory note in note:: directive
  Documentation: trace: historgram-design: Separate sched_waking histogram section heading and the following diagram
  Documentation: trace: histogram-design: Trim trailing vertices in diagram explanation text
  Documentation: trace: histogram: Fix histogram trigger subsection number order
  docs: driver-api: fix spelling of "buses".
  Documentation: fbcon: Use admonition directives
  Documentation: fbcon: Reindent 8th step of attach/detach/unload
  Documentation: fbcon: Add boot options and attach/detach/unload section headings
  docs: filesystems: sysfs: add remaining top level sysfs directory descriptions
  docs: filesystems: sysfs: clarify symlink destinations in dev and bus/devices descriptions
  docs: filesystems: sysfs: remove top level sysfs net directory
  docs: maintainer: Fix ambiguous subheading formatting
  docs: kdoc: a few more dump_typedef() tweaks
  docs: kdoc: remove redundant comment stripping in dump_typedef()
  docs: kdoc: remove some dead code in dump_typedef()
  ...
2025-10-03 17:16:13 -07:00
Linus Torvalds
e406d57be7 Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
   completes the removal of this legacy IDR API

 - "panic: introduce panic status function family" from Jinchao Wang
   provides a number of cleanups to the panic code and its various
   helpers, which were rather ad-hoc and scattered all over the place

 - "tools/delaytop: implement real-time keyboard interaction support"
   from Fan Yu adds a few nice user-facing usability changes to the
   delaytop monitoring tool

 - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
   Petrongonas fixes a panic which was happening with the combination of
   EFI and KHO

 - "Squashfs: performance improvement and a sanity check" from Phillip
   Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
   150x speedup was measured for a well-chosen microbenchmark

 - plus another 50-odd singleton patches all over the place

* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
  Squashfs: reject negative file sizes in squashfs_read_inode()
  kallsyms: use kmalloc_array() instead of kmalloc()
  MAINTAINERS: update Sibi Sankar's email address
  Squashfs: add SEEK_DATA/SEEK_HOLE support
  Squashfs: add additional inode sanity checking
  lib/genalloc: fix device leak in of_gen_pool_get()
  panic: remove CONFIG_PANIC_ON_OOPS_VALUE
  ocfs2: fix double free in user_cluster_connect()
  checkpatch: suppress strscpy warnings for userspace tools
  cramfs: fix incorrect physical page address calculation
  kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
  Squashfs: fix uninit-value in squashfs_get_parent
  kho: only fill kimage if KHO is finalized
  ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
  kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
  sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
  coccinelle: platform_no_drv_owner: handle also built-in drivers
  coccinelle: of_table: handle SPI device ID tables
  lib/decompress: use designated initializers for struct compress_format
  efi: support booting with kexec handover (KHO)
  ...
2025-10-02 18:44:54 -07:00
Linus Torvalds
3b2074c77d Merge tag 'irq-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
 "A set of updates for the interrupt core subsystem:

   - Introduce irq_chip_[startup|shutdown]_parent() to prepare for
     addressing a few short comings in the PCI/MSI interrupt subsystem.

     It allows to utilize the interrupt chip startup/shutdown callbacks
     for initializing the interrupt chip hierarchy properly on certain
     RISCV implementations and provides a mechanism to reduce the
     overhead of masking and unmasking PCI/MSI interrupts during
     operation when the underlying MSI provider can mask the interrupt.

     The actual usage comes with the interrupt driver pull request.

   - Add generic error handling for devm_request_*_irq()

     This allows to remove the zoo of random error printk's all over the
     usage sites.

   - Add a mechanism to warn about long-running interrupt handlers

     Long running interrupt handlers can introduce latencies and
     tracking them down is a tedious task. The tracking has to be
     enabled with a threshold on the kernel command line and utilizes a
     static branch to remove the overhead when disabled.

   - Update and extend the selftests which validate the CPU hotplug
     interrupt migration logic

   - Allow dropping the per CPU softirq lock on PREEMPT_RT kernels,
     which causes contention and latencies all over the place.

     The serialization requirements have been pushed down into the
     actual affected usage sites already.

   - The usual small cleanups and improvements"

* tag 'irq-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
  softirq: Provide a handshake for canceling tasklets via polling
  genirq/test: Ensure CPU 1 is online for hotplug test
  genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
  genirq/test: Depend on SPARSE_IRQ
  genirq/test: Fail early if interrupt request fails
  genirq/test: Factor out fake-virq setup
  genirq/test: Select IRQ_DOMAIN
  genirq/test: Fix depth tests on architectures with NOREQUEST by default.
  genirq: Add support for warning on long-running interrupt handlers
  genirq/devres: Add error handling in devm_request_*_irq()
  genirq: Add irq_chip_(startup/shutdown)_parent()
  genirq: Remove GENERIC_IRQ_LEGACY
2025-09-30 15:55:25 -07:00
Linus Torvalds
2cb8eeaf00 Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
 "Add support on AMD for assigning QoS bandwidth counters to resources
  (RMIDs) with the ability for those resources to be tracked by the
  counters as long as they're assigned to them.

  Previously, due to hw limitations, bandwidth counts from untracked
  resources would get lost when those resources are not tracked.

  Refactor the code and user interfaces to be able to also support
  other, similar features on ARM, for example"

* tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
  MAINTAINERS: resctrl: Add myself as reviewer
  x86/resctrl: Configure mbm_event mode if supported
  fs/resctrl: Introduce the interface to switch between monitor modes
  fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
  fs/resctrl: Introduce the interface to modify assignments in a group
  fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  fs/resctrl: Auto assign counters on mkdir and clean up on group removal
  fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
  fs/resctrl: Provide interface to update the event configurations
  fs/resctrl: Add event configuration directory under info/L3_MON/
  fs/resctrl: Support counter read/reset with mbm_event assignment mode
  x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
  x86/resctrl: Refactor resctrl_arch_rmid_read()
  fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode
  fs/resctrl: Pass struct rdtgroup instead of individual members
  fs/resctrl: Add the functionality to unassign MBM events
  fs/resctrl: Add the functionality to assign MBM events
  x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  fs/resctrl: Introduce event configuration field in struct mon_evt
  ...
2025-09-30 13:29:42 -07:00
Linus Torvalds
bd91417a96 Merge tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:

 - Add infrastructure to be able to debug the microcode loader in a guest

 - Refresh Intel old microcode revisions

* tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Add microcode loader debugging functionality
  x86/microcode: Add microcode= cmdline parsing
  x86/microcode/intel: Refresh the revisions that determine old_microcode
2025-09-30 12:41:10 -07:00
Paolo Bonzini
10ef74c06b Merge tag 'kvm-x86-ciphertext-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM SEV-SNP CipherText Hiding support for 6.18

Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack.  Instead of ciphertext, the CPU
will always read back all FFs when CipherText Hiding is enabled.

Add new module parameter to the KVM module to enable CipherText Hiding and
control the number of ASIDs that can be used for VMs with CipherText Hiding,
which is in effect the number of SNP VMs.  When CipherText Hiding is enabled,
the shared SEV-ES/SEV-SNP ASID space is split into separate ranges for SEV-ES
and SEV-SNP guests, i.e. ASIDs that can be used for CipherText Hiding cannot
be used to run SEV-ES guests.
2025-09-30 13:34:32 -04:00
Linus Torvalds
feafee2845 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm
  improvements for CPUs with the 'noabort' BBML2 feature and a clever
  patch to allow ptdump to play nicely with block mappings in the
  vmalloc area.

  Confidential computing:

   - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
     and mapping them with appropriate attributes.

  CPU features:

   - Advertise atomic floating-point instructions to userspace

   - Extend Spectre workarounds to cover additional Arm CPU variants

   - Extend list of CPUs that support break-before-make level 2 and
     guarantee not to generate TLB conflict aborts for changes of
     mapping granularity (BBML2_NOABORT)

   - Add GCS support to our uprobes implementation.

  Documentation:

   - Remove bogus SME documentation concerning register state when
     entering/exiting streaming mode.

  Entry code:

   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)

   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:

   - Enable huge mappings in vmalloc space even when kernel page-table
     dumping is enabled

   - Tidy up the types used in our early MMU setup code

   - Rework rodata= for closer parity with the behaviour on x86

   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
     linear map even when rodata= applies to virtual aliases

   - Don't re-allocate the virtual region between '_text' and '_stext',
     as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:

   - Clean-up Kconfig menuconfig text for architecture features

   - Avoid redundant bitmap_empty() during determination of supported
     SME vector lengths

   - Re-enable warnings when building the 32-bit vDSO object

   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:

   - Support for v3 of the Hisilicon L3C PMU

   - Support for Hisilicon's MN and NoC PMUs

   - Support for Fujitsu's Uncore PMU

   - Support for SPE's extended event filtering feature

   - Preparatory work to enable data source filtering in SPE

   - Support for multiple lanes in the DWC PCIe PMU

   - Support for i.MX94 in the IMX DDR PMU driver

   - MAINTAINERS update (Thank you, Yicong)

   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:

   - Add basic LSFE check to the existing hwcaps test

   - Support nolibc in GCS tests

   - Extend SVE ptrace test to pass unsupported regsets and invalid
     vector lengths

   - Minor cleanups (typos, cosmetic changes).

  System registers:

   - Fix ID_PFR1_EL1 definition

   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1

   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)

   - Be stricter about the input fed into our AWK sysreg generator
     script

   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:

   - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
     support from the ACPI GHES code so that it can be used by platforms
     booted with device-tree

   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
     runtime calls

   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:

   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
     guests can reliably query the underlying CPU types from the VMM

   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
     to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29 18:48:39 -07:00
Linus Torvalds
b7ce6fa90f Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Huang Shijie
c0f303d7d4 arm64: mm: Rework the 'rodata=' options
As per admin guide documentation, "rodata=on" should be the default on
platforms. Documentation/admin-guide/kernel-parameters.txt describes
these options as

   rodata=         [KNL,EARLY]
           on      Mark read-only kernel memory as read-only (default).
           off     Leave read-only kernel memory writable for debugging.
           full    Mark read-only kernel memory and aliases as read-only
                   [arm64]

But on arm64 platform, RODATA_FULL_DEFAULT_ENABLED is enabled by default,
so "rodata=full" is the default instead.

For parity with other architectures, namely x86, rework 'rodata=on' to
match the current "full" behaviour and replace 'rodata=full' with a new
'rodata=noalias' option which retains writable aliases in the direct map
for memory regions outside of the kernel image.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-16 20:53:13 +01:00
Ashish Kalra
6c7c620585 KVM: SEV: Add SEV-SNP CipherTextHiding support
Ciphertext hiding prevents host accesses from reading the ciphertext of
SNP guest private memory. Instead of reading ciphertext, the host reads
will see constant default values (0xff).

The SEV ASID space is split into SEV and SEV-ES/SEV-SNP ASID ranges.
Enabling ciphertext hiding further splits the SEV-ES/SEV-SNP ASID space
into separate ASID ranges for SEV-ES and SEV-SNP guests.

Add a new off-by-default kvm-amd module parameter to enable ciphertext
hiding and allow the admin to configure the SEV-ES and SEV-SNP ASID
ranges. Simply cap the maximum SEV-SNP ASID as appropriate, i.e. don't
reject loading KVM or disable ciphertest hiding for a too-big value, as
KVM's general approach for module params is to sanitize inputs based on
hardware/kernel support, not burn the world down. This also allows the
admin to use -1u to assign all SEV-ES/SNP ASIDs to SNP without needing
dedicated handling in KVM.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/95abc49edfde36d4fb791570ea2a4be6ad95fd0d.1755721927.git.ashish.kalra@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-15 10:14:11 -07:00
Babu Moger
bebf57bf05 x86/resctrl: Add ABMC feature in the command line options
Add a kernel command-line parameter to enable or disable the exposure of
the ABMC (Assignable Bandwidth Monitoring Counters) hardware feature to
resctrl.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:05:23 +02:00
Feng Tang
8c2b91fbb0 panic: refine the document for 'panic_print'
User reported current document about SYS_INFO_PANIC_CONSOLE_REPLAY is
confusing, that people could expect all user space console messages to be
replayed.

Specify that only 'kernel' messages will be replayed to solve the confusion.

Link: https://lkml.kernel.org/r/20250825025701.81921-3-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reported-by: Askar Safin <safinaskar@zohomail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:50 -07:00
Kees Cook
026211c40b x86/cfi: Add "debug" option to "cfi=" bootparam
Add "debug" option for "cfi=" bootparam to get details on early CFI
initialization steps so future Kees can find breakage easier.

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250904034656.3670313-5-kees@kernel.org
2025-09-04 21:59:08 +02:00
Kees Cook
24452d9ef1 x86/cfi: Document the "cfi=" bootparam options
The kernel-parameters.txt didn't have a section for the cfi= options.
Add it.

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250904034656.3670313-3-kees@kernel.org
2025-09-04 21:59:08 +02:00
Borislav Petkov (AMD)
43181a4726 x86/microcode: Add microcode loader debugging functionality
Instead of adding ad-hoc debugging glue to the microcode loader each
time I need it, add debugging functionality which is not built by
default.

Simulate all patch handling the loader does except the actual loading of
the microcode patch into the hardware.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250820135043.19048-3-bp@kernel.org
2025-09-04 16:15:19 +02:00
Borislav Petkov (AMD)
632ff61706 x86/microcode: Add microcode= cmdline parsing
Add a "microcode=" command line argument after which all options can be
passed in a comma-separated list.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/20250820135043.19048-2-bp@kernel.org
2025-09-04 16:02:20 +02:00
Wladislav Wiebe
673f1244b3 genirq: Add support for warning on long-running interrupt handlers
Introduce a mechanism to detect and warn about prolonged interrupt handlers.
With a new command-line parameter (irqhandler.duration_warn_us=), users can
configure the duration threshold in microseconds when a warning in such
format should be emitted:

"[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"

The implementation uses local_clock() to measure the execution duration of the
generic IRQ per-CPU event handler.

Signed-off-by: Wladislav Wiebe <wladislav.wiebe@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250804093525.851-1-wladislav.wiebe@nokia.com
2025-09-03 16:10:40 +02:00
Lichen Liu
278033a225 fs: Add 'initramfs_options' to set initramfs mount options
When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs.
By default, a tmpfs mount is limited to using 50% of the available RAM
for its content. This can be problematic in memory-constrained
environments, particularly during a kdump capture.

In a kdump scenario, the capture kernel boots with a limited amount of
memory specified by the 'crashkernel' parameter. If the initramfs is
large, it may fail to unpack into the tmpfs rootfs due to insufficient
space. This is because to get X MB of usable space in tmpfs, 2*X MB of
memory must be available for the mount. This leads to an OOM failure
during the early boot process, preventing a successful crash dump.

This patch introduces a new kernel command-line parameter,
initramfs_options, which allows passing specific mount options directly
to the rootfs when it is first mounted. This gives users control over
the rootfs behavior.

For example, a user can now specify initramfs_options=size=75% to allow
the tmpfs to use up to 75% of the available memory. This can
significantly reduce the memory pressure for kdump.

Consider a practical example:

To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With
the default 50% limit, this requires a memory pool of 96MB to be
available for the tmpfs mount. The total memory requirement is therefore
approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked
kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB.

By using initramfs_options=size=75%, the memory pool required for the
48MB tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total
memory requirement by 32MB (96MB - 64MB), allowing the kdump to succeed
with a smaller crashkernel size, such as 192MB.

An alternative approach of reusing the existing rootflags parameter was
considered. However, a new, dedicated initramfs_options parameter was
chosen to avoid altering the current behavior of rootflags (which
applies to the final root filesystem) and to prevent any potential
regressions.

Also add documentation for the new kernel parameter "initramfs_options"

This approach is inspired by prior discussions and patches on the topic.
Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128
Ref: https://landley.net/notes-2015.html#01-01-2015
Ref: https://lkml.org/lkml/2021/6/29/783
Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Link: https://lore.kernel.org/20250815121459.3391223-1-lichliu@redhat.com
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21 10:23:48 +02:00
Bjorn Helgaas
c349216707 Documentation: Fix admin-guide typos
Fix typos.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250813200526.290420-4-helgaas@kernel.org
2025-08-18 10:31:19 -06:00
Pawan Gupta
556c1ad666 x86/vmscape: Enable the mitigation
Enable the previously added mitigation for VMscape. Add the cmdline
vmscape={off|ibpb|force} and sysfs reporting.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
2025-08-14 10:37:33 -07:00
Linus Torvalds
35a813e010 Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Add new "hash_pointers=[auto|always|never]" boot parameter to force
   the hashing even with "slab_debug" enabled

 - Allow to stop CPU, after losing nbcon console ownership during
   panic(), even without proper NMI

 - Allow to use the printk kthread immediately even for the 1st
   registered nbcon

 - Compiler warning removal

* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: nbcon: Allow reacquire during panic
  printk: Allow to use the printk kthread immediately even for 1st nbcon
  slab: Decouple slab_debug and no_hash_pointers
  vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
  compiler-gcc.h: Introduce __diag_GCC_all
2025-08-04 10:54:36 -07:00
Linus Torvalds
e991acf1bc Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Matt Fleming
ed4f142f72 stackdepot: make max number of pools boot-time configurable
We're hitting the WARN in depot_init_pool() about reaching the stack depot
limit because we have long stacks that don't dedup very well.

Introduce a new start-up parameter to allow users to set the number of
maximum stack depot pools.

Link: https://lkml.kernel.org/r/20250718153928.94229-1-matt@readmodwrite.com
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:38 -07:00
Linus Torvalds
6aee5aed2e Merge tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Allow css_rstat_updated() in NMI context to enable memory accounting
   for allocations in NMI context.

 - /proc/cgroups doesn't contain useful information for cgroup2 and was
   updated to only show v1 controllers. This unfortunately broke
   something in the wild. Add an option to bring back the old behavior
   to ease transition.

 - selftest updates and other cleanups.

* tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add compatibility option for content of /proc/cgroups
  selftests/cgroup: fix cpu.max tests
  cgroup: llist: avoid memory tears for llist_node
  selftests: cgroup: Fix missing newline in test_zswap_writeback_one
  selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup
  memcg: cgroup: call css_rstat_updated irrespective of in_nmi()
  cgroup: remove per-cpu per-subsystem locks
  cgroup: make css_rstat_updated nmi safe
  cgroup: support to enable nmi-safe css_rstat_updated
  selftests: cgroup: Fix compilation on pre-cgroupns kernels
  selftests: cgroup: Optionally set up v1 environment
  selftests: cgroup: Add support for named v1 hierarchies in test_core
  selftests: cgroup_util: Add helpers for testing named v1 hierarchies
  Documentation: cgroup: add section explaining controller availability
  cgroup: Drop sock_cgroup_classid() dummy implementation
2025-07-31 16:04:19 -07:00
Linus Torvalds
02523d2d93 Merge tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity update from Mimi Zohar:
 "A single commit to permit disabling IMA from the boot command line for
  just the kdump kernel.

  The exception itself sort of makes sense. My concern is that
  exceptions do not remain as exceptions, but somehow morph to become
  the norm"

* tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: add a knob ima= to allow disabling IMA in kdump kernel
2025-07-31 11:42:11 -07:00
Linus Torvalds
e8d780dcd9 Merge tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - Convert struct slab to its own flags instead of referencing page
   flags, which is another preparation step before separating it from
   struct page completely.

   Along with that, a bunch of documentation fixes and cleanups (Matthew
   Wilcox)

 - Convert large kmalloc to use frozen pages in order to be consistent
   with non-large kmalloc slabs (Vlastimil Babka)

 - MAINTAINERS updates (Matthew Wilcox, Lorenzo Stoakes)

 - Restore NUMA policy support for large kmalloc, broken by mistake in
   v6.1 (Vlastimil Babka)

* tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  MAINTAINERS: add missing files to slab section
  slab: Update MAINTAINERS entry
  memcg_slabinfo: Fix use of PG_slab
  kfence: Remove mention of PG_slab
  vmcoreinfo: Remove documentation of PG_slab and PG_hugetlb
  doc: Add slab internal kernel-doc
  slub: Fix a documentation build error for krealloc()
  slab: Add SL_pfmemalloc flag
  slab: Add SL_partial flag
  slab: Rename slab->__page_flags to slab->flags
  doc: Move SLUB documentation to the admin guide
  mm, slab: use frozen pages for large kmalloc
  mm, slab: restore NUMA policy support for large kmalloc
2025-07-30 11:32:38 -07:00
Linus Torvalds
2db4df0c09 Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
 "Expedited grace period updates:

   - Protect against early RCU exp quiescent state reporting during exp
     grace period initialization

   - Remove superfluous barrier in task unblock path

   - Remove the CPU online quiescent state report optimization, which is
     error prone for certain scenarios

   - Add warning for unexpected pending requested expedited quiescent
     state on dying CPU

  Core:

   - Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
     indicators of the actual context tracking state of a CPU

   - Handle ->defer_qs_iw_pending field data race

   - Enable rcu_normal_wake_from_gp by default on systems with <= 16
     CPUs

   - Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls

   - Refactor expedited handling condition in rcu_read_unlock_special()

   - Documentation updates for hotplug and GP init scan ordering,
     separation of rcu_state and rnp's gp_seq states, quiescent state
     reporting for offline CPUs

  torture-scripts:

   - Cleanup and improve scripts : remove superfluous warnings for
     disabled tests; better handling of kvm.sh --kconfig arg; suppress
     some confusing diagnostics; tolerate bad kvm.sh args; add new
     diagnostic for build output; fail allmodconfig testing on warnings

   - Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels

   - Disable default RCU-tasks and clocksource-wdog testing on arm64

   - Add EXPERT Kconfig option for arm64 KCSAN runs

   - Remove SRCU-lite testing

  rcutorture:

   - Start torture writer threads creation after reader threads to
     handle race in SRCU-P scenario

   - Add SRCU down_read()/up_read() test

   - Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
     print number of up/down readers and the number of such readers
     which migrated to other CPU

   - Ignore certain unsupported configurations for trivial RCU test

   - Fix splats in RT kernels due to inaccurate checks for BH-disabled
     context

   - Enable checks and logs to capture intentionally exercised
     unexpected scenarios (too short readers) for BUSTED test

   - Remove SRCU-lite testing

  srcu:

   - Expedite SRCU-fast grace periods

   - Remove SRCU-lite implementation

   - Add guards for SRCU-fast readers

  rcu nocb:

   - Dump NOCB group leader state on stall detection

   - Robustify nocb_cb_kthread pointer accesses

   - Fix delayed execution of hurry callbacks when LAZY_RCU is enabled

  refscale:

   - Fix multiplication overflow in "loops" and "nreaders" calculations"

* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
  rcu: Document concurrent quiescent state reporting for offline CPUs
  rcu: Document separation of rcu_state and rnp's gp_seq
  rcu: Document GP init vs hotplug-scan ordering requirements
  srcu: Add guards for SRCU-fast readers
  rcu: Fix delayed execution of hurry callbacks
  rcu: Refactor expedited handling check in rcu_read_unlock_special()
  checkpatch: Remove SRCU-lite deprecation
  srcu: Remove SRCU-lite implementation
  srcu: Expedite SRCU-fast grace periods
  rcutorture: Remove support for SRCU-lite
  rcutorture: Remove SRCU-lite scenarios
  torture: Remove support for SRCU-lite
  torture: Make torture.sh --allmodconfig testing fail on warnings
  torture: Add "ERROR" diagnostic for testing kernel-build output
  torture: Make torture.sh tolerate runs having bad kvm.sh arguments
  torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
  torture: Extract testid.txt generation to separate script
  torture: Suppress "find" diagnostics from torture.sh --do-none run
  torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
  rcu: Fix rcu_read_unlock() deadloop due to IRQ work
  ...
2025-07-30 11:01:41 -07:00
Linus Torvalds
bf76f23aa1 Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of maximum lag of tasks in presence of different
     slices duration, for better handling of lag in the fair scheduler
     (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernel too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of higher
     priority waiters.

     Currently limited to a single runqueue and conditional on
     CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
     Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
      - Fix scalability regression by implementing less agressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...
2025-07-29 17:42:52 -07:00
Linus Torvalds
04d29e3609 Merge tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU mitigation updates from Borislav Petkov:

 - Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS
   to enable stuffing independently from Retbleed, do some cleanups to
   simplify and streamline the code

 - Simplify SRSO and make mitigation types selection more versatile
   depending on the Retbleed mitigation selection. Simplify code some

 - Add the second part of the attack vector controls which provide a lot
   friendlier user interface to the speculation mitigations than
   selecting each one by one as it is now.

   Instead, the selection of whole attack vectors which are relevant to
   the system in use can be done and protection against only those
   vectors is enabled, thus giving back some performance to the users

* tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/bugs: Print enabled attack vectors
  x86/bugs: Add attack vector controls for TSA
  x86/pti: Add attack vector controls for PTI
  x86/bugs: Add attack vector controls for ITS
  x86/bugs: Add attack vector controls for SRSO
  x86/bugs: Add attack vector controls for L1TF
  x86/bugs: Add attack vector controls for spectre_v2
  x86/bugs: Add attack vector controls for BHI
  x86/bugs: Add attack vector controls for spectre_v2_user
  x86/bugs: Add attack vector controls for retbleed
  x86/bugs: Add attack vector controls for spectre_v1
  x86/bugs: Add attack vector controls for GDS
  x86/bugs: Add attack vector controls for SRBDS
  x86/bugs: Add attack vector controls for RFDS
  x86/bugs: Add attack vector controls for MMIO
  x86/bugs: Add attack vector controls for TAA
  x86/bugs: Add attack vector controls for MDS
  x86/bugs: Define attack vectors relevant for each bug
  x86/Kconfig: Add arch attack vector support
  cpu: Define attack vectors
  ...
2025-07-29 16:34:45 -07:00
Linus Torvalds
0b29600a30 Merge tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:

 - Add support of forced affinity setting to yet offline CPUs for the
   MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
   during the early bringup phase of a secondary CPU in the hotplug code
   before the CPU is set online and interrupts are enabled

 - Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
   interrupt chip

 - Make the interrupt routing to RISV-V harts specification compliant so
   it supports arbitrary hart indices

 - Add a command line parameter and related handling to disable the
   generic RISCV IMSIC mechanism on platforms which use a trap-emulated
   IMSIC. Unfortunatly this is required because there is no mechanism
   available to discover this programatically.

 - Enable wakeup sources on the Renesas RZV2H driver

 - Convert interrupt chip drivers, which use a open coded variant of
   msi_create_parent_irq_domain() to use the new functionality

 - Convert interrupt chip drivers, which use the old style two level
   implementation of MSI support over to the MSI parent mechanism to
   prepare for removing at least one of the three PCI/MSI backend
   variants.

 - The usual cleanups and improvements all over the place

* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  irqchip/riscv-imsic: Add kernel parameter to disable IPIs
  irqchip/gic-v3: Fix GICD_CTLR register naming
  irqchip/ls-scfg-msi: Fix NULL dereference in error handling
  irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
  irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
  irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
  irqchip/alpine-msi: Convert to __free
  irqchip/alpine-msi: Convert to lock guards
  irqchip/alpine-msi: Clean up whitespace style
  irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
  irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
  irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
  irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
  irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
  irqdomain: Add device pointer to irq_domain_info and msi_domain_info
  irqchip/renesas-rzv2h: Remove unneeded includes
  irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/aslint-sswi: Resolve hart index
  ...
2025-07-29 13:26:05 -07:00
Linus Torvalds
53edfecef6 Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "As is tradition, cpufreq is the part with the largest number of
  updates that include core fixes and cleanups as well as updates of
  several assorted drivers, but there are also quite a few updates
  related to system sleep, mostly focused on asynchronous suspend and
  resume of devices and on making the integration of system suspend
  and resume with runtime PM easier.

  Runtime PM is also updated to allow some code duplication in drivers
  to be eliminated going forward and to work more consistently overall
  in some cases.

  Apart from that, there are some driver core updates related to PM
  domains that should help to address ordering issues with devm_ cleanup
  routines relying on PM domains, some assorted devfreq updates
  including core fixes and cleanups, tooling updates, and documentation
  and MAINTAINERS updates.

  Specifics:

   - Fix two initialization ordering issues in the cpufreq core and a
     governor initialization error path in it, and clean it up (Lifeng
     Zheng)

   - Add Granite Rapids support in no-HWP mode to the intel_pstate
     cpufreq driver (Li RongQing)

   - Make intel_pstate always use HWP_DESIRED_PERF when operating in the
     passive mode (Rafael Wysocki)

   - Allow building the tegra124 cpufreq driver as a module (Aaron
     Kling)

   - Do minor cleanups for Rust cpufreq and cpumask APIs and fix
     MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
     Bulwahn)

   - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
     Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)

   - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
     (Prashant Malani)

   - Fix minimum performance state label error in the amd-pstate driver
     documentation (Shouye Liu)

   - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
     governor and explain HW coordination influence on it in the
     documentation (Shashank Balaji)

   - Fix opencoded for_each_cpu() in idle_state_valid() in the DT
     cpuidle driver (Yury Norov)

   - Remove info about non-existing QoS interfaces from the PM QoS
     documentation (Ulf Hansson)

   - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)

   - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)

   - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)

   - Simplify the sun8i-a33-mbus devfreq driver by using more devm
     functions (Uwe Kleine-König)

   - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)

   - Check devfreq governor before using governor->name (Lifeng Zheng)

   - Remove a redundant devfreq_get_freq_range() call from
     devfreq_add_device() (Lifeng Zheng)

   - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)

   - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)

   - Extend the asynchronous suspend and resume of devices to handle
     suppliers like parents and consumers like children (Rafael Wysocki)

   - Make pm_runtime_force_resume() work for drivers that set the
     DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
     collaborate with the general ACPI PM domain to set it (Rafael
     Wysocki)

   - Add kernel parameter to disable asynchronous suspend/resume of
     devices (Tudor Ambarus)

   - Drop redundant might_sleep() calls from some functions in the
     device suspend/resume core code (Zhongqiu Han)

   - Fix the handling of monitors connected right before waking up the
     system from sleep (tuhaowen)

   - Clean up MAINTAINERS entries for suspend and hibernation (Rafael
     Wysocki)

   - Fix error code path in the KEXEC_JUMP flow and drop a redundant
     pm_restore_gfp_mask() call from it (Rafael Wysocki)

   - Rearrange suspend/resume error handling in the core device suspend
     and resume code (Rafael Wysocki)

   - Fix up white space that does not follow coding style in the
     hibernation core code (Darshan Rathod)

   - Document return values of suspend-related API functions in the
     runtime PM framework (Sakari Ailus)

   - Mark last busy stamp in multiple autosuspend-related functions in
     the runtime PM framework and update its documentation (Sakari
     Ailus)

   - Take active children into account in pm_runtime_get_if_in_use() for
     consistency (Rafael Wysocki)

   - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
     power capping driver (Sivan Zohar-Kotzer)

   - Add support for the Bartlett Lake platform to the Intel RAPL power
     capping driver (Qiao Wei)

   - Add PL4 support for Panther Lake to the intel_rapl_msr power
     capping driver (Zhang Rui)

   - Update contact information in the PM ABI docs and maintainer
     information in the power domains DT binding (Rafael Wysocki)

   - Update PM header inclusions to follow the IWYU (Include What You
     Use) principle (Andy Shevchenko)

   - Add flags to specify power on attach/detach for PM domains, make
     the driver core detach PM domains in device_unbind_cleanup(), and
     drop the dev_pm_domain_detach() call from the platform bus type
     (Claudiu Beznea)

   - Improve Python binding's Makefile for cpupower (John B. Wyatt IV)

   - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
     Shenoy)"

* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
  PM: docs: Use my kernel.org address in ABI docs and DT bindings
  PM: hibernate: Fix up white space that does not follow coding style
  PM: sleep: Rearrange suspend/resume error handling in the core
  Documentation: amd-pstate:fix minimum performance state label error
  PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
  kexec_core: Drop redundant pm_restore_gfp_mask() call
  kexec_core: Fix error code path in the KEXEC_JUMP flow
  PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
  drivers: cpufreq: add Tegra114 support
  rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
  cpufreq: Exit governor when failed to start old governor
  cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
  cpufreq: Init policy->rwsem before it may be possibly used
  cpufreq: Initialize cpufreq-based frequency-invariance later
  cpufreq: Remove duplicate check in __cpufreq_offline()
  cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
  cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
  cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
  PM / devfreq: Add HiSilicon uncore frequency scaling driver
  ...
2025-07-28 20:13:36 -07:00
Prachotan Bathi
7f0c6675b3 tpm_crb_ffa: handle tpm busy return code
Platforms supporting direct message request v2 [1] can support secure
partitions that support multiple services. For CRB over FF-A interface,
if the firmware TPM or TPM service [1] shares its Secure Partition (SP)
with another service, message requests may fail with a -EBUSY error.

To handle this, replace the single check and call with a retry loop
that attempts the TPM message send operation until it succeeds or a
configurable timeout is reached. Implement a _try_send_receive function
to do a single send/receive and modify the existing send_receive to
add this retry loop.
The retry mechanism introduces a module parameter (`busy_timeout_ms`,
default: 2000ms) to control how long to keep retrying on -EBUSY
responses. Between retries, the code waits briefly (50-100 microseconds)
to avoid busy-waiting and handling TPM BUSY conditions more gracefully.

The parameter can be modified at run-time as such:
echo 3000 | tee /sys/module/tpm_crb_ffa/parameters/busy_timeout_ms
This changes the timeout from the default 2000ms to 3000ms.

[1] TPM Service Command Response Buffer Interface Over FF-A
https://developer.arm.com/documentation/den0138/latest/

Signed-off-by: Prachotan Bathi <prachotan.bathi@arm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-07-23 02:32:00 +03:00
Feng Tang
ee13240cd7 panic: add note that panic_print sysctl interface is deprecated
Add a dedicated core parameter 'panic_console_replay' for controlling
console replay, and add note that 'panic_print' sysctl interface will be
obsoleted by 'panic_sys_info' and 'panic_console_replay'.  When it
happens, the SYS_INFO_PANIC_CONSOLE_REPLAY can be removed as well.

Link: https://lkml.kernel.org/r/20250703021004.42328-6-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:25 -07:00
Feng Tang
9743d12d0c panic: add 'panic_sys_info=' setup option for kernel cmdline
'panic_sys_info=' sysctl interface is already added for runtime setting. 
Add counterpart kernel cmdline option for boottime setting.

Link: https://lkml.kernel.org/r/20250703021004.42328-5-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:24 -07:00
Feng Tang
261743b013 panic: clean up code for console replay
Patch series "generalize panic_print's dump function to be used by other
kernel parts", v3.

When working on kernel stability issues, panic, task-hung and
software/hardware lockup are frequently met.  And to debug them, user may
need lots of system information at that time, like task call stacks, lock
info, memory info etc.  

panic case already has panic_print_sys_info() for this purpose, and has a
'panic_print' bitmask to control what kinds of information is needed,
which is also helpful to debug other task-hung and lockup cases.

So this patchset extracts the function out to a new file 'lib/sys_info.c',
and makes it available for other cases which also need to dump system info
for debugging.  

Also as suggested by Petr Mladek, add 'panic_sys_info=' interface to take
human readable string like "tasks,mem,locks,timers,ftrace,....", and
eventually obsolete the current 'panic_print' bitmap interface.

In RFC and V1 version, hung_task and SW/HW watchdog modules are enabled
with the new sys_info dump interface.  In v2, they are kept out for better
review of current change, and will be posted later.  

Locally these have been used in our bug chasing for stability issues and
was proven helpful.

Many thanks to Petr Mladek for great suggestions on both the code and
architectures!


This patch (of 5):

Currently the panic_print_sys_info() was called twice with different
parameters to handle console replay case, which is kind of confusing.

Add panic_console_replay() explicitly and rename
'PANIC_PRINT_ALL_PRINTK_MSG' to 'PANIC_CONSOLE_REPLAY', to make the code
straightforward.  The related kernel document is also updated.

Link: https://lkml.kernel.org/r/20250703021004.42328-1-feng.tang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250703021004.42328-2-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:24 -07:00
Jiri Bohac
ce1bf19a34 kdump, documentation: describe craskernel CMA reservation
Describe the new crashkernel ",cma" suffix in Documentation/

Link: https://lkml.kernel.org/r/aEqpQwUy6gqSiUkV@dwarf.suse.cz
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Tao Liu <ltao@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:23 -07:00
Michal Koutný
646faf36d7 cgroup: Add compatibility option for content of /proc/cgroups
/proc/cgroups lists only v1 controllers by default, however, this is
only enforced since the commit af000ce852 ("cgroup: Do not report
unavailable v1 controllers in /proc/cgroups") and there is software in
the wild that uses content of /proc/cgroups to decide on availability of
v2 (sic) controllers.

Add a boottime param that can bring back the previous behavior for
setups where the check in the software cannot be changed and it causes
e.g. unintended OOMs.

Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
guard since it's only important to check which hierarchy (v1 vs v2) the
subsys is attached to. This has no effect on the printed message but
the code is cleaner since cgrp_v1_visible is really about mounted
hierarchies, not the content of /proc/cgroups.

Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk
Fixes: af000ce852 ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups")
Fixes: a0ab145322 ("cgroup: Print message when /proc/cgroups is read on v2-only system")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-19 06:14:44 -10:00
Anup Patel
ea92b6046d irqchip/riscv-imsic: Add kernel parameter to disable IPIs
When injecting IPIs to a set of harts, the IMSIC IPI support will do a
separate MMIO write to the SETIPNUM_LE register of each target hart. This
means on a platform where IMSIC is trap-n-emulated, there will be N MMIO
traps when injecting IPI to N target harts hence IMSIC IPIs will be slow on
such platforms compared to the SBI IPI extension.

Unfortunately, there is no DT, ACPI, or any other way of discovering
whether the underlying IMSIC is trap-n-emulated. Using MMIO write to the
SETIPNUM_LE register for injecting IPI is purely a software choice in the
IMSIC driver hence add a kernel parameter to allow users to disable IMSIC
IPIs on platforms with trap-n-emulated IMSIC.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250716123745.557585-1-apatel@ventanamicro.com
2025-07-18 16:46:09 +02:00
Rafael J. Wysocki
7c1f7c22e6 Merge back earlier material related to system sleep 2025-07-17 19:55:25 +02:00
Uladzislau Rezki (Sony)
d827673d8a Documentation/kernel-parameters: Update rcu_normal_wake_from_gp doc
Update the documentation about rcu_normal_wake_from_gp parameter.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
2025-07-16 09:24:39 +05:30
John Stultz
25c411fce7 sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.

Also uses this option to allow the rq->donor to be different from
rq->curr.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
2025-07-14 17:16:31 +02:00
David Kaplan
1caa1b0509 Documentation/x86: Document new attack vector controls
Document the 5 new attack vector command line options, how they
interact with existing vulnerability controls, and recommendations on when
they can be disabled.

Note that while mitigating against untrusted userspace requires both
user-to-kernel and user-to-user protection, these are kept separate.  The
kernel can control what code executes inside of it and that may affect the
risk associated with vulnerabilities especially if new kernel mitigations
are implemented.  The same isn't typically true of userspace.

In other words, the risk associated with user-to-user or guest-to-guest
attacks is unlikely to change over time.  While the risk associated with
user-to-kernel or guest-to-host attacks may change.  Therefore, these
controls are separated.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250709155731.3279419-1-david.kaplan@amd.com
2025-07-11 17:51:43 +02:00
Tudor Ambarus
f747cde5e7 PM: sleep: add kernel parameter to disable asynchronous suspend/resume
On some platforms, device dependencies are not properly represented by
device links, which can cause issues when asynchronous power management
is enabled. While it is possible to disable this via sysfs, doing so
at runtime can race with the first system suspend event.

This patch introduces a kernel command-line parameter, "pm_async", which
can be set to "off" to globally disable asynchronous suspend and resume
operations from early boot. It effectively provides a way to set the
initial value of the existing pm_async sysfs knob at boot time. This
offers a robust method to fall back to synchronous (sequential)
operation, which can stabilize platforms with problematic dependencies
and also serve as a useful debugging tool.

The default behavior remains unchanged (asynchronous enabled). To
disable it, boot the kernel with the "pm_async=off" parameter.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250709-pm-async-off-v3-1-cb69a6fc8d04@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-10 14:40:07 +02:00
Matthew Wilcox (Oracle)
262e086f93 doc: Move SLUB documentation to the admin guide
This section is supposed to be for internal documentation, while the
document is advice for sysadmins.  Move it to the appropriate place.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20250611155916.2579160-2-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-06-18 13:06:26 +02:00
Borislav Petkov (AMD)
d8010d4ba4 x86/bugs: Add a Transient Scheduler Attacks mitigation
Add the required features detection glue to bugs.c et all in order to
support the TSA mitigation.

Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-06-17 17:17:02 +02:00
Baoquan He
aa9bb1b325 ima: add a knob ima= to allow disabling IMA in kdump kernel
Kdump kernel doesn't need IMA functionality, and enabling IMA will cost
extra memory. It would be very helpful to allow IMA to be disabled for
kdump kernel.

Hence add a knob ima=on|off here to allow turning IMA off in kdump
kernel if needed.

Note that this IMA disabling is limited to kdump kernel, please don't
abuse it in other kernel and thus serious consequences are caused.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-06-16 09:15:13 -04:00
Kees Cook
de1c831a78 slab: Decouple slab_debug and no_hash_pointers
Some system owners use slab_debug=FPZ (or similar) as a hardening option,
but do not want to be forced into having kernel addresses exposed due
to the implicit "no_hash_pointers" boot param setting.[1]

Introduce the "hash_pointers" boot param, which defaults to "auto"
(the current behavior), but also includes "always" (forcing on hashing
even when "slab_debug=..." is defined), and "never". The existing
"no_hash_pointers" boot param becomes an alias for "hash_pointers=never".

This makes it possible to boot with "slab_debug=FPZ hash_pointers=always".

Link: https://github.com/KSPP/linux/issues/368 [1]
Fixes: 792702911f ("slub: force on no_hash_pointers when slub_debug is enabled")
Co-developed-by: Sergio Perez Gonzalez <sperezglz@gmail.com>
Signed-off-by: Sergio Perez Gonzalez <sperezglz@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20250415170232.it.467-kees@kernel.org
[kees@kernel.org: Add note about hash_pointers into slab_debug kernel parameter documentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-06-09 16:26:10 +02:00
Linus Torvalds
e9e668cd27 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "We've got a couple of build fixes when using LLD, a missing TLB
  invalidation and a workaround for broken firmware on SoCs with CPUs
  that implement MPAM:

   - Disable problematic linker assertions for broken versions of LLD

   - Work around sporadic link failure with LLD and various randconfig
     builds

   - Fix missing invalidation in the TLB batching code when reclaim
     races with mprotect() and friends

   - Add a command-line override for MPAM to allow booting on systems
     with broken firmware"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add override for MPAM
  arm64/mm: Close theoretical race where stale TLB entry remains valid
  arm64: Work around convergence issue with LLD linker
  arm64: Disable LLD linker ASSERT()s for the time being
2025-06-05 11:39:17 -07:00