Move kvm_intr_is_single_vcpu() to lapic.c, drop its export, and make its
"fast" helper local to lapic.c. kvm_intr_is_single_vcpu() is only usable
if the local APIC is in-kernel, i.e. it most definitely belongs in the
local APIC code.
No functional change intended.
Fixes: cf04ec393e ("KVM: x86: Dedup AVIC vs. PI code for identifying target vCPU")
Link: https://lore.kernel.org/r/20250919003303.1355064-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rework the vast majority of KVM's exports to expose symbols only to KVM
submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
With few exceptions, KVM's exported APIs are intended (and safe) for KVM-
internal usage only.
Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal
exports, as they are needed by VFIO, and are generally safe for external
usage (though ideally even the get/put APIs would be KVM-internal, and
VFIO would pin a VM by grabbing a reference to its associated file).
Implement a framework in kvm_types.h in anticipation of providing a macro
to restrict KVM-specific kernel exports, i.e. to provide symbol exports
for KVM if and only if KVM is built as one or more modules.
Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use kvm_is_gpa_in_memslot() to check the validity of the notification
indicator byte address instead of open coding equivalent logic in the VFIO
AP driver.
Opportunistically use a dedicated wrapper that exists and is exported
expressly for the VFIO AP module. kvm_is_gpa_in_memslot() is generally
unsuitable for use outside of KVM; other drivers typically shouldn't rely
on KVM's memslots, and using the API requires kvm->srcu (or slots_lock) to
be held for the entire duration of the usage, e.g. to avoid TOCTOU bugs.
handle_pqap() is a bit of a special case, as it's explicitly invoked from
KVM with kvm->srcu already held, and the VFIO AP driver is in many ways an
extension of KVM that happens to live in a separate module.
Providing a dedicated API for the VFIO AP driver will allow restricting
the vast majority of generic KVM's exports to KVM submodules (e.g. to x86's
kvm-{amd,intel}.ko vendor mdoules).
No functional change intended.
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250919003303.1355064-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM x86 CET virtualization support for 6.18
Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI). SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).
Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice. By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.
SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations. The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.
When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses. If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.
IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH. If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.
From a virtualization perspective, CET presents several problems. While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.
To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.
Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.
Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.
Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
intercepts and trigger a variety of WARNs in KVM.
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
supposed to exist for v2 PMUs.
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
- Clean up KVM's vector hashing code for delivering lowest priority IRQs.
- Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
in the process of doing so add fastpath support for TSC_DEADLINE writes on
AMD CPUs.
- Clean up a pile of PMU code in anticipation of adding support for mediated
vPMUs.
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside of
forced emulation and other contrived testing scenarios).
- Clean up the MSR APIs in preparation for CET and FRED virtualization, as
well as mediated vPMU support.
- Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
as KVM can't faithfully emulate an I/O APIC for such guests.
- KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
response to PMU refreshes for guests with mediated vPMUs.
- Misc cleanups and minor fixes.
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Make migrate_{en,dis}able() inline, to improve performance
(Menglong Dong)
- Move STDL_INIT() functions out-of-line (Peter Zijlstra)
- Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)
Fair scheduling:
- Defer throttling to when tasks exit to user-space, to reduce the
chance & impact of throttle-preemption with held locks and other
resources (Aaron Lu, Valentin Schneider)
- Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
warning was getting triggered on certain topologies (Peter
Zijlstra)
Misc cleanups & fixes:
- Header cleanups (Menglong Dong)
- Fix race in push_dl_task() (Harshit Agarwal)"
* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix some typos in include/linux/preempt.h
sched: Make migrate_{en,dis}able() inline
rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
sched/fair: Do not balance task to a throttled cfs_rq
sched/fair: Do not special case tasks in throttled hierarchy
sched/fair: update_cfs_group() for throttled cfs_rqs
sched/fair: Propagate load for throttled cfs_rq
sched/fair: Get rid of throttled_lb_pair()
sched/fair: Task based throttle time accounting
sched/fair: Switch to task based throttle model
sched/fair: Implement throttle task work and related helpers
sched/fair: Add related data structure for task based throttle
sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
sched: Move STDL_INIT() functions out-of-line
sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
sched/deadline: Fix race in push_dl_task()
KVM SEV-SNP CipherText Hiding support for 6.18
Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack. Instead of ciphertext, the CPU
will always read back all FFs when CipherText Hiding is enabled.
Add new module parameter to the KVM module to enable CipherText Hiding and
control the number of ASIDs that can be used for VMs with CipherText Hiding,
which is in effect the number of SNP VMs. When CipherText Hiding is enabled,
the shared SEV-ES/SEV-SNP ASID space is split into separate ranges for SEV-ES
and SEV-SNP guests, i.e. ASIDs that can be used for CipherText Hiding cannot
be used to run SEV-ES guests.
KVM SVM changes for 6.18
- Require a minimum GHCB version of 2 when starting SEV-SNP guests via
KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
instead of latent guest failures.
- Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
host from tampering with the guest's TSC frequency, while still allowing the
the VMM to configure the guest's TSC frequency prior to launch.
- Mitigate the potential for TOCTOU bugs when accessing GHCB fields by
wrapping all accesses via READ_ONCE().
- Validate the XCR0 provided by the guest (via the GHCB) to avoid tracking a
bogous XCR0 value in KVM's software model.
- Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
avoid leaving behind stale state (thankfully not consumed in KVM).
- Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
instead of subtly relying on guest_memfd to do the "heavy" lifting.
- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
desired TSC_AUX, to fix a bug where KVM could clobber a different vCPU's
TSC_AUX due to hardware not matching the value cached in the user-return MSR
infrastructure.
- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported,
and clean up the AVIC initialization code along the way.
KVM VMX changes for 6.18
- Add read/write helpers for MSRs that need to be accessed with preemption
disable to prepare for virtualizing FRED RSP0.
- Fix a bug where KVM would return 0/success from __tdx_bringup() on error,
i.e. where KVM would load with enable_tdx=true despite TDX not being usable.
- Minor cleanups.
KVM x86 MMU changes for 6.18
- Recover possible NX huge pages within the TDP MMU under read lock to
reduce guest jitter when restoring NX huge pages.
- Return -EAGAIN during prefault if userspace concurrently deletes/moves the
relevant memslot to fix an issue where prefaulting could deadlock with the
memslot update.
- Don't retry in TDX's anti-zero-step mitigation if the target memslot is
invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
to the aforementioned prefaulting case.
x86/kvm guest side changes for 6.18
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
could map devices as WB and prevent the device drivers from mapping their
devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
bindings even when PV_UNHALT is unsupported.
KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned
delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
access_tracking_perf_test, dirty_log_perf_test,
memslot_modification_stress_test, memslot_perf_test,
mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This will
hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then allows
a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming
at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
where an xarray's spinlock was nested with a *raw* spinlock
- Permit stage-2 read permission aborts which are possible in the case
of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
[Pull into kvm/master to fix conflicts. - Paolo]
KVM: s390: A bugfix and a performance improvement
* Improve interrupt cpu for wakeup, change the heuristic to decide wich
vCPU to deliver a floating interrupt to.
* Clear the pte when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
KVM run fails when guests with 'cmm' cpu feature and host are
under memory pressure and use swap heavily. This is because
npages becomes ENOMEN (out of memory) in hva_to_pfn_slow()
which inturn propagates as EFAULT to qemu. Clearing the page
table entry when discarding an address that maps to a swap
entry resolves the issue.
Fixes: 200197908d ("KVM: s390: Refactor and split some gmap helpers")
Cc: stable@vger.kernel.org
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Gautam Gala <ggala@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Pull xen updates from Juergen Gross:
- fix the migration of a Xen virq to another cpu plus some related
cleanup work
- clean up Xen-PV mode specific code, resulting in removing some of
that code in the resulting binary in case CONFIG_XEN_PV is not set
- fixes and cleanup for suspend handling under Xen
* tag 'for-linus-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: take system_transition_mutex on suspend
xen/manage: Fix suspend error path
xen/events: Update virq_to_irq on migration
xen/events: Return -EEXIST for bound VIRQs
xen/events: Cleanup find_virq() return codes
x86/xen: select HIBERNATE_CALLBACKS more directly
drivers/xen/gntdev: use xen_pv_domain() instead of cached value
xen: replace XENFEAT_auto_translated_physmap with xen_pv_domain()
xen: rework xen_pv_domain()
Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for BPF arena and arena atomics
- Patches to switch to msi parent domain (per-device MSI domains)
- Add a lock contention tracepoint in the queued spinlock slowpath
- Fixes for underflow in pseries/powernv msi and pci paths
- Switch from legacy-of-mm-gpiochip dependency to platform driver
- Fixes for handling TLB misses
- Introduce support for powerpc papr-hvpipe
- Add vpa-dtl PMU driver for pseries platform
- Misc fixes and cleanups
Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.
* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
genirq/msi: Remove msi_post_free()
powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
powerpc/time: Expose boot_tb via accessor
powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
powerpc/fprobe: fix updated fprobe for function-graph tracer
powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
powerpc64/modules: replace stub allocation sentinel with an explicit counter
powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
powerpc/ftrace: ensure ftrace record ops are always set for NOPs
powerpc/603: Really copy kernel PGD entries into all PGDIRs
powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
powerpc/pseries: HVPIPE changes to support migration
powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
powerpc/pseries: Enable HVPIPE event message interrupt
...
Pull s390 updates from Alexander Gordeev:
- Refactor SCLP memory hotplug code
- Introduce common boot_panic() decompressor helper macro and use it to
get rid of nearly few identical implementations
- Take into account additional key generation flags and forward it to
the ep11 implementation. With that allow users to modify the key
generation process, e.g. provide valid combinations of XCP_BLOB_*
flags
- Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
debug facility and HMC driver
- Add DAX support for DCSS memory block devices
- Make the compiler statement attribute "assume" available with a new
__assume macro
- Rework ffs() and fls() family bitops functions, including source code
improvements and generated code optimizations. Use the newly
introduced __assume macro for that
- Enable additional network features in default configurations
- Use __GFP_ACCOUNT flag for user page table allocations to add missing
kmemcg accounting
- Add WQ_PERCPU flag to explicitly request the use of the per-CPU
workqueue for 3590 tape driver
- Switch power reading to the per-CPU and the Hiperdispatch to the
default workqueue
- Add memory allocation profiling hooks to allow better profiling data
and the /proc/allocinfo output similar to other architectures
* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/mm: Add memory allocation profiling hooks
s390: Replace use of system_wq with system_dfl_wq
s390/diag324: Replace use of system_wq with system_percpu_wq
s390/tape: Add WQ_PERCPU to alloc_workqueue users
s390/bitops: Switch to generic ffs() if supported by compiler
s390/bitops: Switch to generic fls(), fls64(), etc.
s390/mm: Use __GFP_ACCOUNT for user page table allocations
s390/configs: Enable additional network features
s390/bitops: Cleanup __flogr()
s390/bitops: Use __assume() for __flogr() inline assembly return value
compiler_types: Add __assume macro
s390/bitops: Limit return value range of __flogr()
s390/dcssblk: Add DAX support
s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/pkey: Forward keygenflags to ep11_unwrapkey
s390/boot: Add common boot_panic() code
s390/bitops: Optimize inlining
s390/bitops: Slightly optimize ffs() and fls64()
s390/sclp: Move memory hotplug code for better modularity
...
Pull RISC-V updates from Paul Walmsley
- Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other
architectures have already merged this type of cleanup)
- The introduction of ioremap_wc() for RISC-V
- Cleanup of the RISC-V kprobes code to use mostly-extant macros rather
than open code
- A RISC-V kprobes unit test
- An architecture-specific endianness swap macro set implementation,
leveraging some dedicated RISC-V instructions for this purpose if
they are available
- The ability to identity and communicate to userspace the presence
of a MIPS P8700-specific ISA extension, and to leverage its
MIPS-specific PAUSE implementation in cpu_relax()
- Several other miscellaneous cleanups
* tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits)
riscv: errata: Fix the PAUSE Opcode for MIPS P8700
riscv: hwprobe: Document MIPS xmipsexectl vendor extension
riscv: hwprobe: Add MIPS vendor extension probing
riscv: Add xmipsexectl instructions
riscv: Add xmipsexectl as a vendor extension
dt-bindings: riscv: Add xmipsexectl ISA extension description
riscv: cpufeature: add validation for zfa, zfh and zfhmin
perf: riscv: skip empty batches in counter start
selftests: riscv: Add README for RISC-V KSelfTest
riscv: sbi: Switch to new sys-off handler API
riscv: Move vendor errata definitions to new header
RISC-V: ACPI: enable parsing the BGRT table
riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG
riscv: pi: use 'targets' instead of extra-y in Makefile
riscv: introduce asm/swab.h
riscv: mmap(): use unsigned offset type in riscv_sys_mmap
drivers/perf: riscv: Remove redundant ternary operators
riscv: mm: Use mmu-type from FDT to limit SATP mode
riscv: mm: Return intended SATP mode for noXlvl options
riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM
...
Pull arm64 updates from Will Deacon:
"There's good stuff across the board, including some nice mm
improvements for CPUs with the 'noabort' BBML2 feature and a clever
patch to allow ptdump to play nicely with block mappings in the
vmalloc area.
Confidential computing:
- Add support for accepting secrets from firmware (e.g. ACPI CCEL)
and mapping them with appropriate attributes.
CPU features:
- Advertise atomic floating-point instructions to userspace
- Extend Spectre workarounds to cover additional Arm CPU variants
- Extend list of CPUs that support break-before-make level 2 and
guarantee not to generate TLB conflict aborts for changes of
mapping granularity (BBML2_NOABORT)
- Add GCS support to our uprobes implementation.
Documentation:
- Remove bogus SME documentation concerning register state when
entering/exiting streaming mode.
Entry code:
- Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)
- Micro-optimise syscall entry path with a compiler branch hint.
Memory management:
- Enable huge mappings in vmalloc space even when kernel page-table
dumping is enabled
- Tidy up the types used in our early MMU setup code
- Rework rodata= for closer parity with the behaviour on x86
- For CPUs implementing BBML2_NOABORT, utilise block mappings in the
linear map even when rodata= applies to virtual aliases
- Don't re-allocate the virtual region between '_text' and '_stext',
as doing so confused tools parsing /proc/vmcore.
Miscellaneous:
- Clean-up Kconfig menuconfig text for architecture features
- Avoid redundant bitmap_empty() during determination of supported
SME vector lengths
- Re-enable warnings when building the 32-bit vDSO object
- Avoid breaking our eggs at the wrong end.
Perf and PMUs:
- Support for v3 of the Hisilicon L3C PMU
- Support for Hisilicon's MN and NoC PMUs
- Support for Fujitsu's Uncore PMU
- Support for SPE's extended event filtering feature
- Preparatory work to enable data source filtering in SPE
- Support for multiple lanes in the DWC PCIe PMU
- Support for i.MX94 in the IMX DDR PMU driver
- MAINTAINERS update (Thank you, Yicong)
- Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
Selftests:
- Add basic LSFE check to the existing hwcaps test
- Support nolibc in GCS tests
- Extend SVE ptrace test to pass unsupported regsets and invalid
vector lengths
- Minor cleanups (typos, cosmetic changes).
System registers:
- Fix ID_PFR1_EL1 definition
- Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1
- Sync TCR_EL1 definition with the latest Arm ARM (L.b)
- Be stricter about the input fed into our AWK sysreg generator
script
- Typo fixes and removal of redundant definitions.
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code.
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: cpufeature: Remove duplicate asm/mmu.h header
arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
perf/dwc_pcie: Fix use of uninitialized variable
arm/syscalls: mark syscall invocation as likely in invoke_syscall
Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
Documentation: hisi-pmu: Fix of minor format error
drivers/perf: hisi: Add support for L3C PMU v3
drivers/perf: hisi: Refactor the event configuration of L3C PMU
drivers/perf: hisi: Extend the field of tt_core
drivers/perf: hisi: Extract the event filter check of L3C PMU
drivers/perf: hisi: Simplify the probe process of each L3C PMU version
drivers/perf: hisi: Export hisi_uncore_pmu_isr()
drivers/perf: hisi: Relax the event ID check in the framework
perf: Fujitsu: Add the Uncore PMU driver
arm64: map [_text, _stext) virtual address range non-executable+read-only
arm64/sysreg: Update TCR_EL1 register
arm64: Enable vmalloc-huge with ptdump
arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
arm64: errata: Apply workarounds for Neoverse-V3AE
arm64: cputype: Add Neoverse-V3AE definitions
...
Pull microblaze updates from Michal Simek:
- Fix typos in Kconfig
- s/__ASSEMBLY__/__ASSEMBLER__/g
* tag 'microblaze-v6.18' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
microblaze: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
microblaze: fix typos in Kconfig
Pull NIOS2 updates from Dinh Nguyen:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
- Set memblock.current_limit when setting pfn limits
* tag 'nios2_update_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: ensure that memblock.current_limit is set when setting pfn limits
nios2: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
nios2: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
Pull execve updates from Kees Cook:
- binfmt_elf: preserve original ELF e_flags for core dumps (Svetlana
Parfenova)
- exec: Fix incorrect type for ret (Xichao Zhao)
- binfmt_elf: Replace offsetof() with struct_size() in fill_note_info()
(Xichao Zhao)
* tag 'execve-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: preserve original ELF e_flags for core dumps
binfmt_elf: Replace offsetof() with struct_size() in fill_note_info()
exec: Fix incorrect type for ret
Pull ffs const-attribute cleanups from Kees Cook:
"While working on various hardening refactoring a while back we
encountered inconsistencies in the application of __attribute_const__
on the ffs() family of functions.
This series fixes this across all archs and adds KUnit tests.
Notably, this found a theoretical underflow in PCI (also fixed here)
and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I
kept the series separate from the general hardening PR since it is a
stand-alone "topic".
- PCI: Fix theoretical underflow in use of ffs().
- Universally apply __attribute_const__ to all architecture's
ffs()-family of functions.
- Add KUnit tests for ffs() behavior and const-ness"
* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
KUnit: ffs: Validate all the __attribute_const__ annotations
sparc: Add __attribute_const__ to ffs()-family implementations
xtensa: Add __attribute_const__ to ffs()-family implementations
s390: Add __attribute_const__ to ffs()-family implementations
parisc: Add __attribute_const__ to ffs()-family implementations
mips: Add __attribute_const__ to ffs()-family implementations
m68k: Add __attribute_const__ to ffs()-family implementations
openrisc: Add __attribute_const__ to ffs()-family implementations
riscv: Add __attribute_const__ to ffs()-family implementations
hexagon: Add __attribute_const__ to ffs()-family implementations
alpha: Add __attribute_const__ to ffs()-family implementations
sh: Add __attribute_const__ to ffs()-family implementations
powerpc: Add __attribute_const__ to ffs()-family implementations
x86: Add __attribute_const__ to ffs()-family implementations
csky: Add __attribute_const__ to ffs()-family implementations
bitops: Add __attribute_const__ to generic ffs()-family implementations
KUnit: Introduce ffs()-family tests
PCI: Test for bit underflow in pcie_set_readrq()
Pull crypto library updates from Eric Biggers:
- Add a RISC-V optimized implementation of Poly1305. This code was
written by Andy Polyakov and contributed by Zhihang Shao.
- Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5.
Yes, it's still the 90s, and several kernel subsystems are still
using MD5 for legacy use cases. As long as that remains the case,
it's helpful to clean it up in the same way as I've been doing for
other algorithms.
Later, I plan to convert most of these users of MD5 to use the new
MD5 library API instead of the generic crypto API.
- Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and
Curve25519 code.
Consolidate these into one module per algorithm, and centralize the
configuration and build process. This is the same reorganization that
has already been successful for SHA-1 and SHA-2.
- Remove the unused crypto_kpp API for Curve25519.
- Migrate the BLAKE2s and Curve25519 self-tests to KUnit.
- Always enable the architecture-optimized BLAKE2s code.
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits)
crypto: md5 - Implement export_core() and import_core()
wireguard: kconfig: simplify crypto kconfig selections
lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS
lib/crypto: curve25519: Consolidate into single module
lib/crypto: curve25519: Move a couple functions out-of-line
lib/crypto: tests: Add Curve25519 benchmark
lib/crypto: tests: Migrate Curve25519 self-test to KUnit
crypto: curve25519 - Remove unused kpp support
crypto: testmgr - Remove curve25519 kpp tests
crypto: x86/curve25519 - Remove unused kpp support
crypto: powerpc/curve25519 - Remove unused kpp support
crypto: arm/curve25519 - Remove unused kpp support
crypto: hisilicon/hpre - Remove unused curve25519 kpp support
lib/crypto: tests: Add KUnit tests for BLAKE2s
lib/crypto: blake2s: Consolidate into single C translation unit
lib/crypto: blake2s: Move generic code into blake2s.c
lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
lib/crypto: blake2s: Remove obsolete self-test
lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2
lib/crypto: chacha: Consolidate into single module
...
Pull vfs async directory updates from Christian Brauner:
"This contains further preparatory changes for the asynchronous directory
locking scheme:
- Add lookup_one_positive_killable() which allows overlayfs to
perform lookup that won't block on a fatal signal
- Unify the mount idmap handling in struct renamedata as a rename can
only happen within a single mount
- Introduce kern_path_parent() for audit which sets the path to the
parent and returns a dentry for the target without holding any
locks on return
- Rename kern_path_locked() as it is only used to prepare for the
removal of an object from the filesystem:
kern_path_locked() => start_removing_path()
kern_path_create() => start_creating_path()
user_path_create() => start_creating_user_path()
user_path_locked_at() => start_removing_user_path_at()
done_path_create() => end_creating_path()
NA => end_removing_path()"
* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
debugfs: rename start_creating() to debugfs_start_creating()
VFS: rename kern_path_locked() and related functions.
VFS/audit: introduce kern_path_parent() for audit
VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
VFS: discard err2 in filename_create()
VFS/ovl: add lookup_one_positive_killable()
Pull copy_process updates from Christian Brauner:
"This contains the changes to enable support for clone3() on nios2
which apparently is still a thing.
The more exciting part of this is that it cleans up the inconsistency
in how the 64-bit flag argument is passed from copy_process() into the
various other copy_*() helpers"
[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]
* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nios2: implement architecture-specific portion of sys_clone3
arch: copy_thread: pass clone_flags as u64
copy_process: pass clone_flags as u64 across calltree
copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
Pull vfs inode updates from Christian Brauner:
"This contains a series I originally wrote and that Eric brought over
the finish line. It moves out the i_crypt_info and i_verity_info
pointers out of 'struct inode' and into the fs-specific part of the
inode.
So now the few filesytems that actually make use of this pay the price
in their own private inode storage instead of forcing it upon every
user of struct inode.
The pointer for the crypt and verity info is simply found by storing
an offset to its address in struct fsverity_operations and struct
fscrypt_operations. This shrinks struct inode by 16 bytes.
I hope to move a lot more out of it in the future so that struct inode
becomes really just about very core stuff that we need, much like
struct dentry and struct file, instead of the dumping ground it has
become over the years.
On top of this are a various changes associated with the ongoing inode
lifetime handling rework that multiple people are pushing forward:
- Stop accessing inode->i_count directly in f2fs and gfs2. They
simply should use the __iget() and iput() helpers
- Make the i_state flags an enum
- Rework the iput() logic
Currently, if we are the last iput, and we have the I_DIRTY_TIME
bit set, we will grab a reference on the inode again and then mark
it dirty and then redo the put. This is to make sure we delay the
time update for as long as possible
We can rework this logic to simply dec i_count if it is not 1, and
if it is do the time update while still holding the i_count
reference
Then we can replace the atomic_dec_and_lock with locking the
->i_lock and doing atomic_dec_and_test, since we did the
atomic_add_unless above
- Add an icount_read() helper and convert everyone that accesses
inode->i_count directly for this purpose to use the helper
- Expand dump_inode() to dump more information about an inode helping
in debugging
- Add some might_sleep() annotations to iput() and associated
helpers"
* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add might_sleep() annotation to iput() and more
fs: expand dump_inode()
inode: fix whitespace issues
fs: add an icount_read helper
fs: rework iput logic
fs: make the i_state flags an enum
fs: stop accessing ->i_count directly in f2fs and gfs2
fsverity: check IS_VERITY() in fsverity_cleanup_inode()
fs: remove inode::i_verity_info
btrfs: move verity info pointer to fs-specific part of inode
f2fs: move verity info pointer to fs-specific part of inode
ext4: move verity info pointer to fs-specific part of inode
fsverity: add support for info in fs-specific part of inode
fs: remove inode::i_crypt_info
ceph: move crypt info pointer to fs-specific part of inode
ubifs: move crypt info pointer to fs-specific part of inode
f2fs: move crypt info pointer to fs-specific part of inode
ext4: move crypt info pointer to fs-specific part of inode
fscrypt: add support for info in fs-specific part of inode
fscrypt: replace raw loads of info pointer with helper function
Remove superfluous newlines from inline assemblies. Compilers use the
number of lines of inline assemblies as heuristic for the complexity
and inline decisions. Therefore inline assemblies should only contain
as many lines as required.
A lot of inline assemblies contain a superfluous newline for the last
line. Remove such newlines to improve compiler inlining decisions.
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
If the decompressor is compiled with clang this can lead to the following
warning:
In file included from arch/s390/boot/startup.c:4:
...
In file included from ./include/linux/pgtable.h:6:
./arch/s390/include/asm/pgtable.h:2065:48: warning: passing 'unsigned long *' to parameter of type
'long *' converts between pointers to integer types with different sign [-Wpointer-sign]
2065 | value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
Add -Wno-pointer-sign to the decompressor compile flags, like it is also
done for the kernel. This is similar to what was done for x86 to address
the same problem [1].
[1] commit dca5203e3f ("x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS")
Cc: stable@vger.kernel.org
Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Pull ARM fix from Russell King:
"Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9458/1: module: Ensure the override of module_arch_freeing_init()
Pull RISC-V fixes from Paul Walmsley:
- A race-free implementation of pudp_huge_get_and_clear() (based on the
x86 code)
- A MAINTAINERS update to my E-mail address
* tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Update Paul Walmsley's E-mail address
riscv: Use an atomic xchg in pudp_huge_get_and_clear()
Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology code regression that caused the mishandling of
certain boot command line options, and re-enable CONFIG_PTDUMP on i386
that was mistakenly turned off in the Kconfig"
* tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Implement topology_is_core_online() to address SMT regression
x86/Kconfig: Reenable PTDUMP on i386
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.
This is a completely mechanical patch (done with a simple "sed -i"
statement).
Cc: David S. Miller <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for uapi headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembly
code.
This is a completely mechanical patch (done with a simple "sed -i"
statement).
Cc: David S. Miller <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
In the very early kernel 1.x days, assembler files were pre-processed
with the "-traditional" flag. With kernel 1.1.85, the sparc subsystem
was changed to use "-ansi" instead while the other parts of the kernel
continued to use "-traditional". That "-traditional" got removed from
the other architectures in the course of time, but the sparc part
kept the "-ansi" until today.
This is bad since it comes with some disadvantages nowadays: You have
to make sure to not include any header that contains a "//" C++ comment
by accident (there are now some in the tree that use these for SPDX
identifiers for example), and with "-ansi" we also do not get the
pre-defined __ASSEMBLER__ macro which we'd like to use instead of the
kernel-only __ASSEMBLY__ macro in the future.
Since there does not seem to be any compelling reason anymore to use
"-ansi" nowadays, let's simply drop the "-ansi" flag from the sparc
subsystem now to get rid of those disadvantages.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>