When we exec a new task we forget to flush the set of locked GCS mode bits.
Since we do flush the rest of the state this means that if GCS is locked
the new task will be unable to enable GCS, it will be locked as being
disabled. Add the expected flush.
Fixes: fc84bc5378 ("arm64/gcs: Context switch GCS state for EL0")
Cc: <stable@vger.kernel.org> # 6.13.x
Reported-by: Yury Khrustalev <Yury.Khrustalev@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Yury Khrustalev <yury.khrustalev@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit 7137a203b2 ("arm64/fpsimd: Permit kernel mode NEON with
IRQs off"), the only condition under which the fallback path is taken
for FP/SIMD preserve/restore across a EFI runtime call is when it is
called from hardirq or NMI context.
In practice, this only happens when the EFI pstore driver is called to
dump the kernel log buffer into a EFI variable under a panic, oops or
emergency_restart() condition, and none of these can be expected to
result in a return to user space for the task in question.
This means that the existing EFI-specific logic for preserving and
restoring SVE/SME state is pointless, and can be removed.
Instead, kill the task, so that an exceedingly unlikely inadvertent
return to user space does not proceed with a corrupted FP/SIMD state.
Also, retain the preserve and restore of the base FP/SIMD state, as that
might belong to kernel mode use of FP/SIMD. (Note that EFI runtime calls
are never invoked reentrantly, even in this case, and so any interrupted
kernel mode FP/SIMD usage will be unrelated to EFI)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull driver core updates from Danilo Krummrich:
"Arch Topology:
- Move parse_acpi_topology() from arm64 to common code for reuse in
RISC-V
CPU:
- Expose housekeeping CPUs through /sys/devices/system/cpu/housekeeping
- Print a newline (or 0x0A) instead of '(null)' reading
/sys/devices/system/cpu/nohz_full when nohz_full= is not set
debugfs
- Remove (broken) 'no-mount' mode
- Remove redundant access mode checks in debugfs_get_tree() and
debugfs_create_*() functions
Devres:
- Remove unused devm_free_percpu() helper
- Move devm_alloc_percpu() from device.h to devres.h
Firmware Loader:
- Replace simple_strtol() with kstrtoint()
- Do not call cancel_store() when no upload is in progress
kernfs:
- Increase struct super_block::maxbytes to MAX_LFS_FILESIZE
- Fix a missing unwind path in __kernfs_new_node()
Misc:
- Increase the name size in struct auxiliary_device_id to 40
characters
- Replace system_unbound_wq with system_dfl_wq and add WQ_PERCPU to
alloc_workqueue()
Platform:
- Replace ERR_PTR() with IOMEM_ERR_PTR() in platform ioremap
functions
Rust:
- Auxiliary:
- Unregister auxiliary device on parent device unbind
- Move parent() to impl Device; implement device context aware
parent() for Device<Bound>
- Illustrate how to safely obtain a driver's device private data
when calling from an auxiliary driver into the parant device
driver
- DebugFs:
- Implement support for binary large objects
- Device:
- Let probe() return the driver's device private data as pinned
initializer, i.e. impl PinInit<Self, Error>
- Implement safe accessor for a driver's device private data for
Device<Bound> (returned reference can't out-live driver binding
and guarantees the correct private data type)
- Implement AsBusDevice trait, to be used by class device
abstractions to derive the bus device type of the parent device
- DMA:
- Store raw pointer of allocation as NonNull
- Use start_ptr() and start_ptr_mut() to inherit correct
mutability of self
- FS:
- Add file::Offset type alias
- I2C:
- Add abstractions for I2C device / driver infrastructure
- Implement abstractions for manual I2C device registrations
- I/O:
- Use "kernel vertical" style for imports
- Define ResourceSize as resource_size_t
- Move ResourceSize to top-level I/O module
- Add type alias for phys_addr_t
- Implement Rust version of read_poll_timeout_atomic()
- PCI:
- Use "kernel vertical" style for imports
- Move I/O and IRQ infrastructure to separate files
- Add support for PCI interrupt vectors
- Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to convert
an IrqVector bound to specific pci::Device into an IrqRequest
bound to the same pci::Device's parent Device
- Leverage pin_init_scope() to get rid of redundant Result in IRQ
methods
- PinInit:
- Add {pin_}init_scope() to execute code before creating an
initializer
- Platform:
- Leverage pin_init_scope() to get rid of redundant Result in IRQ
methods
- Timekeeping:
- Implement abstraction of udelay()
- Uaccess:
- Implement read_slice_partial() and read_slice_file() for
UserSliceReader
- Implement write_slice_partial() and write_slice_file() for
UserSliceWriter
sysfs:
- Prepare the constification of struct attribute"
* tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (75 commits)
rust: pci: fix build failure when CONFIG_PCI_MSI is disabled
debugfs: Fix default access mode config check
debugfs: Remove broken no-mount mode
debugfs: Remove redundant access mode checks
driver core: Check drivers_autoprobe for all added devices
driver core: WQ_PERCPU added to alloc_workqueue users
driver core: replace use of system_unbound_wq with system_dfl_wq
tick/nohz: Expose housekeeping CPUs in sysfs
tick/nohz: avoid showing '(null)' if nohz_full= not set
sysfs/cpu: Use DEVICE_ATTR_RO for nohz_full attribute
kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node
fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
mod_devicetable: Bump auxiliary_device_id name size
sysfs: simplify attribute definition macros
samples/kobject: constify 'struct foo_attribute'
samples/kobject: add is_visible() callback to attribute group
sysfs: attribute_group: enable const variants of is_visible()
sysfs: introduce __SYSFS_FUNCTION_ALTERNATIVE()
sysfs: transparently handle const pointers in ATTRIBUTE_GROUPS()
sysfs: attribute_group: allow registration of const attribute
...
Pull KVM updates from Paolo Bonzini:
"ARM:
- Support for userspace handling of synchronous external aborts
(SEAs), allowing the VMM to potentially handle the abort in a
non-fatal manner
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers
in hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU
- Allow page table destruction to reschedule, fixing long
need_resched latencies observed when destroying a large VM
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register
- Add AVEC basic support
- Use 64-bit register definition for EIOINTC
- Add KVM timer test cases for tools/selftests
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of exits
(and worse performance) on z10 and earlier processor; ESCA was
introduced by z114/z196 in 2010
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
SPTE caching is disabled, as there can't be any relevant SPTEs to
zap
- Relocate a misplaced export
- Fix an async #PF bug where KVM would clear the completion queue
when the guest transitioned in and out of paging mode, e.g. when
handling an SMI and then returning to paged mode via RSM
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
keeping the notifier registered is ok; the kernel does not use the
MSRs and the callback will run cleanly and restore host MSRs if the
CPU manages to return to userspace before the system goes down
- Use the checked version of {get,put}_user()
- Fix a long-lurking bug where KVM's lack of catch-up logic for
periodic APIC timers can result in a hard lockup in the host
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
latter behind CONFIG_CPU_MITIGATIONS
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
path; the only reason they were handled in the fast path was to
paper of a bug in the core #MC code, and that has long since been
fixed
- Add emulator support for AVX MOV instructions, to play nice with
emulated devices whose guest drivers like to access PCI BARs with
large multi-byte instructions
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs
- Fix the worst of KVM's lack of EFER.LMSLE emulation
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode
- Fix incorrect handling of selective CR0 writes when checking
intercepts during emulation of L2 instructions
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
on VMRUN and #VMEXIT
- Fix a bug where KVM corrupt the guest code stream when re-injecting
a soft interrupt if the guest patched the underlying code after the
VM-Exit, e.g. when Linux patches code with a temporary INT3
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
to userspace, and extend KVM "support" to all policy bits that
don't require any actual support from KVM
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of
the current vCPU state, partly as worthwhile cleanup, but mostly to
pave the way for tracking per-root TLB flushes, and elide EPT
flushes on pCPU migration if the root is clean from a previous
flush
- Add a few missing nested consistency checks
- Rip out support for doing "early" consistency checks via hardware
as the functionality hasn't been used in years and is no longer
useful in general; replace it with an off-by-default module param
to WARN if hardware fails a check that KVM does not perform
- Fix a currently-benign bug where KVM would drop the guest's
SPEC_CTRL[63:32] on VM-Enter
- Misc cleanups
- Overhaul the TDX code to address systemic races where KVM (acting
on behalf of userspace) could inadvertantly trigger lock contention
in the TDX-Module; KVM was either working around these in weird,
ugly ways, or was simply oblivious to them (though even Yan's
devilish selftests could only break individual VMs, not the host
kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
TDX vCPU, if creating said vCPU failed partway through
- Fix a few sparse warnings (bad annotation, 0 != NULL)
- Use struct_size() to simplify copying TDX capabilities to userspace
- Fix a bug where TDX would effectively corrupt user-return MSR
values if the TDX Module rejects VP.ENTER and thus doesn't clobber
host MSRs as expected
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU
system/VM
- Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line
- Extend a bunch of nested VMX to validate nested SVM as well
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test
to verify KVM can save/restore nested VMX state when L1 is using
5-level paging, but L2 is not
- Clean up the guest paging code in anticipation of sharing the core
logic for nested EPT and nested NPT
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety
of rough edges in guest_memfd along the way
- Define a CLASS to automatically handle get+put when grabbing a
guest_memfd from a memslot to make it harder to leak references
- Enhance KVM selftests to make it easer to develop and debug
selftests like those added for guest_memfd NUMA support, e.g. where
test and/or KVM bugs often result in hard-to-debug SIGBUS errors
- Misc cleanups
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU
workqueue for irqfd cleanup
- Fix a goof in the dirty ring documentation
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
...
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers:
"This is a core arm64 change. However, I was asked to take this because
most uses of kernel-mode FPSIMD are in crypto or CRC code.
In v6.8, the size of task_struct on arm64 increased by 528 bytes due
to the new 'kernel_fpsimd_state' field. This field was added to allow
kernel-mode FPSIMD code to be preempted.
Unfortunately, 528 bytes is kind of a lot for task_struct. This
regression in the task_struct size was noticed and reported.
Recover that space by making this state be allocated on the stack at
the beginning of each kernel-mode FPSIMD section.
To make it easier for all the users of kernel-mode FPSIMD to do that
correctly, introduce and use a 'scoped_ksimd' abstraction"
* tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits)
lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
lib/crypto: arm/blake2b: Move to scoped ksimd API
arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
arm64/fpu: Enforce task-context only for generic kernel mode FPU
net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
arm64/xorblocks: Switch to 'ksimd' scoped guard API
crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
raid6: Move to more abstract 'ksimd' guard API
crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
...
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory managemennt:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
Pull irq core updates from Thomas Gleixner:
"Updates for the interrupt core and treewide cleanups:
- Rework of the Per Processor Interrupt (PPI) management on ARM[64]
PPI support was built under the assumption that the systems are
homogenous so that the same CPU local device types are connected to
them. That's unfortunately wishful thinking and created horrible
workarounds.
This rework provides affinity management for PPIs so that they can
be individually configured in the firmware tables and mops up the
related drivers all over the place.
- Prevent CPUSET/isolation changes to arbitrarily affine interrupt
threads to random CPUs, which ignores user or driver settings.
- Plug a harmless race in the interrupt affinity proc interface,
which allows to see a half updated mask
- Adjust the priority of secondary interrupt threads on RT, so that
the combination of primary and secondary thread emulates the
hardware interrupt plus thread scenario. Having them at the same
priority can cause starvation issues in some drivers"
* tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq: Remove cpumask availability check on kthread affinity setting
genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
genirq: Prevent early spurious wake-ups of interrupt threads
genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
genirq/manage: Reduce priority of forced secondary interrupt handler
genirq/proc: Fix race in show_irq_affinity()
genirq: Fix percpu_devid irq affinity documentation
perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
irqdomain: Kill of_node_to_fwnode() helper
genirq: Kill irq_{g,s}et_percpu_devid_partition()
irqchip: Kill irq-partition-percpu
irqchip/apple-aic: Drop support for custom PMU irq partitions
irqchip/gic-v3: Drop support for custom PPI partitions
coresight: trbe: Request specific affinities for per CPU interrupts
perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
genirq: Add request_percpu_irq_affinity() helper
genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
genirq: Update request_percpu_nmi() to take an affinity
genirq: Add affinity to percpu_devid interrupt requests
...
Pull rseq updates from Thomas Gleixner:
"A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which
are caused by the related overhead. It turned out that the decision to
invoke the exit to user work was not really a decision. More or less
each context switch caused that. There is a long list of small issues
which sums up nicely and results in a 3-4% regression in I/O
benchmarks.
The other detail which caused issues due to extra work in context
switch and task migration is the CID (memory context ID) management.
It also requires to use a task work to consolidate the CID space,
which is executed in the context of an arbitrary task and results in
sporadic uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with a optimized inlined
variant.
- Separating fast and slow path for architectures which use the
generic entry code, so that only fault and error handling goes into
the TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in
the context switch path. That moves the work of switching modes
into the fork/exit path, which is a reasonable tradeoff. That work
is only required when a process creates more threads than the
cpuset it is allowed to run on or when enough threads exit after
that. An artificial thread pool benchmarks which triggers this did
not degrade, it actually improved significantly.
The main effect in migration heavy scenarios is that runqueue lock
held time and therefore contention goes down significantly"
* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/mmcid: Switch over to the new mechanism
sched/mmcid: Implement deferred mode change
irqwork: Move data struct to a types header
sched/mmcid: Provide CID ownership mode fixup functions
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Introduce per task/CPU ownership infrastructure
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Move initialization out of line
signal: Move MMCID exit out of sighand lock
sched/mmcid: Convert mm CID mask to a bitmap
cpumask: Cache num_possible_cpus()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
sched/mmcid: Move scheduler code out of global header
sched: Fixup whitespace damage
sched/mmcid: Cacheline align MM CID storage
sched/mmcid: Use proper data structures
sched/mmcid: Revert the complex CID management
...
Pull misc vfs updates from Christian Brauner:
"Features:
- Cheaper MAY_EXEC handling for path lookup. This elides MAY_WRITE
permission checks during path lookup and adds the
IOP_FASTPERM_MAY_EXEC flag so filesystems like btrfs can avoid
expensive permission work.
- Hide dentry_cache behind runtime const machinery.
- Add German Maglione as virtiofs co-maintainer.
Cleanups:
- Tidy up and inline step_into() and walk_component() for improved
code generation.
- Re-enable IOCB_NOWAIT writes to files. This refactors file
timestamp update logic, fixing a layering bypass in btrfs when
updating timestamps on device files and improving FMODE_NOCMTIME
handling in VFS now that nfsd started using it.
- Path lookup optimizations extracting slowpaths into dedicated
routines and adding branch prediction hints for mntput_no_expire(),
fd_install(), lookup_slow(), and various other hot paths.
- Enable clang's -fms-extensions flag, requiring a JFS rename to
avoid conflicts.
- Remove spurious exports in fs/file_attr.c.
- Stop duplicating union pipe_index declaration. This depends on the
shared kbuild branch that brings in -fms-extensions support which
is merged into this branch.
- Use MD5 library instead of crypto_shash in ecryptfs.
- Use largest_zero_folio() in iomap_dio_zero().
- Replace simple_strtol/strtoul with kstrtoint/kstrtouint in init and
initrd code.
- Various typo fixes.
Fixes:
- Fix emergency sync for btrfs. Btrfs requires an explicit sync_fs()
call with wait == 1 to commit super blocks. The emergency sync path
never passed this, leaving btrfs data uncommitted during emergency
sync.
- Use local kmap in watch_queue's post_one_notification().
- Add hint prints in sb_set_blocksize() for LBS dependency on THP"
* tag 'vfs-6.19-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
MAINTAINERS: add German Maglione as virtiofs co-maintainer
fs: inline step_into() and walk_component()
fs: tidy up step_into() & friends before inlining
orangefs: use inode_update_timestamps directly
btrfs: fix the comment on btrfs_update_time
btrfs: use vfs_utimes to update file timestamps
fs: export vfs_utimes
fs: lift the FMODE_NOCMTIME check into file_update_time_flags
fs: refactor file timestamp update logic
include/linux/fs.h: trivial fix: regualr -> regular
fs/splice.c: trivial fix: pipes -> pipe's
fs: mark lookup_slow() as noinline
fs: add predicts based on nd->depth
fs: move mntput_no_expire() slowpath into a dedicated routine
fs: remove spurious exports in fs/file_attr.c
watch_queue: Use local kmap in post_one_notification()
fs: touch up predicts in path lookup
fs: move fd_install() slowpath into a dedicated routine and provide commentary
fs: hide dentry_cache behind runtime const machinery
fs: touch predicts in do_dentry_open()
...
* kvm-arm64/nv-xnx-haf: (22 commits)
: Support for FEAT_XNX and FEAT_HAF in nested
:
: Add support for a couple of MMU-related features that weren't
: implemented by KVM's software page table walk:
:
: - FEAT_XNX: Allows the hypervisor to describe execute permissions
: separately for EL0 and EL1
:
: - FEAT_HAF: Hardware update of the Access Flag, which in the context of
: nested means software walkers must also set the Access Flag.
:
: The series also adds some basic support for testing KVM's emulation of
: the AT instruction, including the implementation detail that AT sets the
: Access Flag in KVM.
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
* for-next/sysreg:
: arm64 sysreg updates/cleanups
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64/sysreg: Add ICH_VMCR_EL2
arm64/sysreg: Move generation of RES0/RES1/UNKN to function
arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
arm64/sysreg: Fix checks for incomplete sysreg definitions
arm64/sysreg: Replace TCR_EL1 field macros
* arm64/for-next/perf:
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
perf/arm-ni: Fix and optimise register offset calculation
perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
perf/arm-ni: Add NoC S3 support
perf/arm_cspmu: nvidia: Add pmevfiltr2 support
perf/arm_cspmu: nvidia: Add revision id matching
perf/arm_cspmu: Add pmpidr support
perf/arm_cspmu: Add callback to reset filter config
perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
* for-next/misc:
: Miscellaneous patches
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
arm64: mm: make linear mapping permission update more robust for patial range
arm64/mm: Elide TLB flush in certain pte protection transitions
arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
arm64: add unlikely hint to MTE async fault check in el0_svc_common
arm64: acpi: add newline to deferred APEI warning
arm64: entry: Clean out some indirection
arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
arm64: remove unused ARCH_PFN_OFFSET
arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
arm64: Remove assertion on CONFIG_VMAP_STACK
* for-next/kselftest:
: arm64 kselftest patches
kselftest/arm64: Align zt-test register dumps
* for-next/efi-preempt:
: arm64: Make EFI calls preemptible
arm64/efi: Call EFI runtime services without disabling preemption
arm64/efi: Move uaccess en/disable out of efi_set_pgd()
arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
arm64/fpsimd: Permit kernel mode NEON with IRQs off
arm64/fpsimd: Don't warn when EFI execution context is preemptible
efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
efi: Add missing static initializer for efi_mm::cpus_allowed_lock
* for-next/assembler-macro:
: arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
* for-next/typos:
: Random typo/spelling fixes
arm64: Fix double word in comments
arm64: Fix typos and spelling errors in comments
* for-next/sme-ptrace-disable:
: Support disabling streaming mode via ptrace on SME only systems
kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
* for-next/local-tlbi-page-reused:
: arm64, mm: avoid TLBI broadcast if page reused in write fault
arm64, tlbflush: don't TLBI broadcast if page reused in write fault
mm: add spurious fault fixing support for huge pmd
* for-next/mpam: (34 commits)
: Basic Arm MPAM driver (more to follow)
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Use long MBWU counters if supported
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Consider overflow in bandwidth counter state
arm_mpam: Track bandwidth counter state for power management
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Add helpers to allocate monitors
arm_mpam: Probe and reset the rest of the features
arm_mpam: Allow configuration to be applied and restored during cpu online
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Register and enable IRQs
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Reset MSC controls from cpuhp callbacks
arm_mpam: Merge supported features during mpam_enable() into mpam_class
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Add helpers for managing the locking around the mon_sel registers
...
* for-next/acpi:
: arm64 acpi updates
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
* for-next/documentation:
: arm64 Documentation updates
Documentation/arm64: Fix the typo of register names
Suzuki notices that making the ICH_HCR_EL2_TDIR capability a system
one isn't a very good idea, should we end-up with CPUs that have
asymmetric TDIR support (somehow unlikely, but you never know what
level of stupidity vendors are up to). For this hypothetical setup,
making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option.
This is actually consistent with what we already do with GICv5
legacy interface, so flip the capability over.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 2a28810cbb ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping")
Link: https://lore.kernel.org/r/5df713d4-8b79-4456-8fd1-707ca89a61b6@arm.com
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://msgid.link/20251125160144.1086511-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Pull arm64 fixes from Will Deacon:
"We've got a revert due to one of the recent CCA commits breaking ACPI
firmware-based error reporting, a fix for a hard-lockup introduced by
a prior fix affecting non-default (CONFIG_EXPERT) configurations and
another ACPI fix for systems using MMIO-based timers.
Other than that, we're looking pretty good.
- Avoid hardlockup when CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY=n
- Fix regression in APEI/GHES error handling
- Fix MMIO timers when probed via ACPI"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Fix hard lockup when !MITIGATE_SPECTRE_BRANCH_HISTORY
ACPI: GTDT: Correctly number platform devices for MMIO timers
Revert "arm64: acpi: Enable ACPI CCEL support"
A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.
Unfortuantely, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.
As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the neaby bin.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
The trap bits are currently only set to manage CPU errata. However,
we are about to make use of them for purposes beyond beating broken
CPUs into submission.
For this purpose, turn these errata-driven bits into a patched-in
constant that is merged with the KVM-driven value at the point of
programming the ICH_HCR_EL2 register, rather than being directly
stored with with the shadow value..
This allows the KVM code to distinguish between a trap being handled
for the purpose of an erratum workaround, or for KVM's own need.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
The "drop print" commit removed the whole branch and not just the print.
For some ARM64 cpus, this leads to hard lockup when
CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY is not enabled.
Fixes: 62e72463ca ("arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit d02c2e45b1.
Mauro reports that this breaks APEI notifications on his QEMU setup
because the "reserved for firmware" region still needs to be writable
by Linux in order to signal _back_ to the firmware after processing
the reported error:
| {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
| ...
| [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
| Unable to handle kernel write to read-only memory at virtual address ffff800080035018
| Mem abort info:
| ESR = 0x000000009600004f
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x0f: level 3 permission fault
| Data abort info:
| ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
| CM = 0, WnR = 1, TnD = 0, TagAccess = 0
| GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000
| pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483
| Internal error: Oops: 000000009600004f [#1] SMP
For now, revert the offending commit. We can presumably switch back to
PAGE_KERNEL when bringing this back in the future.
Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Currently it is not possible to disable streaming mode via ptrace on SME
only systems, the interface for doing this is to write via NT_ARM_SVE but
such writes will be rejected on a system without SVE support. Enable this
functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data
via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such
writes currently error since we require that a vector length is specified
which should minimise the risk that existing software is relying on current
behaviour.
Reads are not supported since I am not aware of any use case for this and
there is some risk that an existing userspace application may be confused if
it reads NT_ARM_SVE on a system without SVE. Existing kernels will return
FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not
stored, for example if the task has not used SVE. Returning a vector length
of 0 would create a risk that software would try to do things like allocate
space for register state with zero sizes, while returning a vector length of
128 bits would look like SVE is supported. It seems safer to just not make
the changes to add read support.
It remains possible for userspace to detect a SME only system via the ptrace
interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while
reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in
non-streaming mode is available via REGSET_FPR.
sve_set_common() already avoids allocating SVE storage when doing a FPSIMD
formatted write and allocating SME storage when doing a NT_ARM_SVE write so
we change the function to validate the new case and skip setting a vector
length for it.
The aim is to make a minimally invasive change, no operation that would
previously have succeeded will be affected, and we use a previously
defined interface in new circumstances rather than define completely new
ABI.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: David Spickett <david.spickett@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull fpsimd-on-stack changes from Ard Biesheuvel:
"Shared tag/branch for arm64 FP/SIMD changes going through libcrypto"
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
This patch corrects several minor typographical and spelling errors
in comments across multiple arm64 source files.
No functional changes.
Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit aefbab8e77
("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")
added a 'kernel_fpsimd_state' field to struct thread_struct, which is
the arch-specific portion of struct task_struct, and is allocated for
each task in the system. The size of this field is 528 bytes, resulting
in non-negligible bloat of task_struct, and the resulting memory
overhead may impact performance on systems with many processes.
This allocation is only used if the task is scheduled out or interrupted
by a softirq while using the FP/SIMD unit in kernel mode, and so it is
possible to transparently allocate this buffer on the caller's stack
instead.
So tweak the 'ksimd' scoped guard implementation so that a stack buffer
is allocated and passed to both kernel_neon_begin() and
kernel_neon_end(), and either record it in the task struct, or use it
directly to preserve the task mode kernel FP/SIMD when running in
softirq context. Passing the address to both functions, and checking the
addresses for consistency ensures that callers of the updated bare
begin/end API use it in a manner that is consistent with the new context
switch semantics.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Add unlikely() hint to the _TIF_MTE_ASYNC_FAULT flag check in
el0_svc_common() since asynchronous MTE faults are expected to be
rare occurrences during normal system call execution.
This optimization helps the compiler to improve instruction caching
and branch prediction for the common case where no asynchronous
MTE faults are pending, while maintaining correct behavior for
the exceptional case where such faults need to be handled prior
to system call execution.
Signed-off-by: Li Qiang <liqiang01@kylinos.cn>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The conversion to generic IRQ entry left some functions
in the EL1 (kernel) IRQ entry path very shallow, so drop
the __inner_functions() where appropriate, saving some
time and stack.
This is not a fix but an optimization.
Drop stale comments about irqentry_enter/exit() while we
are at it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The only remaining reason why EFI runtime services are invoked with
preemption disabled is the fact that the mm is swapped out behind the
back of the context switching code.
The kernel no longer disables preemption in kernel_neon_begin().
Furthermore, the EFI spec is being clarified to explicitly state that
only baseline FP/SIMD is permitted in EFI runtime service
implementations, and so the existing kernel mode NEON context switching
code is sufficient to preserve and restore the execution context of an
in-progress EFI runtime service call.
Most EFI calls are made from the efi_rts_wq, which is serviced by a
kthread. As kthreads never return to user space, they usually don't have
an mm, and so we can use the existing infrastructure to swap in the
efi_mm while the EFI call is in progress. This is visible to the
scheduler, which will therefore reactivate the selected mm when
switching out the kthread and back in again.
Given that the EFI spec explicitly permits runtime services to be called
with interrupts enabled, firmware code is already required to tolerate
interruptions. So rather than disable preemption, disable only migration
so that EFI runtime services are less likely to cause scheduling delays.
To avoid potential issues where runtime services are interrupted while
polling the secure firmware for async completions, keep migration
disabled so that a runtime service invocation does not resume on a
different CPU from the one it was started on.
Note, though, that the firmware executes at the same privilege level as
the kernel, and is therefore able to disable interrupts altogether.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
efi_set_pgd() will no longer be called when invoking EFI runtime
services via the efi_rts_wq work queue, but the uaccess en/disable are
still needed when using PAN emulation using TTBR0 switching. So move
these into the callers.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit
5894cf571e ("acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers")
all EFI runtime calls on arm64 are routed via the EFI runtime wrappers,
which are serialized using the efi_runtime_lock semaphore.
This means the efi_rt_lock spinlock in the arm64 arch wrapper code has
become redundant, and can be dropped. For robustness, replace it with an
assert that the EFI runtime lock is in fact held by 'current'.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, may_use_simd() will return false when called from a context
where IRQs are disabled. One notable case where this happens is when
calling the ResetSystem() EFI runtime service from the reboot/poweroff
code path. For this case alone, there is a substantial amount of FP/SIMD
support code to handle the corner case where a EFI runtime service is
invoked with IRQs disabled.
The only reason kernel mode SIMD is not allowed when IRQs are disabled
is that re-enabling softirqs in this case produces a noisy diagnostic
when lockdep is enabled. The warning is valid, in the sense that
delivering pending softirqs over the back of the call to
local_bh_enable() is problematic when IRQs are disabled.
While the API lacks a facility to simply mask and unmask softirqs
without triggering their delivery, disabling softirqs is not needed to
begin with when IRQs are disabled, given that softirqs are only every
taken asynchronously over the back of a hard IRQ.
So dis/enable softirq processing conditionally, based on whether IRQs
are enabled, and relax the check in may_use_simd().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Kernel mode FP/SIMD no longer requires preemption to be disabled, so
only warn on uses of FP/SIMD from preemptible context if the fallback
path is taken for cases where kernel mode NEON would not be allowed
otherwise.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull arm64 fixes from Will Deacon:
"There's more here than I would ideally like at this stage, but there's
been a steady trickle of fixes and some of them took a few rounds of
review.
The bulk of the changes are fixing some fallout from the recent BBM
level two support which allows the linear map to be split from block
to page mappings at runtime, but inadvertently led to sleeping in
atomic context on some paths where the linear map was already mapped
with page granularity. The fix is simply to avoid splitting in those
cases but the implementation of that is a little involved.
The other interesting fix is addressing a catastophic performance
issue with our per-cpu atomics discovered by Paul in the SRCU locking
code but which took some interactions with the hardware folks to
resolve.
Summary:
- Avoid sleeping in atomic context when changing linear map
permissions for DEBUG_PAGEALLOC or KFENCE
- Rework printing of Spectre mitigation status to avoid hardlockup
when enabling per-task mitigations on the context-switch path
- Reject kernel modules when instruction patching fails either due to
the DWARF-based SCS patching or because of an alternatives callback
residing outside of the core kernel text
- Propagate error when updating kernel memory permissions in kprobes
- Drop pointless, incorrect message when enabling the ACPI SPCR
console
- Use value-returning LSE instructions for per-cpu atomics to reduce
latency in SRCU locking routines"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Reject modules with internal alternative callbacks
arm64: Fail module loading if dynamic SCS patching fails
arm64: proton-pack: Fix hard lockup due to print in scheduler context
arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
arm64: mm: Tidy up force_pte_mapping()
arm64: mm: Optimize range_split_to_ptes()
arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
arm64: kprobes: check the return value of set_memory_rox()
arm64: acpi: Drop message logging SPCR default console
Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent"
arm64: Use load LSE atomics for the non-return per-CPU atomic operations
Nathan Chancellor <nathan@kernel.org> says:
Shared branch between Kbuild and other trees for enabling
'-fms-extensions' for 6.19.
* tag 'kbuild-ms-extensions-6.19' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS
Kbuild: enable -fms-extensions
jfs: Rename _inline to avoid conflict with clang's '-fms-extensions'
Link: https://patch.msgid.link/20251101-kbuild-ms-extensions-dedicated-cflags-v1-1-38004aba524b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
On arm64 with MTE enabled, a page mapped as Normal Tagged (PROT_MTE) in
user space will need to have its allocation tags initialised. This is
normally done in the arm64 set_pte_at() after checking the memory
attributes. Such page is also marked with the PG_mte_tagged flag to avoid
subsequent clearing. Since this relies on having a struct page,
pte_special() mappings are ignored.
Commit d82d09e482 ("mm/huge_memory: mark PMD mappings of the huge zero
folio special") maps the huge zero folio special and the arm64
set_pmd_at() will no longer zero the tags. There is no guarantee that the
tags are zero, especially if parts of this huge page have been previously
tagged.
It's fairly easy to detect this by regularly dropping the caches to
force the reallocation of the huge zero folio.
Allocate the huge zero folio with the __GFP_ZEROTAGS flag. In addition,
do not warn in the arm64 __access_remote_tags() when reading tags from the
huge zero page.
I bundled the arm64 change in here as well since they are both related to
the commit mapping the huge zero folio as special.
[catalin.marinas@arm.com: handle arch mte_zero_clear_page_tags() code issuing MTE instructions]
Link: https://lkml.kernel.org/r/aQi8dA_QpXM8XqrE@arm.com
Link: https://lkml.kernel.org/r/20251031170133.280742-1-catalin.marinas@arm.com
Fixes: d82d09e482 ("mm/huge_memory: mark PMD mappings of the huge zero folio special")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Beleswar Padhi <b-padhi@ti.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For those architectures with HAVE_SOFTIRQ_ON_OWN_STACK use
their dedicated softirq stack when !PREEMPT_RT. This condition
is ensured by SOFTIRQ_ON_OWN_STACK.
Let arm64 use SOFTIRQ_ON_OWN_STACK as well to select its
usage of the stack.
Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CONFIG_VMAP_STACK is selected by arm64 arch unconditionly since commit
ef6861b8e6 ("arm64: Mandate VMAP_STACK").
Remove the redundant assertion and headers.
Signed-off-by: Dawei Li <dawei.li@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Disallow a module to load if SCS dynamic patching fails for its code. For
module loading, instead of running a dry-run to check for patching errors,
try to run patching in the first run and propagate any errors so module
loading will fail.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Following the pattern established with other Spectre mitigations,
do not print a message when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
Kconfig option is disabled.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: shechenglong <shechenglong@xfusion.com>
Signed-off-by: Will Deacon <will@kernel.org>
Since commit a166563e7e ("arm64: mm: support large block mapping when
rodata=full"), __change_memory_common has more chance to fail due to
memory allocation failure when splitting page table. So check the return
value of set_memory_rox(), then bail out if it fails otherwise we may have
RW memory mapping for kprobes insn page.
Fixes: 195a1b7d83 ("arm64: kprobes: call set_memory_rox() for kprobe page")
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit f5a4af3c75 ("ACPI: Add acpi=nospcr to disable ACPI SPCR as
default console on ARM64") introduced a command line parameter to
prevent using SPCR provided console as default. It also introduced a
message to log this choice.
Drop the message as it is not particularly useful and can be incorrect
in situations where no SPCR is provided by the firmware.
Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit bad3fa2fb9.
Commit bad3fa2fb9 ("ACPI: Suppress misleading SPCR console message
when SPCR table is absent") mistakenly assumes acpi_parse_spcr()
returning 0 to indicate a failure to parse SPCR. While addressing the
resultant incorrect logging it was deemed that dropping the message is
a better approach as it is not particularly useful.
Roll back the commit introducing the bug as a step towards dropping
the log message.
Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
This is a follow up to commit c4781dc3d1 ("Kbuild: enable
-fms-extensions") but in a separate change due to being substantially
different from the initial submission.
There are many places within the kernel that use their own CFLAGS
instead of the main KBUILD_CFLAGS, meaning code written with the main
kernel's use of '-fms-extensions' in mind that may be tangentially
included in these areas will result in "error: declaration does not
declare anything" messages from the compiler.
Add '-fms-extensions' to all these areas to ensure consistency, along
with -Wno-microsoft-anon-tag to silence clang's warning about use of the
extension that the kernel cares about using. parisc does not build with
clang so it does not need this warning flag. LoongArch does not need it
either because -W flags from KBUILD_FLAGS are pulled into cflags-vdso.
Reported-by: Christian Brauner <brauner@kernel.org>
Closes: https://lore.kernel.org/20251030-meerjungfrau-getrocknet-7b46eacc215d@brauner/
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>