linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 06:41:39 -04:00

Author	SHA1	Message	Date
Catalin Marinas	818f644ec6	Merge branches 'for-next/misc' and 'for-next/mpam' into for-next/core * for-next/misc: : Miscellaneous cleanups/fixes virt: arm-cca-guest: fix error check for RSI_INCOMPLETE arm64/hwcap: Include kernel-hwcap.h in list of generated files * for-next/mpam: : Fix an unmount->remount problem with the CDP emulation, uninitialised : variable and checker warnings arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static arm_mpam: resctrl: Fix the check for no monitor components found arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount	2026-04-20 13:11:50 +01:00
Ben Horgan	4d5bbbafc1	arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static resctrl_mon_ctx_waiters is not used outside of this file, so make it static. This fixes the sparse warning: drivers/resctrl/mpam_resctrl.c:25:1: warning: symbol 'resctrl_mon_ctx_waiters' was not declared. Should it be static? Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603281842.c2K96tJA-lkp@intel.com/ Fixes: `2a3c79c615` ("arm_mpam: resctrl: Allow resctrl to allocate monitors") Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-04-17 17:48:07 +01:00
Ben Horgan	67c0a487ef	arm_mpam: resctrl: Fix the check for no monitor components found Dan Carpenter reports that, in mpam_resctrl_alloc_domain(), any_mon_comp is used in an 'if' condition when it may be uninitialized. Initialize it to NULL so that the check behaves correctly when no monitor components are found. Reported-by: Dan Carpenter <error27@gmail.com> Fixes: `264c285999` ("arm_mpam: resctrl: Add monitor initialisation and domain boilerplate") Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-04-17 17:48:07 +01:00
Zeng Heng	f758340da5	arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount The code to set MBA's alloc_capable to true appears to be trying to restore alloc_capable on unmount. This can never work because resctrl_arch_set_cdp_enabled() is never invoked with RDT_RESOURCE_MBA as the rid parameter. Consequently, mpam_resctrl_controls[RDT_RESOURCE_MBA].cdp_enabled always remains false. The alloc_capable setting in resctrl_arch_set_cdp_enabled() is to re-enable MBA if the caller opts in to separate control values using CDP for this resource. This doesn't happen today. Add a comment to describe this. However a bug remains where MBA allocation is permanently disabled after the mount with CDP option. Remounting without CDP cannot restore the MBA partition capability. Add a check to re-enable MBA when CDP is disabled, which happens on unmount. Fixes: `6789fb9928` ("arm_mpam: resctrl: Add CDP emulation") Signed-off-by: Zeng Heng <zengheng4@huawei.com> [ morse: Added comment for existing code, added hunk to fix this bug from Ben H ] Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-04-17 17:43:57 +01:00
Sami Mujawar	e534e9d13d	virt: arm-cca-guest: fix error check for RSI_INCOMPLETE The RSI interface can return RSI_INCOMPLETE when a report spans multiple granules. This is an expected condition and should not be treated as a fatal error. Currently, arm_cca_report_new() checks for `info.result != RSI_SUCCESS` and bails out, which incorrectly flags RSI_INCOMPLETE as a failure. Fix the check to only break out on results other than RSI_SUCCESS or RSI_INCOMPLETE. This ensures partial reports are handled correctly and avoids spurious -ENXIO errors when generating attestation reports. Fixes: `7999edc484` ("virt: arm-cca-guest: TSM_REPORT support for realms") Signed-off-by: Sami Mujawar <sami.mujawar@arm.com> Reported-by: Jagdish Gediya <Jagdish.Gediya@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-14 16:26:08 +01:00
Mark Brown	680b961ebf	arm64/hwcap: Include kernel-hwcap.h in list of generated files When adding generation for the kernel internal constants for hwcaps the generated file was not explicitly flagged as such in the build system, causing it to be regenerated on each build. This wasn't obvious when the series the change was included in was developed since it was all about changes that trigger rebuilds anyway. Fixes: `abed23c3c4` ("arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps") Reported-by: Marek Vasut <marex@nabladev.com> Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-14 16:23:40 +01:00
Catalin Marinas	480a9e57cc	Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message	2026-04-10 14:22:24 +01:00
Aneesh Kumar K.V (Arm)	34e563947c	arm64: rsi: use linear-map alias for realm config buffer rsi_get_realm_config() passes its argument to virt_to_phys(), but &config is a kernel image address and not a linear-map alias. On arm64 this triggers the below warning: virt_to_phys used for non-linear address: (____ptrval____) (config+0x0/0x1000) WARNING: arch/arm64/mm/physaddr.c:15 at __virt_to_phys+0x50/0x70, CPU#0: swapper/0 Modules linked in: ..... Hardware name: linux,dummy-virt (DT) pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __virt_to_phys+0x50/0x70 lr : __virt_to_phys+0x4c/0x70 ..... ...... Call trace: __virt_to_phys+0x50/0x70 (P) arm64_rsi_init+0xa0/0x1b8 setup_arch+0x13c/0x1a0 start_kernel+0x68/0x398 __primary_switched+0x88/0x90 Pass lm_alias(&config) instead so the RSI call uses the linear-map alias of the same buffer and avoids the boot-time warning. Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-10 11:52:04 +01:00
Michael Ugrin	74b63934ab	arm64: Kconfig: fix duplicate word in CMDLINE help text Remove duplicate 'the' in the CMDLINE config help text. Signed-off-by: Michael Ugrin <mugrinphoto@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-10 11:16:24 +01:00
Muhammad Usama Anjum	249bf97331	arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode With KASAN_HW_TAGS (MTE) in synchronous mode, tag check faults are reported as immediate Data Abort exceptions. The TFSR_EL1.TF1 bit is never set since faults never go through the asynchronous path. Therefore, reading TFSR_EL1 and executing data and instruction barriers on kernel entry, exit, context switch and suspend is unnecessary overhead. As with the check_mte_async_tcf and clear_mte_async_tcf paths for TFSRE0_EL1, extend the same optimisation to kernel entry/exit, context switch and suspend. All mte kselftests pass. The kunit before and after the patch show same results. A selection of test_vmalloc benchmarks running on a arm64 machine. v6.19 is the baseline. (>0 is faster, <0 is slower, (R)/(I) = statistically significant Regression/Improvement). Based on significance and ignoring the noise, the benchmarks improved. * 77 result classes were considered, with 9 wins, 0 losses and 68 ties Results of fastpath [1] on v6.19 vs this patch: +----------------------------+----------------------------------------------------------+------------+ \| Benchmark \| Result Class \| barriers \| +============================+==========================================================+============+ \| micromm/fork \| fork: p:1, d:10 (seconds) \| (I) 2.75% \| \| \| fork: p:512, d:10 (seconds) \| 0.96% \| +----------------------------+----------------------------------------------------------+------------+ \| micromm/munmap \| munmap: p:1, d:10 (seconds) \| -1.78% \| \| \| munmap: p:512, d:10 (seconds) \| 5.02% \| +----------------------------+----------------------------------------------------------+------------+ \| micromm/vmalloc \| fix_align_alloc_test: p:1, h:0, l:500000 (usec) \| -0.56% \| \| \| fix_size_alloc_test: p:1, h:0, l:500000 (usec) \| 0.70% \| \| \| fix_size_alloc_test: p:4, h:0, l:500000 (usec) \| 1.18% \| \| \| fix_size_alloc_test: p:16, h:0, l:500000 (usec) \| -5.01% \| \| \| fix_size_alloc_test: p:16, h:1, l:500000 (usec) \| 13.81% \| \| \| fix_size_alloc_test: p:64, h:0, l:100000 (usec) \| 6.51% \| \| \| fix_size_alloc_test: p:64, h:1, l:100000 (usec) \| 32.87% \| \| \| fix_size_alloc_test: p:256, h:0, l:100000 (usec) \| 4.17% \| \| \| fix_size_alloc_test: p:256, h:1, l:100000 (usec) \| 8.40% \| \| \| fix_size_alloc_test: p:512, h:0, l:100000 (usec) \| -0.48% \| \| \| fix_size_alloc_test: p:512, h:1, l:100000 (usec) \| -0.74% \| \| \| full_fit_alloc_test: p:1, h:0, l:500000 (usec) \| 0.53% \| \| \| kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) \| -2.81% \| \| \| kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) \| -2.06% \| \| \| long_busy_list_alloc_test: p:1, h:0, l:500000 (usec) \| -0.56% \| \| \| pcpu_alloc_test: p:1, h:0, l:500000 (usec) \| -0.41% \| \| \| random_size_align_alloc_test: p:1, h:0, l:500000 (usec) \| 0.89% \| \| \| random_size_alloc_test: p:1, h:0, l:500000 (usec) \| 1.71% \| \| \| vm_map_ram_test: p:1, h:0, l:500000 (usec) \| 0.83% \| +----------------------------+----------------------------------------------------------+------------+ \| schbench/thread-contention \| -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec) \| 0.05% \| \| \| -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec) \| 0.60% \| \| \| -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.00% \| \| \| -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec) \| -0.34% \| \| \| -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec) \| -0.58% \| \| \| -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 9.09% \| \| \| -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec) \| -0.74% \| \| \| -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec) \| -1.40% \| \| \| -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.00% \| \| \| -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec) \| -0.78% \| \| \| -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec) \| -0.11% \| \| \| -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.11% \| \| \| -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec) \| 2.64% \| \| \| -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec) \| 3.15% \| \| \| -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 17.54% \| \| \| -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec) \| -1.22% \| \| \| -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec) \| 0.85% \| \| \| -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.00% \| \| \| -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec) \| -0.34% \| \| \| -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec) \| 1.05% \| \| \| -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.00% \| \| \| -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec) \| -0.41% \| \| \| -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec) \| 0.58% \| \| \| -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 2.13% \| \| \| -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec) \| 0.67% \| \| \| -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec) \| 2.07% \| \| \| -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) \| -1.28% \| \| \| -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec) \| 1.01% \| \| \| -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec) \| 0.69% \| \| \| -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 13.12% \| \| \| -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec) \| -0.25% \| \| \| -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec) \| -0.48% \| \| \| -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 10.53% \| \| \| -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec) \| -0.06% \| \| \| -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec) \| 0.00% \| \| \| -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.00% \| \| \| -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec) \| -0.36% \| \| \| -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec) \| 0.52% \| \| \| -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) \| 0.11% \| \| \| -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec) \| 0.52% \| \| \| -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec) \| 3.53% \| \| \| -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) \| -0.10% \| \| \| -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec) \| 2.53% \| \| \| -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec) \| 1.82% \| \| \| -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) \| -5.80% \| +----------------------------+----------------------------------------------------------+------------+ \| syscall/getpid \| mean (ns) \| (I) 15.98% \| \| \| p99 (ns) \| (I) 11.11% \| \| \| p99.9 (ns) \| (I) 16.13% \| +----------------------------+----------------------------------------------------------+------------+ \| syscall/getppid \| mean (ns) \| (I) 14.82% \| \| \| p99 (ns) \| (I) 17.86% \| \| \| p99.9 (ns) \| (I) 9.09% \| +----------------------------+----------------------------------------------------------+------------+ \| syscall/invalid \| mean (ns) \| (I) 17.78% \| \| \| p99 (ns) \| (I) 11.11% \| \| \| p99.9 (ns) \| 13.33% \| +----------------------------+----------------------------------------------------------+------------+ [1] https://gitlab.arm.com/tooling/fastpath Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 18:56:22 +01:00
Mark Brown	306736fd51	arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 The 2025 extensions add FEAT_SME2P3, including LUT6. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:55 +01:00
Mark Brown	bf56250f34	arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 The 2025 extensions add FEAT_SVE2P3 and FEAT_SVE_B16MM. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:55 +01:00
Mark Brown	d74576b51b	arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 The 2025 extensions add FEAT_F16MM and adjust some of the RES0 bits to be RAZ instead as a placeholder for future extensions. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:55 +01:00
Mark Brown	bb5e1e5405	arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 The 2025 extensions update the LUT field for new instructions added by SVE and SME 2.3, there is no separate FEAT_ feature for these. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:55 +01:00
Mark Brown	b964aa8d68	arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 The 2025 extensions add FEAT_F16F32DOT and FEAT_F16F32MM. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:55 +01:00
Mark Brown	abed23c3c4	arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps Currently for each hwcap we define both the HWCAPn_NAME definition which is exposed to userspace and a kernel internal KERNEL_HWCAP_NAME definition which we use internally. This is tedious and repetitive, instead use a script to generate the KERNEL_HWCAP_ definitions from the UAPI definitions. No functional changes intended. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-09 16:50:23 +01:00
Wang Wensheng	ee020bf6f1	arm64: kexec: Remove duplicate allocation for trans_pgd trans_pgd would be allocated in trans_pgd_create_copy(), so remove the duplicate allocation before calling trans_pgd_create_copy(). Fixes: `3744b5280e` ("arm64: kexec: install a copy of the linear-map") Signed-off-by: Wang Wensheng <wsw9603@163.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:49:08 +01:00
Haoyu Lu	b178330b67	ACPI: AGDI: fix missing newline in error message Add the missing trailing newline to the dev_err() message printed when SDEI event registration fails. This keeps the error output as a properly terminated log line. Fixes: `a2a591fb76` ("ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device") Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Haoyu Lu <hechushiguitu666@gmail.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:45:06 +01:00
Mark Rutland	8d13386c76	arm64: Check DAIF (and PMR) at task-switch time When __switch_to() switches from a 'prev' task to a 'next' task, various pieces of CPU state are expected to have specific values, such that these do not need to be saved/restored. If any of these hold an unexpected value when switching away from the prev task, they could lead to surprising behaviour in the context of the next task, and it would be difficult to determine where they were configured to their unexpected value. Add some checks for DAIF and PMR at task-switch time so that we can detect such issues. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:40:04 +01:00
Mark Rutland	ae654112ea	arm64: entry: Use split preemption logic The generic irqentry code now provides irqentry_exit_to_kernel_mode_preempt() and irqentry_exit_to_kernel_mode_after_preempt(), which can be used where architectures have different state requirements for involuntary preemption and exception return, as is the case on arm64. Use the new functions on arm64, aligning our exit to kernel mode logic with the style of our exit to user mode logic. This removes the need for the recently-added bodge in arch_irqentry_exit_need_resched(), and allows preemption to occur when returning from any exception taken from kernel mode, which is nicer for RT. In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and fold the conditionality directly into the architecture-specific entry code. That way all the logic necessary to avoid preempting from a pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths, avoiding redundant work for other exceptions, and making the flow a bit clearer. At present it looks like that would require a larger refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that as-is for now. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:40:04 +01:00
Mark Rutland	a07b7b2142	arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() The generic irqentry code now provides irqentry_enter_from_kernel_mode() and irqentry_exit_to_kernel_mode(), which can be used when an exception is known to be taken from kernel mode. These can be inlined into architecture-specific entry code, and avoid redundant work to test whether the exception was taken from user mode. Use these in arm64_enter_from_kernel_mode() and arm64_exit_to_kernel_mode(), which are only used for exceptions known to be taken from kernel mode. This will remove a small amount of redundant work, and will permit further changes to arm64_exit_to_kernel_mode() in subsequent patches. There should be no funcitonal change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:40:04 +01:00
Mark Rutland	6879ef1302	arm64: entry: Consistently prefix arm64-specific wrappers For historical reasons, arm64's entry code has arm64-specific functions named enter_from_kernel_mode() and exit_to_kernel_mode(), which are wrappers for similarly-named functions from the generic irqentry code. Other arm64-specific wrappers have an 'arm64_' prefix to clearly distinguish them from their generic counterparts, e.g. arm64_enter_from_user_mode() and arm64_exit_to_user_mode(). For consistency and clarity, add an 'arm64_' prefix to these functions. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:40:04 +01:00
Mark Rutland	2371bd83b3	arm64: entry: Don't preempt with SError or Debug masked On arm64, involuntary kernel preemption has been subtly broken since the move to the generic irqentry code. When preemption occurs, the new task may run with SError and Debug exceptions masked unexpectedly, leading to a loss of RAS events, breakpoints, watchpoints, and single-step exceptions. Prior to moving to the generic irqentry code, involuntary preemption of kernel mode would only occur when returning from regular interrupts, in a state where interrupts were masked and all other arm64-specific exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the only state in which it is valid to switch tasks. As part of moving to the generic irqentry code, the involuntary preemption logic was moved such that involuntary preemption could occur when returning from any (non-NMI) exception. As most exception handlers mask all arm64-specific exceptions before this point, preemption could occur in a state where arm64-specific exceptions were masked. This is not a valid state to switch tasks, and resulted in the loss of exceptions described above. As a temporary bodge, avoid the loss of exceptions by avoiding involuntary preemption when SError and/or Debug exceptions are masked. Practically speaking this means that involuntary preemption will only occur when returning from regular interrupts, as was the case before moving to the generic irqentry code. Fixes: `99eb057ccd` ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()") Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-08 17:40:04 +01:00
Mark Rutland	041aa7a853	entry: Split preemption from irqentry_exit_to_kernel_mode() Some architecture-specific work needs to be performed between the state management for exception entry/exit and the "real" work to handle the exception. For example, arm64 needs to manipulate a number of exception masking bits, with different exceptions requiring different masking. Generally this can all be hidden in the architecture code, but for arm64 the current structure of irqentry_exit_to_kernel_mode() makes this particularly difficult to handle in a way that is correct, maintainable, and efficient. The gory details are described in the thread surrounding: https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/ The summary is: * Currently, irqentry_exit_to_kernel_mode() handles both involuntary preemption AND state management necessary for exception return. * When scheduling (including involuntary preemption), arm64 needs to have all arm64-specific exceptions unmasked, though regular interrupts must be masked. * Prior to the state management for exception return, arm64 needs to mask a number of arm64-specific exceptions, and perform some work with these exceptions masked (with RCU watching, etc). While in theory it is possible to handle this with a new arch_*() hook called somewhere under irqentry_exit_to_kernel_mode(), this is fragile and complicated, and doesn't match the flow used for exception return to user mode, which has a separate 'prepare' step (where preemption can occur) prior to the state management. To solve this, refactor irqentry_exit_to_kernel_mode() to match the style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic into a new irqentry_exit_to_kernel_mode_preempt() function, and moving state management in a new irqentry_exit_to_kernel_mode_after_preempt() function. The existing irqentry_exit_to_kernel_mode() is left as a caller of both of these, avoiding the need to modify existing callers. There should be no functional change as a result of this change. [ tglx: Updated kernel doc ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com	2026-04-08 11:43:32 +02:00
Mark Rutland	c5538d0141	entry: Split kernel mode logic from irqentry_{enter,exit}() The generic irqentry code has entry/exit functions specifically for exceptions taken from user mode, but doesn't have entry/exit functions specifically for exceptions taken from kernel mode. It would be helpful to have separate entry/exit functions specifically for exceptions taken from kernel mode. This would make the structure of the entry code more consistent, and would make it easier for architectures to manage logic specific to exceptions taken from kernel mode. Move the logic specific to kernel mode out of irqentry_enter() and irqentry_exit() into new irqentry_enter_from_kernel_mode() and irqentry_exit_to_kernel_mode() functions. These are marked __always_inline and placed in irq-entry-common.h, as with irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so that they can be inlined into architecture-specific wrappers. The existing out-of-line irqentry_enter() and irqentry_exit() functions retained as callers of the new functions. The lockdep assertion from irqentry_exit() is moved into irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This was previously missing from irqentry_exit_to_user_mode() when called directly, and any new lockdep assertion failure relating from this change is a latent bug. Aside from the lockdep change noted above, there should be no functional change as a result of this change. [ tglx: Updated kernel doc ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-5-mark.rutland@arm.com	2026-04-08 11:43:32 +02:00
Mark Rutland	eb1b51afde	entry: Move irqentry_enter() prototype later Subsequent patches will rework the irqentry_*() functions. The end result (and the intermediate diffs) will be much clearer if the prototype for the irqentry_enter() function is moved later, immediately before the prototype of the irqentry_exit() function. Move the prototype later. This is purely a move; there should be no functional change as a result of this change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-4-mark.rutland@arm.com	2026-04-08 11:43:31 +02:00
Mark Rutland	22f66e7ef4	entry: Remove local_irq_{enable,disable}_exit_to_user() local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() are never overridden by architecture code, and are always equivalent to local_irq_enable() and local_irq_disable(). These functions were added on the assumption that arm64 would override them to manage 'DAIF' exception masking, as described by Thomas Gleixner in these threads: https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/ https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/ In practice arm64 did not need to override either. Prior to moving to the generic irqentry code, arm64's management of DAIF was reworked in commit: `97d935faac` ("arm64: Unmask Debug + SError in do_notify_resume()") Since that commit, arm64 only masks interrupts during the 'prepare' step when returning to user mode, and masks other DAIF exceptions later. Within arm64_exit_to_user_mode(), the arm64 entry code is as follows: local_irq_disable(); exit_to_user_mode_prepare_legacy(regs); local_daif_mask(); mte_check_tfsr_exit(); exit_to_user_mode(); Remove the unnecessary local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() functions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-3-mark.rutland@arm.com	2026-04-08 11:43:31 +02:00
Mark Rutland	1f0d117cd6	entry: Fix stale comment for irqentry_enter() The kerneldoc comment for irqentry_enter() refers to idtentry_exit(), which is an accidental holdover from the x86 entry code that the generic irqentry code was based on. Correct this to refer to irqentry_exit(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-2-mark.rutland@arm.com	2026-04-08 11:43:31 +02:00
Mark Brown	85b6f920a8	arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 Update the definition of SMIDR_EL1 in the sysreg definition to reflect the information in DD0601 2025-06. This includes somewhat more generic ways of describing the sharing of SMCUs, more information on supported priorities and provides additional resolution for describing affinity groups. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-02 21:48:21 +01:00
Ryan Roberts	1d37713fa8	arm64: mm: Remove pmd_sect() and pud_sect() The semantics of pXd_leaf() are very similar to pXd_sect(). The only difference is that pXd_sect() only considers it a section if PTE_VALID is set, whereas pXd_leaf() permits both "valid" and "present-invalid" types. Using pXd_sect() has caused issues now that large leaf entries can be present-invalid since commit `a166563e7e` ("arm64: mm: support large block mapping when rodata=full"), so let's just remove the API and standardize on pXd_leaf(). There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This was previously fine for the pXd_sect() macro because it only evaluated its argument once. But pXd_leaf() evaluates its argument multiple times. So let's avoid unintended side effects by reimplementing pXd_leaf() as an inline function. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-02 20:49:16 +01:00
Ryan Roberts	15bfba1ad7	arm64: mm: Handle invalid large leaf mappings correctly It has been possible for a long time to mark ptes in the linear map as invalid. This is done for secretmem, kfence, realm dma memory un/share, and others, by simply clearing the PTE_VALID bit. But until commit `a166563e7e` ("arm64: mm: support large block mapping when rodata=full") large leaf mappings were never made invalid in this way. It turns out various parts of the code base are not equipped to handle invalid large leaf mappings (in the way they are currently encoded) and I've observed a kernel panic while booting a realm guest on a BBML2_NOABORT system as a result: [ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers [ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000 [ 15.513762] Mem abort info: [ 15.527245] ESR = 0x0000000096000046 [ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits [ 15.572146] SET = 0, FnV = 0 [ 15.592141] EA = 0, S1PTW = 0 [ 15.612694] FSC = 0x06: level 2 translation fault [ 15.640644] Data abort info: [ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 [ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000 [ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704 [ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP [ 15.886394] Modules linked in: [ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT [ 15.935258] Hardware name: linux,dummy-virt (DT) [ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c [ 16.006163] lr : swiotlb_bounce+0xf4/0x158 [ 16.024145] sp : ffff80008000b8f0 [ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000 [ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000 [ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0 [ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc [ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974 [ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018 [ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000 [ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000 [ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000 [ 16.349071] Call trace: [ 16.360143] __pi_memcpy_generic+0x128/0x22c (P) [ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4 [ 16.400282] swiotlb_map+0x5c/0x228 [ 16.415984] dma_map_phys+0x244/0x2b8 [ 16.432199] dma_map_page_attrs+0x44/0x58 [ 16.449782] virtqueue_map_page_attrs+0x38/0x44 [ 16.469596] virtqueue_map_single_attrs+0xc0/0x130 [ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc [ 16.510355] try_fill_recv+0x2a4/0x584 [ 16.526989] virtnet_open+0xd4/0x238 [ 16.542775] __dev_open+0x110/0x24c [ 16.558280] __dev_change_flags+0x194/0x20c [ 16.576879] netif_change_flags+0x24/0x6c [ 16.594489] dev_change_flags+0x48/0x7c [ 16.611462] ip_auto_config+0x258/0x1114 [ 16.628727] do_one_initcall+0x80/0x1c8 [ 16.645590] kernel_init_freeable+0x208/0x2f0 [ 16.664917] kernel_init+0x24/0x1e0 [ 16.680295] ret_from_fork+0x10/0x20 [ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c) [ 16.723106] ---[ end trace 0000000000000000 ]--- [ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000 [ 16.818966] PHYS_OFFSET: 0xfff1000080000000 [ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f [ 16.862904] Memory Limit: none [ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- This panic occurs because the swiotlb memory was previously shared to the host (__set_memory_enc_dec()), which involves transitioning the (large) leaf mappings to invalid, sharing to the host, then marking the mappings valid again. But pageattr_p[mu]d_entry() would only update the entry if it is a section mapping, since otherwise it concluded it must be a table entry so shouldn't be modified. But p[mu]d_sect() only returns true if the entry is valid. So the result was that the large leaf entry was made invalid in the first pass then ignored in the second pass. It remains invalid until the above code tries to access it and blows up. The simple fix would be to update pageattr_pmd_entry() to use !pmd_table() instead of pmd_sect(). That would solve this problem. But the ptdump code also suffers from a similar issue. It checks pmd_leaf() and doesn't call into the arch-specific note_page() machinery if it returns false. As a result of this, ptdump wasn't even able to show the invalid large leaf mappings; it looked like they were valid which made this super fun to debug. the ptdump code is core-mm and pmd_table() is arm64-specific so we can't use the same trick to solve that. But we already support the concept of "present-invalid" for user space entries. And even better, pmd_leaf() will return true for a leaf mapping that is marked present-invalid. So let's just use that encoding for present-invalid kernel mappings too. Then we can use pmd_leaf() where we previously used pmd_sect() and everything is magically fixed. Additionally, from inspection kernel_page_present() was broken in a similar way, so I'm also updating that to use pmd_leaf(). The transitional page tables component was also similarly broken; it creates a copy of the kernel page tables, making RO leaf mappings RW in the process. It also makes invalid (but-not-none) pte mappings valid. But it was not doing this for large leaf mappings. This could have resulted in crashes at kexec- or hibernate-time. This code is fixed to flip "present-invalid" mappings back to "present-valid" at all levels. Finally, I have hardened split_pmd()/split_pud() so that if it is passed a "present-invalid" leaf, it will maintain that property in the split leaves, since I wasn't able to convince myself that it would only ever be called for "present-valid" leaves. Fixes: `a166563e7e` ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-02 20:49:16 +01:00
Ryan Roberts	f12b435de2	arm64: mm: Fix rodata=full block mapping support for realm guests Commit `a166563e7e` ("arm64: mm: support large block mapping when rodata=full") enabled the linear map to be mapped by block/cont while still allowing granular permission changes on BBML2_NOABORT systems by lazily splitting the live mappings. This mechanism was intended to be usable by realm guests since they need to dynamically share dma buffers with the host by "decrypting" them - which for Arm CCA, means marking them as shared in the page tables. However, it turns out that the mechanism was failing for realm guests because realms need to share their dma buffers (via __set_memory_enc_dec()) much earlier during boot than split_kernel_leaf_mapping() was able to handle. The report linked below showed that GIC's ITS was one such user. But during the investigation I found other callsites that could not meet the split_kernel_leaf_mapping() constraints. The problem is that we block map the linear map based on the boot CPU supporting BBML2_NOABORT, then check that all the other CPUs support it too when finalizing the caps. If they don't, then we stop_machine() and split to ptes. For safety, split_kernel_leaf_mapping() previously wouldn't permit splitting until after the caps were finalized. That ensured that if any secondary cpus were running that didn't support BBML2_NOABORT, we wouldn't risk breaking them. I've fix this problem by reducing the black-out window where we refuse to split; there are now 2 windows. The first is from T0 until the page allocator is inititialized. Splitting allocates memory for the page allocator so it must be in use. The second covers the period between starting to online the secondary cpus until the system caps are finalized (this is a very small window). All of the problematic callers are calling __set_memory_enc_dec() before the secondary cpus come online, so this solves the problem. However, one of these callers, swiotlb_update_mem_attributes(), was trying to split before the page allocator was initialized. So I have moved this call from arch_mm_preinit() to mem_init(), which solves the ordering issue. I've added warnings and return an error if any attempt is made to split in the black-out windows. Note there are other issues which prevent booting all the way to user space, which will be fixed in subsequent patches. Reported-by: Jinjiang Tu <tujinjiang@huawei.com> Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/ Fixes: `a166563e7e` ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-02 20:49:16 +01:00
Kevin Brodsky	5e0deb0a6b	arm64: mm: Use generic enum pgtable_level enum pgtable_type was introduced for arm64 by commit `c64f46ee13` ("arm64: mm: use enum to identify pgtable level instead of *_SHIFT"). In the meantime, the generic enum pgtable_level got introduced by commit `b22cc9a9c7` ("mm/rmap: convert "enum rmap_level" to "enum pgtable_level""). Let's switch to the generic enum pgtable_level. The only difference is that it also includes PGD level; __pgd_pgtable_alloc() isn't expected to create PGD tables so we add a VM_WARN_ON() for that case. Suggested-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-01 18:20:12 +01:00
Anshuman Khandual	95a58852b0	arm64/mm: Reject memory removal that splits a kernel leaf mapping Linear and vmemmap mappings that get torn down during a memory hot remove operation might contain leaf level entries on any page table level. If the requested memory range's linear or vmemmap mappings falls within such leaf entries, new mappings need to be created for the remaining memory mapped on the leaf entry earlier, following standard break before make aka BBM rules. But kernel cannot tolerate BBM and hence remapping to fine grained leaves would not be possible on systems without BBML2_NOABORT. Currently memory hot remove operation does not perform such restructuring, and so removing memory ranges that could split a kernel leaf level mapping need to be rejected. While memory_hotplug.c does appear to permit hot removing arbitrary ranges of memory, the higher layers that drive memory_hotplug (e.g. ACPI, virtio, ...) all appear to treat memory as fixed size devices. So it is impossible to hot unplug a different amount than was previously hot plugged, and hence we should never see a rejection in practice, but adding the check makes us robust against a future change. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/ Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-01 18:04:54 +01:00
Anshuman Khandual	48478b9f79	arm64/mm: Enable batched TLB flush in unmap_hotplug_range() During a memory hot remove operation, both linear and vmemmap mappings for the memory range being removed, get unmapped via unmap_hotplug_range() but mapped pages get freed only for vmemmap mapping. This is just a sequential operation where each table entry gets cleared, followed by a leaf specific TLB flush, and then followed by memory free operation when applicable. This approach was simple and uniform both for vmemmap and linear mappings. But linear mapping might contain CONT marked block memory where it becomes necessary to first clear out all entire in the range before a TLB flush. This is as per the architecture requirement. Hence batch all TLB flushes during the table tear down walk and finally do it in unmap_hotplug_range(). Prior to this fix, it was hypothetically possible for a speculative access to a higher address in the contiguous block to fill the TLB with shattered entries for the entire contiguous range after a lower address had already been cleared and invalidated. Due to the table entries being shattered, the subsequent TLB invalidation for the higher address would not then clear the TLB entries for the lower address, meaning stale TLB entries could persist. Besides it also helps in improving the performance via TLBI range operation along with reduced synchronization instructions. The time spent executing unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal in KVM guest. This scheme is not applicable during vmemmap mapping tear down where memory needs to be freed and hence a TLB flush is required after clearing out page table entry. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/ Fixes: `bbd6ec605c` ("arm64/mm: Enable memory hot remove") Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-04-01 18:04:54 +01:00
Yeoreum Yun	e223258ed8	arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that support LSUI are unlikely to support a 32-bit runtime. Rather than emulating the SWP instruction using LSUI instructions in order to remove PAN toggling, simply disable SWP emulation. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> [catalin.marinas@arm.com: some tweaks to the in-code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-03-27 17:29:10 +00:00
Ben Horgan	4ce0a2ccc0	arm64: mpam: Add initial MPAM documentation MPAM (Memory Partitioning and Monitoring) is now exposed to user-space via resctrl. Add some documentation so the user knows what features to expect. Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:52 +00:00
James Morse	aeb8595a5f	arm_mpam: Quirk CMN-650's CSU NRDY behaviour CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears. This tells us the monitor never finishes scanning the cache. The erratum document says to wait the maximum time, then ignore the field. Add a flag to indicate whether this is the final attempt to read the counter, and when this quirk is applied, ignore the NRDY field. This means accesses to this counter will always retry, even if the counter was previously programmed to the same values. The counter value is not expected to be stable, it drifts up and down with each allocation and eviction. The CSU register provides the value for a point in time. Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:42 +00:00
Shanker Donthineni	dc48eb1ff2	arm_mpam: Add workaround for T241-MPAM-6 The registers MSMON_MBWU_L and MSMON_MBWU return the number of requests rather than the number of bytes transferred. Bandwidth resource monitoring is performed at the last level cache, where each request arrive in 64Byte granularity. The current implementation returns the number of transactions received at the last level cache but does not provide the value in bytes. Scaling by 64 gives an accurate byte count to match the MPAM specification for the MSMON_MBWU and MSMON_MBWU_L registers. This patch fixes the issue by reporting the actual number of bytes instead of the number of transactions from __ris_msmon_read(). Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:41 +00:00
Shanker Donthineni	a7efe23ed6	arm_mpam: Add workaround for T241-MPAM-4 In the T241 implementation of memory-bandwidth partitioning, in the absence of contention for bandwidth, the minimum bandwidth setting can affect the amount of achieved bandwidth. Specifically, the achieved bandwidth in the absence of contention can settle to any value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set zero (below 0.78125%), once a core enters a throttled state, it will never leave that state. The first issue is not a concern if the MPAM software allows to program MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed. In the scenario where the resctrl doesn't support the MBW_MIN interface via sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention, software should configure a relatively narrow gap between MBW_MIN and MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem. Clear the feature MBW_MIN feature from the class to ensure we don't accidentally change behaviour when resctrl adds support for a MBW_MIN interface. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:41 +00:00
Shanker Donthineni	70e81fbedc	arm_mpam: Add workaround for T241-MPAM-1 The MPAM bandwidth partitioning controls will not be correctly configured, and hardware will retain default configuration register values, meaning generally that bandwidth will remain unprovisioned. To address the issue, follow the below steps after updating the MBW_MIN and/or MBW_MAX registers. - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets (0x360048 + slice0x10000 + partid8). These registers are read-only. - Continue iterating until all 12 shadow register values match in a loop. pr_warn_once if the values fail to match within the loop count 1000. - Perform 64b writes with the value 0x0 to the two spare registers at offsets 0x1b0000 and 0x1c0000. In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers are transformed into broadcast writes to the 12 shadow registers. The final two writes to the spare registers cause a final rank of downstream micro-architectural MPAM registers to be updated from the shadow copies. The intervening loop to read the 12 shadow registers helps avoid a race condition where writes to the spare registers occur before all shadow registers have been updated. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:41 +00:00
Shanker Donthineni	fa7745218c	arm_mpam: Add quirk framework The MPAM specification includes the MPAMF_IIDR, which serves to uniquely identify the MSC implementation through a combination of implementer details, product ID, variant, and revision. Certain hardware issues/errata can be resolved using software workarounds. Introduce a quirk framework to allow workarounds to be enabled based on the MPAMF_IIDR value. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:27 +00:00
James Morse	fb481ec086	arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl Now that MPAM links against resctrl, call resctrl_init() to register the filesystem and setup resctrl's structures. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:19 +00:00
James Morse	4aab135bda	arm64: mpam: Select ARCH_HAS_CPU_RESCTRL Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL. Let it rip^Wlink! ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled by the arch code simply because it has 'arch' in its name. This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL. While here, move the ACPI dependency to the driver's Kconfig file. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:11 +00:00
James Morse	efc775eadc	arm_mpam: resctrl: Add empty definitions for assorted resctrl functions A few resctrl features and hooks need to be provided, but aren't needed or supported on MPAM platforms. resctrl has individual hooks to separately enable and disable the closid/partid and rmid/pmg context switching code. For MPAM this is all the same thing, as the value in struct task_struct is used to cache the value that should be written to hardware. arm64's context switching code is enabled once MPAM is usable, but doesn't touch the hardware unless the value has changed. For now event configuration is not supported, and can be turned off by returning 'false' from resctrl_arch_is_evt_configurable(). The new io_alloc feature is not supported either, always return false from the enable helper to indicate and fail the enable. Add this, and empty definitions for the other hooks. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:32:04 +00:00
James Morse	49b04e4018	arm_mpam: resctrl: Update the rmid reallocation limit resctrl's limbo code needs to be told when the data left in a cache is small enough for the partid+pmg value to be re-allocated. x86 uses the cache size divided by the number of rmid users the cache may have. Do the same, but for the smallest cache, and with the number of partid-and-pmg users. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:31:57 +00:00
James Morse	fb56b29932	arm_mpam: resctrl: Add resctrl_arch_rmid_read() resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation means the counter may need reading in three different ways. The helpers behind the resctrl_arch_ functions will be re-used for the ABMC equivalent functions. Add the rounding helper for checking monitor values while we're here. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:31:50 +00:00
James Morse	2a3c79c615	arm_mpam: resctrl: Allow resctrl to allocate monitors When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs to allocate a monitor on the corresponding resource. Monitors are allocated by class instead of component. Add helpers to allocate a CSU monitor. These helper return an out of range value for MBM counters. Allocating a montitor context is expected to block until hardware resources become available. This only makes sense for QOS_L3_OCCUP as unallocated MBM counters are losing data. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:31:38 +00:00
James Morse	1458c4f053	arm_mpam: resctrl: Add support for csu counters resctrl exposes a counter via a file named llc_occupancy. This isn't really a counter as its value goes up and down, this is a snapshot of the cache storage usage monitor. Add some picking code which will only find an L3. The resctrl counter file is called llc_occupancy but we don't check it is the last one as it is already identified as L3. Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Dave Martin <dave.martin@arm.com> Signed-off-by: Dave Martin <dave.martin@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:31:23 +00:00
Ben Horgan	264c285999	arm_mpam: resctrl: Add monitor initialisation and domain boilerplate Add the boilerplate that tells resctrl about the mpam monitors that are available. resctrl expects all (non-telemetry) monitors to be on the L3 and so advertise them there and invent an L3 resctrl resource if required. The L3 cache itself has to exist as the cache ids are used as the domain ids. Bring the resctrl monitor domains online and offline based on the cpus they contain. Support for specific monitor types is left to later. Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:31:10 +00:00

1 2 3 4 5 ...

1427986 Commits