Commit Graph

1381771 Commits

Author SHA1 Message Date
Will Deacon
f2d64a22fa Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (29 commits)
  perf/dwc_pcie: Fix use of uninitialized variable
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  perf/arm-cmn: Fix CMN S3 DTM offset
  perf: arm_spe: Prevent overflow in PERF_IDX2OFF()
  coresight: trbe: Prevent overflow in PERF_IDX2OFF()
  MAINTAINERS: Remove myself from HiSilicon PMU maintainers
  drivers/perf: hisi: Add support for HiSilicon MN PMU driver
  drivers/perf: hisi: Add support for HiSilicon NoC PMU
  perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  arm64/boot: Factor out a macro to check SPE version
  ...
2025-09-24 16:34:52 +01:00
Will Deacon
77dfca70ba Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64: Enable vmalloc-huge with ptdump
  arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs
  arm64: mm: support large block mapping when rodata=full
  arm64: Enable permission change on arm64 kernel block mappings
  arm64/Kconfig: Remove CONFIG_RODATA_FULL_DEFAULT_ENABLED
  arm64: mm: Rework the 'rodata=' options
  arm64: mm: Represent physical memory with phys_addr_t and resource_size_t
  arm64: mm: Make map_fdt() return mapped pointer
  arm64: mm: Cast start/end markers to char *, not u64
2025-09-24 16:34:34 +01:00
Will Deacon
30f9386820 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  arm64: Kconfig: Spell out "ARMv9.4" in menuconfig text
  arm64/fpsimd: simplify sme_setup()
2025-09-24 16:34:06 +01:00
Will Deacon
7df73a0049 Merge branch 'for-next/entry' into for-next/core
* for-next/entry:
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  arm64: entry: Switch to generic IRQ entry
  arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
  arm64: entry: Refactor preempt_schedule_irq() check code
  entry: Add arch_irqentry_exit_need_resched() for arm64
  arm64: entry: Use preempt_count() and need_resched() helper
  arm64: entry: Rework arm64_preempt_schedule_irq()
  arm64: entry: Refactor the entry and exit for exceptions from EL1
  arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
2025-09-24 16:34:02 +01:00
Will Deacon
e0669b95f7 Merge branch 'for-next/docs' into for-next/core
* for-next/docs:
  arm64/sme: Drop inaccurate documentation of streaming mode switches
2025-09-24 16:33:58 +01:00
Will Deacon
3d751c56c9 Merge branch 'for-next/cpufeature' into for-next/core
* for-next/cpufeature:
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  arm64: cpufeature: add AmpereOne to BBML2 allow list
  arm64: cpufeature: Add Olympus MIDR to BBML2 allow list
  arm64: cputype: Add NVIDIA Olympus definitions
  arm64: cputype: Remove duplicate Cortex-X1C definitions
  arm64: errata: Expand speculative SSBS workaround for Cortex-A720AE
  arm64: cputype: Add Cortex-A720AE definitions
  arm64/hwcap: Add hwcap for FEAT_LSFE
2025-09-24 16:33:53 +01:00
Will Deacon
5647d32f51 Merge branch 'for-next/cca' into for-next/core
* for-next/cca:
  arm64: acpi: Enable ACPI CCEL support
  arm64: Enable EFI secret area Securityfs support
  arm64: realm: ioremap: Allow mapping memory as encrypted
2025-09-24 16:33:25 +01:00
Will Deacon
57f13e3d91 Merge branch 'for-next/fixes' into for-next/core
* for-next/fixes:
  arm64: ftrace: fix unreachable PLT for ftrace_caller in init_module with CONFIG_DYNAMIC_FTRACE
  ACPI/IORT: Fix memory leak in iort_rmr_alloc_sids()
  arm64: uapi: Provide correct __BITS_PER_LONG for the compat vDSO
  kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
  arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature
2025-09-24 16:33:03 +01:00
Will Deacon
1cf89b6bf6 arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
Big-endian arm64 configurations are vanishingly rare, yet we still claim
to support them in Linux despite very limited testing or visible
interest. Supporting big-endian adds unnecessary burden to reviewers and
contributors which, without any known active users, is hard to justify.
For example, recent work to improve our futex routines and to implement
nested virtualisation support is non-trivially complicated by having to
support both big- and little-endianness.

Back in 2019 [1], it was claimed that Huawei were using arm64 big-endian
machines in their telecommunication products but I don't know whether
that's still the case and certainly haven't seen any patch contributions
to help support or maintain it.

Make CPU_BIG_ENDIAN depend on BROKEN as an initial deprecation step
towards its removal.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/linux-arm-kernel/73701e9f-bee1-7ae8-2277-7a3576171cd4@huawei.com/ [1]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-24 16:25:45 +01:00
Ilkka Koskinen
2084660ad2 perf/dwc_pcie: Fix use of uninitialized variable
Fix use of uninitialized variable in group validation code.

Fixes: 71396cfac9 ("perf/dwc_pcie: Support counting multiple lane events in parallel")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202509231223.gZsX6Eio-lkp@intel.com/
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-24 12:13:19 +01:00
Can Peng
da9e5c04be arm/syscalls: mark syscall invocation as likely in invoke_syscall
The invoke_syscall() function is overwhelmingly called for
valid system call entries. Annotate the main path with likely()
to help the compiler generate better branch prediction hints,
reducing CPU pipeline stalls due to mispredictions.

This is a micro-optimization targeting syscall-heavy workloads [1].
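
A minimal sketch of the annotation (invoke_syscall() paraphrased here,
not the exact diff):

	if (likely(scno < sc_nr)) {
		/* hot path: valid syscall number */
		syscall_fn_t fn = syscall_table[array_index_nospec(scno, sc_nr)];
		ret = __invoke_syscall(regs, fn);
	} else {
		/* cold path: out-of-range syscall number */
		ret = do_ni_syscall(regs, scno);
	}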

Link: https://lore.kernel.org/r/20250922121730.986761-1-pengcan@kylinos.cn [1]
Signed-off-by: Can Peng <pengcan@kylinos.cn>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:26:16 +01:00
Yushan Wang
6d2f913fda Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
Some HiSilicon V3 PMU hardware is divided into parts, each fulfilling
the job of monitoring a specific part of a device.  Add a description of
that, as well as of the newly added ext option for the L3C PMU.

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:38 +01:00
Yushan Wang
272dd0e5e5 Documentation: hisi-pmu: Fix minor format error
Inline sysfs paths should be placed in literal blocks to make the
documentation look better.

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:38 +01:00
Yicong Yang
475d94dfe7 drivers/perf: hisi: Add support for L3C PMU v3
This patch adds support for L3C PMU v3. The v3 L3C PMU supports an
extended event space which can be controlled in up to 2 extra
address spaces with separate overflow interrupts. The layout
of the control/event registers is kept the same. Together, the extended
and original events cover the monitoring of all transactions
on the L3C.

The extended events are specified with the `ext=[1|2]` option for the
driver to distinguish them, like below:

perf stat -e hisi_sccl0_l3c0_0/event=<event_id>,ext=1/

Currently only the event option uses config bits [7:0], so there's
still plenty of unused space. Make ext use config bits [17:16] and
reserve bits [15:8] for the event option for future extension.
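
A sketch of the resulting format attributes, following the driver's
existing HISI_PMU_FORMAT_ATTR() convention (array name illustrative):

	static struct attribute *hisi_l3c_pmu_v3_format_attr[] = {
		HISI_PMU_FORMAT_ATTR(event, "config:0-7"),
		/* bits [15:8] reserved for future event extension */
		HISI_PMU_FORMAT_ATTR(ext, "config:16-17"),
		NULL
	};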

With the capability of extra counters, the number of counters for a
HiSilicon uncore PMU can reach up to 24; the used-counter bitmap is
extended accordingly.

The hw_perf_event::event_base is initialized to the base MMIO
address of the event and will be used later for control,
overflow handling and count readout.

We still make use of the uncore PMU framework for handling the
events and interrupt migration on CPU hotplug. The framework's
cpuhp callback handles the event and interrupt migration of the
original events; if the PMU supports extended events, their
interrupts are migrated to the same CPU chosen by the framework.

A new HID of HISI0215 is used for this version of L3C PMU.

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Co-developed-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:38 +01:00
Yicong Yang
b3abb08d6f drivers/perf: hisi: Refactor the event configuration of L3C PMU
The event register is configured using hisi_pmu::base directly, since
only one address space is supported by the L3C PMU. This needs to be
extended if the event configuration lives in a different address space.
To prepare for such hardware, extract the event register configuration
into a separate function that uses hw_perf_event::event_base as each
event's base address.  Implement a private
hisi_uncore_ops::get_event_idx() callback to initialize the event_base
in addition to getting the hardware index.
No functional changes intended.

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:38 +01:00
Yicong Yang
ede339ff61 drivers/perf: hisi: Extend the field of tt_core
Currently tt_core uses config1 bits [7:0] and cannot be extended. On
some platforms more than 8 CPUs share the L3 cache, so make tt_core
use config2 bits [15:0]; the remaining bits in config2 are reserved
for future extension.

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:37 +01:00
Yicong Yang
2271f16342 drivers/perf: hisi: Extract the event filter check of L3C PMU
The L3C PMU has 4 filter options which share perf_event_attr::config1.
The driver checks config1 as a whole to see whether a certain event has
a filter setting, which becomes incorrect if we make use of other bits
in config1 for non-filter options. So instead, check directly whether
each filter option is set, in a separate function.
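
A sketch of the idea (helper names illustrative):

	static bool hisi_l3c_pmu_have_filter(struct perf_event *event)
	{
		/* test each filter field explicitly, not config1 != 0 */
		return hisi_get_tt_req(event) || hisi_get_tt_core(event) ||
		       hisi_get_datasrc_cfg(event) || hisi_get_datasrc_skt(event);
	}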

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:37 +01:00
Yicong Yang
0960e535be drivers/perf: hisi: Simplify the probe process of each L3C PMU version
Versions 1 and 2 of the L3C PMU also use different HIDs. Make use of
struct acpi_device_id::driver_data for version-specific information
rather than judging by the version register. This helps to simplify
the probe process and also makes future extension a bit easier.
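
A sketch of the approach (HISI0215 is the v3 HID from this series; the
other HID and the info struct names are illustrative):

	static const struct acpi_device_id hisi_l3c_pmu_acpi_match[] = {
		{ "HISI0213", (kernel_ulong_t)&l3c_pmu_v1_info },
		{ "HISI0215", (kernel_ulong_t)&l3c_pmu_v3_info },
		{}
	};

	/* in probe: */
	const struct l3c_pmu_info *info = device_get_match_data(&pdev->dev);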

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:37 +01:00
Yicong Yang
4550244b53 drivers/perf: hisi: Export hisi_uncore_pmu_isr()
Currently the uncore PMU framework assumes one PMU device has only one
interrupt and registers the interrupt handler on the driver's behalf,
so it cannot support a PMU with multiple interrupt resources.  An
uncore PMU may have multiple interrupts that can share the same
handler.  Export hisi_uncore_pmu_isr() to allow drivers to register
the IRQ handler through their own routine.
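
A sketch of what a driver with multiple interrupt resources can now do
(loop and field names illustrative):

	for (i = 0; i < pmu->nr_irqs; i++) {
		ret = devm_request_irq(dev, pmu->irqs[i], hisi_uncore_pmu_isr,
				       IRQF_NOBALANCING | IRQF_NO_THREAD,
				       dev_name(dev), pmu);
		if (ret)
			return ret;
	}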

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:37 +01:00
Yicong Yang
43de0ac332 drivers/perf: hisi: Relax the event ID check in the framework
The event ID only uses attr::config bits [7:0], but we check the event
range using the whole 64-bit field, which blocks use of the rest of
attr::config. Relax the check to use only bits [7:0].
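
A sketch of the relaxed check (macro name illustrative):

	#define HISI_EVENTID_MASK	GENMASK_ULL(7, 0)

	/* range-check only the event ID field, not the whole config */
	if (FIELD_GET(HISI_EVENTID_MASK, event->attr.config) >
	    hisi_pmu->check_event)
		return -EINVAL;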

Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:14:37 +01:00
Koichi Okuno
bad11557ee perf: Fujitsu: Add the Uncore PMU driver
This adds a new dynamic PMU to the Perf Events framework to program and
control the Uncore PMUs in Fujitsu chips.

This driver exports formatting and event information to sysfs so it can
be used by the perf user space tools with the following syntax:

perf stat -e pci_iod0_pci0/ea-pci/ ls
perf stat -e pci_iod0_pci0/event=0x80/ ls
perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
perf stat -e mac_iod0_mac0_ch0/event=0x80/ ls

FUJITSU-MONAKA PMU Events Specification v1.1 URL:
https://github.com/fujitsu/FUJITSU-MONAKA

Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Koichi Okuno <fj2767dz@fujitsu.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 13:05:11 +01:00
Omar Sandoval
5973a62efa arm64: map [_text, _stext) virtual address range non-executable+read-only
Since the referenced fixes commit, the kernel's .text section is only
mapped starting from _stext; the region [_text, _stext) is omitted. As a
result, other vmalloc/vmap allocations may use the virtual addresses
nominally in the range [_text, _stext). This address reuse confuses
multiple things:

1. crash_prepare_elf64_headers() sets up a segment in /proc/vmcore
   mapping the entire range [_text, _end) to
   [__pa_symbol(_text), __pa_symbol(_end)). Reading an address in
   [_text, _stext) from /proc/vmcore therefore gives the incorrect
   result.
2. Tools doing symbolization (either by reading /proc/kallsyms or based
   on the vmlinux ELF file) will incorrectly identify vmalloc/vmap
   allocations in [_text, _stext) as kernel symbols.

In practice, both of these issues affect the drgn debugger.
Specifically, there were cases where the vmap IRQ stacks for some CPUs
were allocated in [_text, _stext). As a result, drgn could not get the
stack trace for a crash in an IRQ handler because the core dump
contained invalid data for the IRQ stack address. The stack addresses
were also symbolized as being in the _text symbol.

Fix this by bringing back the mapping of [_text, _stext), but now make
it non-executable and read-only. This prevents other allocations from
using it while still achieving the original goal of not mapping
unpredictable data as executable. Other than the changed protection,
this is effectively a revert of the fixes commit.

Fixes: e2a073dde9 ("arm64: omit [_text, _stext) from permanent kernel mapping")
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:58:17 +01:00
Dev Jain
fa93b45fd3 arm64: Enable vmalloc-huge with ptdump
Our goal is to move towards enabling vmalloc-huge by default on arm64 so
as to reduce TLB pressure. Therefore, we need a way to analyze the portion
of block mappings in vmalloc space we can get on a production system; this
can be done through ptdump, but currently we disable vmalloc-huge if
CONFIG_PTDUMP_DEBUGFS is on. The reason is that lazy freeing of kernel
pagetables via vmap_try_huge_pxd() may race with ptdump, so ptdump
may dereference a bogus address.

To solve this, we need to synchronize ptdump_walk() and ptdump_check_wx()
with pud_free_pmd_page() and pmd_free_pte_page().

Since this race is very unlikely to happen in practice, we do not want to
penalize the vmalloc pagetable tearing path by taking the init_mm
mmap_lock. Therefore, we use static keys. ptdump_walk() and
ptdump_check_wx() are the pagetable walkers; they will enable the static
key - upon observing that, the vmalloc pagetable tearing path will get
patched in with an mmap_read_lock/unlock sequence. A combination of the
patched-in mmap_read_lock/unlock, the acquire semantics of
static_branch_inc(), and the barriers in __flush_tlb_kernel_pgtable()
ensures that ptdump will never get a hold on the address of a freed PMD
or PTE table.
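
A minimal sketch of the scheme (identifiers and lock flavours
illustrative, not the exact kernel code):

static DEFINE_STATIC_KEY_FALSE(ptdump_lock_key);

void ptdump_walk_begin(void)
{
	static_branch_inc(&ptdump_lock_key);	/* acquire semantics */
	mmap_write_lock(&init_mm);
}

void ptdump_walk_end(void)
{
	mmap_write_unlock(&init_mm);
	static_branch_dec(&ptdump_lock_key);
}

/* In pud_free_pmd_page(), roughly: */
	pmd_t *table = pud_pgtable(*pudp);

	pud_clear(pudp);
	__flush_tlb_kernel_pgtable(addr);
	if (static_branch_unlikely(&ptdump_lock_key)) {
		/* wait out any walker that saw the old PUD */
		mmap_read_lock(&init_mm);
		mmap_read_unlock(&init_mm);
	}
	pmd_free(&init_mm, table);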

We can verify the correctness of the algorithm via the following litmus
test (thanks to James Houghton and Will Deacon):

AArch64 ptdump
Variant=Ifetch
{
uint64_t pud=0xa110c;
uint64_t pmd;

0:X0=label:"P1:L0"; 0:X1=instr:"NOP"; 0:X2=lock; 0:X3=pud; 0:X4=pmd;
                    1:X1=0xdead;      1:X2=lock; 1:X3=pud; 1:X4=pmd;
}
 P0				| P1				;
 (* static_key_enable *)	| (* pud_free_pmd_page *)	;
 STR	W1, [X0]		| LDR	X9, [X3]		;
 DC	CVAU,X0			| STR	XZR, [X3]		;
 DSB	ISH			| DSB	ISH			;
 IC	IVAU,X0			| ISB				;
 DSB	ISH			|				;
 ISB				| (* static key *)		;
				| L0:				;
 (* mmap_lock *)		| B	out1			;
 Lwlock:			|				;
 MOV	W7, #1			| (* mmap_lock *)		;
 SWPA	W7, W8, [X2]		| Lrlock:			;
				| MOV	W7, #1			;
				| SWPA	W7, W8, [X2]		;
 (* walk pgtable *)		|				;
 LDR	X9, [X3]		| (* mmap_unlock *)		;
 CBZ	X9, out0		| STLR	WZR, [X2]		;
 EOR	X10, X9, X9		|				;
 LDR	X11, [X4, X10]		| out1:				;
				| EOR	X10, X9, X9		;
 out0:				| STR	X1, [X4, X10]		;

exists (0:X8=0 /\ 1:X8=0 /\	(* Lock acquisitions succeed *)
	0:X9=0xa110c /\		(* P0 sees the valid PUD ...*)
	0:X11=0xdead)		(* ... but the freed PMD *)

For an approximate written proof of why this algorithm works, please read
the code comment in [1], which is now removed for the sake of simplicity.

The mm selftests pass. No issues were observed while running
test_vmalloc.sh (which stresses the vmalloc subsystem) in parallel with
cat /sys/kernel/debug/{kernel_page_tables, check_wx_pages} in a loop.

Link: https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/ [1]
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:53:24 +01:00
Ryan Roberts
8fca3852e3 arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
Neoverse-V3AE advertises support for BBML2 and is known to not raise
conflict aborts. So add it to the BBML2_NOABORT allow list.

However, just like Neoverse-V3, Neoverse-V3AE r0p0 and r0p1 suffer from
erratum #3053180, for which the workaround is to always observe
break-before-make requirements for affected revisions. Therefore only
add to the allow list from r0p2 onwards.

For more details see Software Developer Errata Notice (SDEN) document:
    Neoverse V3AE (MP172) SDEN v9.0, erratum 3053180
    https://developer.arm.com/documentation/SDEN-2615521/9-0/

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:27:14 +01:00
Mark Rutland
0c33aa1804 arm64: errata: Apply workarounds for Neoverse-V3AE
Neoverse-V3AE is also affected by erratum #3312417, as described in its
Software Developer Errata Notice (SDEN) document:

  Neoverse V3AE (MP172) SDEN v9.0, erratum 3312417
  https://developer.arm.com/documentation/SDEN-2615521/9-0/

Enable the workaround for Neoverse-V3AE, and document this.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:27:14 +01:00
Mark Rutland
3bbf004c48 arm64: cputype: Add Neoverse-V3AE definitions
Add cputype definitions for Neoverse-V3AE. These will be used for errata
detection in subsequent patches.

These values can be found in the Neoverse-V3AE TRM:

  https://developer.arm.com/documentation/SDEN-2615521/9-0/

... in section A.6.1 ("MIDR_EL1, Main ID Register").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:27:14 +01:00
Ryan Roberts
3df6979d22 arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs
The kernel linear mapping is painted at a very early stage of system
boot, before cpufeature detection has been finalized, so the linear
mapping is determined by the capabilities of the boot CPU only. If the
boot CPU supports BBML2, large block mappings will be used for the
linear mapping.

But the secondary CPUs may not support BBML2. So, once cpufeatures have
been finalized on all CPUs, repaint the linear mapping if large block
mappings were used and the secondary CPUs don't support BBML2.

If the boot CPU doesn't support BBML2, or the secondary CPUs have the
same BBML2 capability as the boot CPU, repainting the linear mapping is
not needed.

Repainting is implemented by the boot CPU, which we know supports BBML2,
so it is safe for the live mapping size to change for this CPU. The
linear map region is walked using the pagewalk API and any discovered
large leaf mappings are split to pte mappings using the existing helper
functions. Since the repainting is performed inside of a stop_machine(),
we must use GFP_ATOMIC to allocate the extra intermediate pgtables. But
since we are still early in boot, it is expected that there is plenty of
memory available so we will never need to sleep for reclaim, and so
GFP_ATOMIC is acceptable here.

The secondary CPUs are all put into a waiting area with the idmap in
TTBR0 and reserved map in TTBR1 while this is performed since they
cannot be allowed to observe any size changes on the live mappings. Some
of this infrastructure is reused from the kpti case. Specifically we
share the same flag (was __idmap_kpti_flag, now idmap_kpti_bbml2_flag)
since it means we don't have to reserve any extra pgtable memory to
idmap the extra flag.

Co-developed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 19:13:54 +01:00
Will Deacon
92d051a1c1 arm64: Kconfig: Spell out "ARMv9.4" in menuconfig text
The menuconfig entries to configure various architectural features are
all formatted as "ARMvx.y architecture features" with the unusual
exception of 9.4, which omits the "ARM" prefix.

Add the "ARM" prefix to the menuconfig entry for the ARMv9.4
architectural features.

Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 14:56:43 +01:00
Suzuki K Poulose
d02c2e45b1 arm64: acpi: Enable ACPI CCEL support
Add support for ACPI CCEL by handling EfiACPIMemoryNVS-type memory.
As per the UEFI specification, NVS memory is reserved for firmware use
even after exiting boot services. Thus, map the region as read-only.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 10:12:02 +01:00
Suzuki K Poulose
9e8a3df3e7 arm64: Enable EFI secret area Securityfs support
Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required
by the driver.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 10:12:01 +01:00
Suzuki K Poulose
fa84e534c3 arm64: realm: ioremap: Allow mapping memory as encrypted
For ioremap(), so far we only checked whether a region was a device
(RIPAS_DEV) to choose between an encrypted and a decrypted mapping.
However, we may have firmware-reserved memory regions exposed to the OS
(e.g., EFI Coco Secret Securityfs, ACPI CCEL). We need to make sure
that anything that is RIPAS_RAM (i.e., guest protected memory with RMM
guarantees) is also mapped as encrypted.

Rephrasing the above: anything that is not RIPAS_EMPTY is guaranteed to
be protected by the RMM, so choose an encrypted mapping for anything
that is not RIPAS_EMPTY. While at it, rename the helper function

  __arm64_is_protected_mmio => arm64_rsi_is_protected

to clearly indicate that this is not a generic arm64 helper, but
something to do with Realms.
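
A sketch of the resulting decision in ioremap() (using the kernel's
generic pgprot helpers; the arm64 plumbing is elided):

	/* anything still protected by the RMM must be mapped encrypted */
	if (arm64_rsi_is_protected(phys_addr, size))
		prot = pgprot_encrypted(prot);
	else
		prot = pgprot_decrypted(prot);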

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 10:12:01 +01:00
Yang Shi
a166563e7e arm64: mm: support large block mapping when rodata=full
When rodata=full is specified, the kernel linear mapping has to be
mapped at PTE level, since large block mappings can't be split due to
the break-before-make rule on arm64.

This results in a couple of problems:
  - performance degradation
  - more TLB pressure
  - memory wasted on kernel page tables

With FEAT_BBM level 2 support, splitting a large block mapping into
smaller ones no longer requires making the page table entry invalid.
This allows the kernel to split large block mappings on the fly.

Add kernel page table split support and use large block mappings by
default when FEAT_BBM level 2 is supported and rodata=full.  When
permissions are changed on the kernel linear mapping, the page table
will be split to a smaller size.

Machines without FEAT_BBM level 2 fall back to a PTE-mapped kernel
linear mapping when rodata=full.
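
A sketch of the resulting flow when changing permissions on a linear
map range (split_kernel_leaf_mapping() is an illustrative name;
update_range_prot() is the helper mentioned in the related patch):

	/* make [start, end) fall on leaf boundaries, then change it */
	ret = split_kernel_leaf_mapping(start, end);
	if (!ret)
		ret = update_range_prot(start, end - start, set_mask, clear_mask);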

With this we saw significant performance boost with some benchmarks and
much less memory consumption on my AmpereOne machine (192 cores, 1P)
with 256GB memory.

* Memory use after boot
Before:
MemTotal:       258988984 kB
MemFree:        254821700 kB

After:
MemTotal:       259505132 kB
MemFree:        255410264 kB

Around 500MB more memory is free to use.  The larger the machine, the
more memory is saved.

* Memcached
We saw performance degradation when running the Memcached benchmark
with rodata=full vs rodata=on, and our profiling pointed to kernel TLB
pressure.  With this patchset, ops/sec increased by around 3.5% and P99
latency dropped by around 9.6%.  The gain mainly came from reduced
kernel TLB misses: kernel TLB MPKI dropped by 28.5%.

The benchmark data is now on par with rodata=on too.

* Disk encryption (dm-crypt) benchmark
Ran fio benchmark with the below command on a 128G ramdisk (ext4) with
disk encryption (by dm-crypt).
fio --directory=/data --random_generator=lfsr --norandommap            \
    --randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1  \
    --ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1         \
    --group_reporting --thread --name=iops-test-job --eta-newline=1    \
    --size 100G

IOPS increased by 90% - 150% (the variance is high, but the worst
result of the good case is around 90% better than the best result of
the bad case). Bandwidth increased and the average completion latency
dropped proportionally.

* Sequential file read
Reading a 100G file sequentially on XFS (xfs_io read with the page
cache populated), bandwidth increased by 150%.

Co-developed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 21:36:37 +01:00
Dev Jain
a660194dd1 arm64: Enable permission change on arm64 kernel block mappings
This patch paves the way towards enabling huge mappings in vmalloc
space and linear map space by default on arm64. For this we must ensure
that we can handle any permission games on the kernel (init_mm)
pagetable. Previously, __change_memory_common() used
apply_to_page_range(), which does not support changing permissions for
block mappings. We move away from this by using the pagewalk API,
similar to what riscv does right now. It is the responsibility of the
caller to ensure that the range over which permissions are being
changed falls on leaf mapping boundaries. For systems with BBML2, this
will be handled in future patches by dynamically splitting the mappings
when required.

Unlike apply_to_page_range(), the pagewalk API currently enforces the
init_mm.mmap_lock to be held. To avoid the unnecessary bottleneck of the
mmap_lock for our usecase, this patch extends this generic API to be
used locklessly, so as to retain the existing behaviour for changing
permissions. Apart from this reason, it is noted at [1] that KFENCE can
manipulate kernel pgtable entries during softirqs. It does this by
calling set_memory_valid() -> __change_memory_common(). This being a
non-sleepable context, we cannot take the init_mm mmap lock.

Add comments to highlight the conditions under which we can use the
lockless variant - no underlying VMA, and the user having exclusive
control over the range, thus guaranteeing no concurrent access.

We require that the start and end of a given range do not partially
overlap block mappings, or cont mappings. Return -EINVAL in case a
partial block mapping is detected in any of the PGD/P4D/PUD/PMD levels;
add a corresponding comment in update_range_prot() to warn that
eliminating such a condition is the responsibility of the caller.

Note that, the pte level callback may change permissions for a whole
contpte block, and that will be done one pte at a time, as opposed to an
atomic operation for the block mappings. This is fine as any access will
decode either the old or the new permission until the TLBI.

apply_to_page_range() currently performs all pte level callbacks while
in lazy mmu mode. Since arm64 can optimize performance by batching
barriers when modifying kernel pgtables in lazy mmu mode, we would like
to continue to benefit from this optimisation. Unfortunately
walk_kernel_page_table_range() does not use lazy mmu mode. However,
since the pagewalk framework is not allocating any memory, we can safely
bracket the whole operation inside lazy mmu mode ourselves. Therefore,
wrap the call to walk_kernel_page_table_range() with the lazy MMU
helpers.
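
A sketch of the wrapping (ops/data setup elided; signature paraphrased):

	arch_enter_lazy_mmu_mode();
	ret = walk_kernel_page_table_range(start, start + size,
					   &pageattr_ops, NULL, &data);
	arch_leave_lazy_mmu_mode();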

Link: https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/ [1]
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yshi@os.amperecomputing.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 21:36:37 +01:00
Yang Shi
13efe932d2 arm64: cpufeature: add AmpereOne to BBML2 allow list
AmpereOne supports BBML2 without conflict aborts; add it to the allow list.

Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 21:26:50 +01:00
Shanker Donthineni
cc80537caa arm64: cpufeature: Add Olympus MIDR to BBML2 allow list
The NVIDIA Olympus core supports BBML2 without conflict abort. Add
its MIDR to the allow list to enable FEAT_BBM.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 20:21:37 +01:00
Shanker Donthineni
e185c8a0d8 arm64: cputype: Add NVIDIA Olympus definitions
Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 20:21:36 +01:00
Robin Murphy
b3fe1c83a5 perf/arm-cmn: Fix CMN S3 DTM offset
CMN S3's DTM offset is different between r0px and r1p0, and it
turns out this was not an error in the earlier documentation, but
does actually exist in the design. Lovely.

Cc: stable@vger.kernel.org
Fixes: 0dc2f4963f ("perf/arm-cmn: Support CMN S3")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 20:18:24 +01:00
Mark Rutland
52b49bd6de arm64: cputype: Remove duplicate Cortex-X1C definitions
We currently have duplicate definitions for ARM_CPU_PART_CORTEX_X1C and
MIDR_CORTEX_X1C as a result of commits:

  58d245e03c ("arm64: cputype: Add Cortex-X1C definitions")
  efe676a1a7 ("arm64: proton-pack: Add new CPUs 'k' values for branch mitigation")

Due to inconsistent sorting when adding entries, there was no textual
conflict between the two patches.

Delete the duplicate definitions added by the latter commit.

The definitions in general are largely (but not entirely) in order of
the MIDR_EL1.PartNum value rather than by CPU name, and the remaining
Cortex-X1C definitions appear later in the list.

For now I haven't sorted the remaining MIDR definitions to minimize
churn. I intend to perform some larger cleanup of these in the near
future which should supersede that anyhow.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 17:51:50 +01:00
Leo Yan
a29fea30dd perf: arm_spe: Prevent overflow in PERF_IDX2OFF()
Cast nr_pages to unsigned long to avoid overflow when handling large
AUX buffer sizes (>= 2 GiB).
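
A sketch of the fix: the cast forces the shift to be performed in
64-bit arithmetic:

	/* before: (int)nr_pages << PAGE_SHIFT overflows for >= 2 GiB */
	#define PERF_IDX2OFF(idx, buf) \
		((idx) % ((unsigned long)(buf)->nr_pages << PAGE_SHIFT))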

Fixes: d5d9696b03 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 15:23:59 +01:00
Leo Yan
105f56877f coresight: trbe: Prevent overflow in PERF_IDX2OFF()
Cast nr_pages to unsigned long to avoid overflow when handling large
AUX buffer sizes (>= 2 GiB).

Fixes: 3fbf7f011f ("coresight: sink: Add TRBE driver")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 15:23:58 +01:00
Yicong Yang
542342d271 MAINTAINERS: Remove myself from HiSilicon PMU maintainers
Remove myself as I'm leaving HiSilicon and will no longer be in a
position to maintain this. Thanks for the journey.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:54:29 +01:00
Junhao He
2257798498 drivers/perf: hisi: Add support for HiSilicon MN PMU driver
MN (Miscellaneous Node) is a hybrid node in ARM CHI. It broadcasts the
following two types of requests: DVM operations and PCIe configuration.
MN PMU devices exist on both the SCCL and the SICL, so the MN PMU
driver is named after the SCL (Super cluster) ID.
The MN PMU driver uses the HiSilicon uncore PMU framework, and only
the event parameter is supported.

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:54:29 +01:00
Yicong Yang
e31c0eb103 drivers/perf: hisi: Add support for HiSilicon NoC PMU
Add support for the HiSilicon NoC (Network on Chip) PMU, which is
used to monitor events on the system bus. The PMU device is named
after the SCL ID (either Super CPU cluster or Super IO cluster)
and the index ID, similar to other HiSilicon uncore PMUs. The
following PMU formats are provided besides the event:

- ch: the transaction channel (data, request, response, etc.) which
  can be used to filter the counting.
- tt_en: tracetag filtering enable. As with other HiSilicon uncore
  PMUs, the NoC PMU supports counting only the transactions with
  tracetag.

The NoC PMU doesn't have an interrupt to indicate overflow. However,
the 64-bit counter is large enough that it's nearly impossible to
overflow.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:54:29 +01:00
Yicong Yang
f8f89e8cf3 perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain
conditions. Factor out the condition check into a separate function
for further extension, and add documentation for better understanding.
No functional changes intended.

Reviewed-by: James Clark <james.clark@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:35:54 +01:00
James Clark
00d7a1af5a arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
SPE data source filtering (optional from Armv8.8) requires that traps to
the filter register PMSDSFR be disabled. Document the requirements and
disable the traps if the feature is present.

Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
James Clark
510a8fa49d arm64/boot: Factor out a macro to check SPE version
We check the version of SPE twice, and we'll add one more check in the
next commit, so factor out a macro to do this. Change the magic number
3 to the actual SPE version define (V1p2) to make it more readable.
No functional changes intended.

Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
James Clark
dad9603c5e perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing
load, store and branch filters. It also adds two new filter bits for
SIMD and floating point, with their own associated mask bits. The
current filters only allow OR filtering on samples that are load OR
store etc., and the new mask bits allow turning part of the filter
into an AND, for example filtering samples that are store AND SIMD.
With the mask bits set to 0 the OR behaviour is preserved, so unless
any masks are explicitly set, old filters behave the same.

Add them all and make them behave the same way as the existing format
bits: hidden, and returning -EOPNOTSUPP if set when the feature
doesn't exist.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
Leo Yan
51b9f16697 perf: arm_spe: Expose event filter
Expose an "event_filter" entry in the caps folder to inform user space
about which events can be filtered.

Change the return type of arm_spe_pmu_cap_get() from u32 to u64 to
accommodate the added event filter entry.

Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
James Clark
b4401403af perf: arm_spe: Support FEAT_SPEv1p4 filters
FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits and also
makes some previously available bits unavailable again, e.g.:
  E[30], bit [30]
  When FEAT_SPEv1p4 is _not_ implemented ...

Continuing to hard code the valid filter bits for each version isn't
scalable, and it also doesn't work for filter bits that aren't related
to the SPE version. For example, most bits have a further condition:

  E[15], bit [15]
  When ... and filtering on event 15 is supported:

Whether "filtering on event 15" is implemented or not is only
discoverable from the TRM of that specific CPU or by probing
PMSEVFR_EL1.

Instead of hard coding them, write all 1s to the PMSEVFR_EL1 register
and read it back to discover the RES0 bits. Unsupported bits are RAZ/WI
so should read as 0s.
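
A sketch of the probing (illustrative; the driver's actual sequence
may differ):

	u64 supported;

	write_sysreg_s(U64_MAX, SYS_PMSEVFR_EL1);
	isb();
	supported = read_sysreg_s(SYS_PMSEVFR_EL1);
	write_sysreg_s(0, SYS_PMSEVFR_EL1);

	/* unsupported filter bits are RAZ/WI and read back as zero */
	res0 = ~supported;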

For any hardware that doesn't strictly follow RAZ/WI for unsupported
filters: Any bits that should have been supported in a specific SPE
version but now incorrectly appear to be RES0 wouldn't have worked
anyway, so it's better to fail to open events that request them rather
than behaving unexpectedly. Bits that aren't implemented but also aren't
RAZ/WI will be incorrectly reported as supported, but allowing them to
be used is harmless.

Testing on N1SDP shows the probed RES0 bits to be the same as the hard
coded ones. The FVP with SPEv1p4 shows only additional new RES0 bits,
i.e. no previously hard coded RES0 bits are missing.

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
James Clark
a7005ff2d0 arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
Add new fields and register that are introduced for the features
FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
filtering).

Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00