linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-23 00:55:48 -04:00

Author	SHA1	Message	Date
Oliver Upton	5aea409638	KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs KVM advertises the stage-2 TGRAN fields as writable to userspace but prevents any modification for NV-enabled VMs. Update the special-cased sanitization to permit de-featuring a particular TGRAN without allowing the legacy value which refers to the stage-1 field for support. Reported-by: Itaru Kitayama <itaru.kitayama@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 10:52:01 +01:00
Oliver Upton	ff37a41db8	KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0} SErrors are not deliverable at EL2 when the effective value of HCR_EL2.{TGE,AMO} = {0, 0}. This is bothersome to deal with in nested as we need to use auxiliary pending state to track the pending vSError since HCR_EL2.VSE has no mechanism for honoring the guest HCR. On top of that, we have no way of making that auxiliary pending state visible in ISR_EL1. A defect against the architecture now allows an implementation to treat HCR_EL2.AMO as 1 when HCR_EL2.{E2H,TGE} = {1, 0}. Let's do exactly that, meaning SErrors are always deliverable at EL2 for the typical E2H=RES1 VM. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 10:41:56 +01:00
Suzuki K Poulose	d02c2e45b1	arm64: acpi: Enable ACPI CCEL support Add support for ACPI CCEL by handling the EfiACPIMemoryNVS type memory. As per UEFI specifications NVS memory is reserved for Firmware use even after exiting boot services. Thus map the region as read-only. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-19 10:12:02 +01:00
Suzuki K Poulose	9e8a3df3e7	arm64: Enable EFI secret area Securityfs support Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required by the driver. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-19 10:12:01 +01:00
Suzuki K Poulose	fa84e534c3	arm64: realm: ioremap: Allow mapping memory as encrypted For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose an encrypted vs decrypted mapping. However, we may have firmware reserved memory regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL). We need to make sure that anything that is RIPAS_RAM (i.e., Guest protected memory with RMM guarantees) is also mapped as encrypted. Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be protected by the RMM. Thus we choose encrypted mapping for anything that is not RIPAS_EMPTY. While at it, rename the helper function __arm64_is_protected_mmio => arm64_rsi_is_protected to clearly indicate that this is not an arm64 generic helper, but something to do with Realms. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-19 10:12:01 +01:00
J. Neuschäfer	07c7f4f4e9	arm64: dts: allwinner: h313: Add Amediatech X96Q The X96Q is a set-top box with an H313 SoC, AXP305 PMIC, 1 or 2 GiB RAM, 8 or 16 GiB eMMC flash, 2x USB A, Micro-SD, HDMI, Ethernet, audio/video output, and infrared input. https://x96mini.com/products/x96q-tv-box-android-10-set-top-box Tested, works: - debug UART - status LED - USB ports in host mode - MicroSD - eMMC - recovery button hidden behind audio/video port - analog audio (line out) Does not work: - Ethernet (requires AC200 MFD/EPHY driver) - WLAN (requires out-of-tree XRadio driver) - analog video output (requires AC200 driver) - HDMI audio/video output Untested: - "OTG" USB port in device mode - built-in IR receiver - external IR receiver Table of regulators on the downstream kernel, for reference: vcc-5v 1 15 0 unknown 5000mV 0mA 5000mV 5000mV dcdca 0 0 0 unknown 900mV 0mA 0mV 0mV dcdcb 0 0 0 unknown 1350mV 0mA 0mV 0mV dcdcc 0 0 0 unknown 900mV 0mA 0mV 0mV dcdcd 0 0 0 unknown 1500mV 0mA 0mV 0mV dcdce 0 0 0 unknown 3300mV 0mA 0mV 0mV aldo1 0 0 0 unknown 3300mV 0mA 0mV 0mV aldo2 0 0 0 unknown 700mV 0mA 0mV 0mV aldo3 0 0 0 unknown 700mV 0mA 0mV 0mV bldo1 0 0 0 unknown 1800mV 0mA 0mV 0mV bldo2 0 0 0 unknown 1800mV 0mA 0mV 0mV bldo3 0 0 0 unknown 700mV 0mA 0mV 0mV bldo4 0 0 0 unknown 700mV 0mA 0mV 0mV cldo1 0 0 0 unknown 2500mV 0mA 0mV 0mV cldo2 0 0 0 unknown 700mV 0mA 0mV 0mV cldo3 0 0 0 unknown 700mV 0mA 0mV 0mV Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://patch.msgid.link/20250918-x96q-v2-2-51bd39928806@posteo.net Signed-off-by: Chen-Yu Tsai <wens@csie.org>	2025-09-19 12:37:16 +08:00
Aleksa Paunovic	a8fed1bc03	riscv: Add xmipsexectl as a vendor extension Add support for MIPS vendor extensions. Add support for the xmipsexectl vendor extension. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-2-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: added the MIPS vendor ID from another patch to fix the build] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 20:36:00 -06:00
Clément Léger	2e2cf5581f	riscv: cpufeature: add validation for zfa, zfh and zfhmin These extensions depend on the F one. Add a validation callback checking for the F extension to be present. Now that extensions are correctly reported using the F/D presence, we can remove the has_fpu() check in hwprobe_isa_ext0(). Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250527100001.33284-1-cleger@rivosinc.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 19:51:09 -06:00
Andrew Davis	70ddf86d76	riscv: sbi: Switch to new sys-off handler API Kernel now supports chained power-off handlers. Use register_platform_power_off() that registers a platform level power-off handler. Legacy pm_power_off() will be removed once all drivers and archs are converted to the new sys-off API. Signed-off-by: Andrew Davis <afd@ti.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250813151855.105237-1-afd@ti.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 17:48:04 -06:00
Yang Shi	a166563e7e	arm64: mm: support large block mapping when rodata=full When rodata=full is specified, kernel linear mapping has to be mapped at PTE level since large page table can't be split due to break-before-make rule on ARM64. This resulted in a couple of problems: - performance degradation - more TLB pressure - memory waste for kernel page table With FEAT_BBM level 2 support, splitting large block page table to smaller ones doesn't need to make the page table entry invalid anymore. This allows kernel split large block mapping on the fly. Add kernel page table split support and use large block mapping by default when FEAT_BBM level 2 is supported for rodata=full. When changing permissions for kernel linear mapping, the page table will be split to smaller size. The machine without FEAT_BBM level 2 will fallback to have kernel linear mapping PTE-mapped when rodata=full. With this we saw significant performance boost with some benchmarks and much less memory consumption on my AmpereOne machine (192 cores, 1P) with 256GB memory. * Memory use after boot Before: MemTotal: 258988984 kB MemFree: 254821700 kB After: MemTotal: 259505132 kB MemFree: 255410264 kB Around 500MB more memory are free to use. The larger the machine, the more memory saved. * Memcached We saw performance degradation when running Memcached benchmark with rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure. With this patchset we saw ops/sec is increased by around 3.5%, P99 latency is reduced by around 9.6%. The gain mainly came from reduced kernel TLB misses. The kernel TLB MPKI is reduced by 28.5%. The benchmark data is now on par with rodata=on too. * Disk encryption (dm-crypt) benchmark Ran fio benchmark with the below command on a 128G ramdisk (ext4) with disk encryption (by dm-crypt). 
fio --directory=/data --random_generator=lfsr --norandommap \ --randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1 \ --ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1 \ --group_reporting --thread --name=iops-test-job --eta-newline=1 \ --size 100G The IOPS is increased by 90% - 150% (the variance is high, but the worst number of good case is around 90% more than the best number of bad case). The bandwidth is increased and the avg clat is reduced proportionally. * Sequential file read Read 100G file sequentially on XFS (xfs_io read with page cache populated). The bandwidth is increased by 150%. Co-developed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:36:37 +01:00
Dev Jain	a660194dd1	arm64: Enable permission change on arm64 kernel block mappings This patch paves the path to enable huge mappings in vmalloc space and linear map space by default on arm64. For this we must ensure that we can handle any permission games on the kernel (init_mm) pagetable. Previously, __change_memory_common() used apply_to_page_range() which does not support changing permissions for block mappings. We move away from this by using the pagewalk API, similar to what riscv does right now. It is the responsibility of the caller to ensure that the range over which permissions are being changed falls on leaf mapping boundaries. For systems with BBML2, this will be handled in future patches by dynamically splitting the mappings when required. Unlike apply_to_page_range(), the pagewalk API currently enforces the init_mm.mmap_lock to be held. To avoid the unnecessary bottleneck of the mmap_lock for our usecase, this patch extends this generic API to be used locklessly, so as to retain the existing behaviour for changing permissions. Apart from this reason, it is noted at [1] that KFENCE can manipulate kernel pgtable entries during softirqs. It does this by calling set_memory_valid() -> __change_memory_common(). This being a non-sleepable context, we cannot take the init_mm mmap lock. Add comments to highlight the conditions under which we can use the lockless variant - no underlying VMA, and the user having exclusive control over the range, thus guaranteeing no concurrent access. We require that the start and end of a given range do not partially overlap block mappings, or cont mappings. Return -EINVAL in case a partial block mapping is detected in any of the PGD/P4D/PUD/PMD levels; add a corresponding comment in update_range_prot() to warn that eliminating such a condition is the responsibility of the caller. 
Note that, the pte level callback may change permissions for a whole contpte block, and that will be done one pte at a time, as opposed to an atomic operation for the block mappings. This is fine as any access will decode either the old or the new permission until the TLBI. apply_to_page_range() currently performs all pte level callbacks while in lazy mmu mode. Since arm64 can optimize performance by batching barriers when modifying kernel pgtables in lazy mmu mode, we would like to continue to benefit from this optimisation. Unfortunately walk_kernel_page_table_range() does not use lazy mmu mode. However, since the pagewalk framework is not allocating any memory, we can safely bracket the whole operation inside lazy mmu mode ourselves. Therefore, wrap the call to walk_kernel_page_table_range() with the lazy MMU helpers. Link: https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/ [1] Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Yang Shi <yshi@os.amperecomputing.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:36:37 +01:00
Yang Shi	13efe932d2	arm64: cpufeature: add AmpereOne to BBML2 allow list AmpereOne supports BBML2 without conflict abort, add to the allow list. Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:26:50 +01:00
Jeremy Linton	ea87c5536a	arm64: probes: Fix incorrect bl/blr address and register usage The pt_regs registers are 64-bit on arm64, and should be u64 when manipulated. Correct this so that we aren't truncating the address during br/blr sequences. Fixes: `efb07ac534` ("arm64: probes: Add GCS support to bl/blr/ret") Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:06:59 +01:00
Sean Christopherson	c49aa98376	KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+ Restrict support for GLOBAL_CTRL, GLOBAL_STATUS, fixed PMCs, and PEBS to v2 or later vPMUs. The SDM explicitly states that GLOBAL_{CTRL,STATUS} and fixed counters were introduced with PMU v2, and PEBS has hard dependencies on fixed counters and the bitmap MSR layouts established by PMU v2. Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:58:15 -07:00
Sean Christopherson	9bae7a0863	KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86 Move all initialization of all_valid_pmc_idx to common code, as the logic is 100% common to Intel and AMD, and KVM heavily relies on Intel and AMD having the same semantics. E.g. the fact that AMD doesn't support fixed counters doesn't allow KVM to use all_valid_pmc_idx[63:32] for other purposes. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:58:13 -07:00
Dapeng Mi	30c0267f15	KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents Replace a variety of "1ull << N" and "(u64)1 << N" snippets with BIT_ULL() in the PMU code. No functional change intended. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> [sean: split to separate patch, write changelog] Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:20 -07:00
Dapeng Mi	2bff2edf69	KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls Expand the VMCS controls builder macros to generate helpers to change a bit to the desired value, and use the new helpers when toggling APICv related controls. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> [sean: rewrite changelog] Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:19 -07:00
Sean Christopherson	5a1a726e68	KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates Defer recalculating MSR and instruction intercepts after a CPUID update via RECALC_INTERCEPTS to converge on RECALC_INTERCEPTS as the "official" mechanism for triggering recalcs. As a bonus, because KVM does a "recalc" during vCPU creation, and every functional VMM sets CPUID at least once, for all intents and purposes this saves at least one recalc. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:19 -07:00
Sean Christopherson	6057497336	KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS Rework the MSR_FILTER_CHANGED request into a more generic RECALC_INTERCEPTS request, and expand the responsibilities of vendor code to recalculate all intercepts that vary based on userspace input, e.g. instruction intercepts that are tied to guest CPUID. Providing a generic recalc request will allow the upcoming mediated PMU support to trigger a recalc when PMU features, e.g. PERF_CAPABILITIES, are set by userspace, without having to make multiple calls to/from PMU code. As a bonus, using a request will effectively coalesce recalcs, e.g. will reduce the number of recalcs for normal usage from 3+ to 1 (vCPU create, set CPUID, set PERF_CAPABILITIES (Intel only), set filter). The downside is that MSR filter changes that are done in isolation will do a small amount of unnecessary work, but that's already a relatively slow path, and the cost of recalculating instruction intercepts is negligible. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:18 -07:00
Dapeng Mi	cdfed9370b	KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h and rename them with PERF_CAP prefix to keep consistent with other perf capabilities macros. No functional change intended. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:16 -07:00
Dapeng Mi	1e24bece26	KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers Rename the two helpers vmx_vmentry/vmexit_ctrl() to vmx_get_initial_vmentry/vmexit_ctrl() to represent their real meaning. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:45 -07:00
Sean Christopherson	51f34b1e65	KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Take a snapshot of the unadulterated PMU capabilities provided by perf so that KVM can compare guest vPMU capabilities against hardware capabilities when determining whether or not to intercept PMU MSRs (and RDPMC). Reviewed-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:44 -07:00
Sean Christopherson	e3d1f2826d	KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Gate access to PMC MSRs based on pmu->version, not on kvm->arch.enable_pmu, to more accurately reflect KVM's behavior. This is a glorified nop, as pmu->version and pmu->nr_arch_gp_counters can only be non-zero if amd_pmu_refresh() is reached, kvm_pmu_refresh() invokes amd_pmu_refresh() if and only if kvm->arch.enable_pmu is true, and amd_pmu_refresh() forces pmu->version to be 1 or 2. I.e. the following holds true: !pmu->nr_arch_gp_counters || kvm->arch.enable_pmu == (pmu->version > 0) and so the only way for amd_pmu_get_pmc() to return a non-NULL value is if both kvm->arch.enable_pmu and pmu->version evaluate to true. No real functional change intended. Reviewed-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:28 -07:00
Sean Christopherson	4687a2c4e6	KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init() Setup the golden VMCS config during vmx_init(), before the call to kvm_x86_vendor_init(), instead of waiting until the callback to do hardware setup. setup_vmcs_config() only touches VMX state, i.e. doesn't poke anything in kvm.ko, and has no runtime dependencies beyond hv_init_evmcs(). Setting the VMCS config early on will allow referencing VMCS and VMX capabilities at any point during setup, e.g. to check for PERF_GLOBAL_CTRL save/load support during mediated PMU initialization. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:28 -07:00
Shanker Donthineni	cc80537caa	arm64: cpufeature: Add Olympus MIDR to BBML2 allow list The NVIDIA Olympus core supports BBML2 without conflict abort. Add its MIDR to the allow list to enable FEAT_BBM. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 20:21:37 +01:00
Shanker Donthineni	e185c8a0d8	arm64: cputype: Add NVIDIA Olympus definitions Add cpu part and model macro definitions for NVIDIA Olympus core. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 20:21:36 +01:00
Nick Chan	70fa521f4d	arm64: dts: apple: t8015: Add SPMI node Add SPMI node for Apple A11 SoC. Signed-off-by: Nick Chan <towinchenmi@gmail.com> Signed-off-by: Sven Peter <sven@kernel.org>	2025-09-18 21:13:45 +02:00
Nick Chan	8f6e6934e3	arm64: dts: apple: t8012: Add SPMI node Add SPMI node for Apple T2 SoC. Signed-off-by: Nick Chan <towinchenmi@gmail.com> Signed-off-by: Sven Peter <sven@kernel.org>	2025-09-18 21:13:45 +02:00
Hector Martin	637f7d2c73	arm64: dts: apple: Add J180d (Mac Pro, M2 Ultra, 2023) device tree The M2 Ultra in the Mac Pro differs from the M2 Ultra Mac Studio in its PCIe setup. It uses all available 16 PCIe Gen4 on the first die and 8 PCIe Gen4 lanes on the second die to connect to a 100 lane Microchip Switchtec PCIe switch. All internal PCIe devices and the PCIe slots are connected to the PCIe switch. Each die implements a PCIe controller with a single 16 or 8 lane port. The PCIe controller is mostly compatible with existing implementation in pcie-apple.c. The resources for other 8 lanes on the second die are used to connect the NVMe flash with the controller in the SoC. This initial device tree does not include PCIe support. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Neal Gompa <neal@gompa.dev> Co-developed-by: Janne Grunau <j@jannau.net> Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Sven Peter <sven@kernel.org> Signed-off-by: Sven Peter <sven@kernel.org>	2025-09-18 19:06:13 +00:00
Mikko Rapeli	a1b20e0622	ARM: rockchip: remove REGULATOR conditional to PM PM is explicitly enabled in lines just below so REGULATOR can be too. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org> Link: https://lore.kernel.org/r/20250915083317.2885761-5-mikko.rapeli@linaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>	2025-09-18 21:05:39 +02:00
Kaison Deng	93781211e9	arm64: dts: rockchip: Add devicetree for the ROC-RK3588-RT Link: https://en.t-firefly.com/product/industry/rocrk3588rt The Firefly ROC-RK3588-RT is RK3588 based SBC featuring: - TF card slot - SATA 2242 socket - 1x USB 3.0 Port, 1x USB 2.0 Port, 1x Type-C Port - 1x HDMI 2.1 out, 1x HDMI 2.0 out - 2x Gigabit Ethernet, 1x 2.5G Ethernet - M.2 E-KEY for Extended WiFi and Bluetooth - ES8388 on-board sound codec - jack in/out - RTC - LED: WORK, DIY Signed-off-by: Kaison Deng <dkx@t-chip.com.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> #gmac0, gmac1, mdio0, mdio1 nodes Link: https://lore.kernel.org/r/349c4226824efa52ceb14e3d8518c8bb5c7465fc.1757902513.git.dkx@t-chip.com.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>	2025-09-18 20:59:59 +02:00
Jakub Kicinski	f2cdc4c22b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc7). No conflicts. Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en/fs.h `9536fbe10c` ("net/mlx5e: Add PSP steering in local NIC RX") `7601a0a462` ("net/mlx5e: Add a miss level for ipsec crypto offload") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-18 11:26:06 -07:00
Chukun Pan	cf311ff5e7	arm64: dts: rockchip: update pinctrl names for Radxa E52C Updated the pinctrl names of the user key and power LED according to the schematic. Also updated the nodenames of other pinctrls. Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20250901100027.164594-4-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>	2025-09-18 19:32:24 +02:00
Chukun Pan	3edb9c95ff	arm64: dts: rockchip: remove vcc_3v3_pmu regulator for Radxa E52C According to Radxa E52C Schematic V1.2 [1] page 5, vcc_3v3_pmu is directly connected to vcc_3v3_s3 via a 0 ohm resistor. The vcc_3v3_pmu is not a new regulator, so remove it. [1] https://dl.radxa.com/e/e52c/hw/radxa_e52c_v1.2_schematic.pdf Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20250901100027.164594-3-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>	2025-09-18 19:32:24 +02:00
Mark Rutland	52b49bd6de	arm64: cputype: Remove duplicate Cortex-X1C definitions We currently have duplicate definitions for ARM_CPU_PART_CORTEX_X1C and MIDR_CORTEX_X1C as a result of commits: `58d245e03c` ("arm64: cputype: Add Cortex-X1C definitions") `efe676a1a7` ("arm64: proton-pack: Add new CPUs 'k' values for branch mitigation") Due to inconsistent sorting when adding entries, there was no textual conflict between the two patches. Delete the duplicate definitions added by the latter commit. The definitions in general are largely (but not entirely) in order of the MIDR_EL1.PartNum value rather than by CPU name, and the remaining Cortex-X1C definitions appear later in the list. For now I haven't sorted the remaining MIDR definitions to minimize churn. I intend to perform some larger cleanup of these in the near future which should supersede that anyhow. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 17:51:50 +01:00
Linus Torvalds	86cc796e5e	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "These are mostly Oliver's Arm changes: lock ordering fixes for the vGIC, and reverts for a buggy attempt to avoid RCU stalls on large VMs. Arm: - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VNCR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a raw spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1 - Reverted a couple of fixes for RCU stalls when destroying a stage-2 page table. There appears to be some nasty refcounting / UAF issues lurking in those patches and the band-aid we tried to apply didn't hold. s390: - mm fixes, including userfaultfd bug fix x86: - Sync the vTPR from the local APIC to the VMCB even when AVIC is active. This fixes a bug where host updates to the vTPR, e.g. 
via KVM_SET_LAPIC or emulation of a guest access, are lost and result in interrupt delivery issues in the guest" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()" Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables" KVM: arm64: vgic: fix incorrect spinlock API usage KVM: arm64: Remove stage 2 read fault check KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks KVM: arm64: Spin off release helper from vgic_put_irq() KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs KVM: arm64: vgic: Drop stale comment on IRQ active state KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly KVM: arm64: Initialize PMSCR_EL1 when in VHE KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries KVM: s390: Fix FOLL_/FAULT_FLAG_ confusion KVM: s390: Fix incorrect usage of mmu_notifier_register() KVM: s390: Fix access to unavailable adapter indicator pages during postcopy KVM: arm64: Mark freed S2 MMUs as invalid	2025-09-18 09:42:55 -07:00
Linus Torvalds	f03e578c8a	Merge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fixes from Johannes Berg: "A few fixes for UML, which I'd meant to send earlier but then forgot. All of them are pretty long-standing issues that are either not really happening (the UAF), in rarely used code (the FD buffer issue), or an issue only for some host configurations (the executable stack): - mark stack not executable to work on more modern systems with selinux - fix use-after-free in a virtio error path - fix stack buffer overflow in external unix socket FD receive function" * tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Fix FD copy size in os_rcv_fd_msg() um: virtio_uml: Fix use-after-free after put_device in probe um: Don't mark stack executable	2025-09-18 09:18:27 -07:00
Oliver Upton	3af1105c4f	KVM: arm64: nv: Apply guest's MDCR traps in nested context KVM needs to ensure the guest hypervisor's traps take effect when the vCPU is in a nested context. While supporting infrastructure is in place for most of the EL2 trap registers, MDCR_EL2 is not. Fold the guest's trap configuration into the effective MDCR_EL2. Apply it directly to the in-memory representation as it gets recomputed on every vcpu_load() anyway. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-18 16:46:20 +01:00
Oliver Upton	4a68408842	KVM: arm64: nv: Trap debug registers when in hyp context In case you haven't realized it yet, the architecture is _slightly_ broken in the context of nested virt. Here we have another example of FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually affects execution at vEL2. Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this mess at the expense of unnecessarily trapping the breakpoint/watchpoint registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for obvious correctness to start. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-18 16:46:20 +01:00
Guo Ren (Alibaba DAMO Academy)	16d18e3eaf	riscv: Move vendor errata definitions to new header Move vendor errata definitions into errata_list_vendors.h. Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Link: https://lore.kernel.org/r/20250713155321.2064856-2-guoren@kernel.org [pjw@kernel.org: updated to apply and to make the whitespace consistent] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:22:00 -06:00
Heinrich Schuchardt	92c4995b4d	RISC-V: ACPI: enable parsing the BGRT table The BGRT table is used to display a vendor logo during the boot process. Add the code for parsing it. Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20250729131535.522205-2-heinrich.schuchardt@canonical.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:21:45 -06:00
Pu Lehui	205cbc7148	riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG The implementation of cmpxchg() in riscv is based on atomic primitives and has NMI-safe features, so it can be used safely in the in_nmi context. ftrace's ringbuffer relies on NMI-safe cmpxchg() in the NMI context. Currently, in_nmi() is true when riscv kprobe is in trap-based mode, so this config needs to be selected, otherwise kprobetrace will not be available. Signed-off-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250711090443.1688404-1-pulehui@huaweicloud.com [pjw@kernel.org: moved to preserve alphabetical order] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:20:59 -06:00
Masahiro Yamada	6dab7e15c0	riscv: pi: use 'targets' instead of extra-y in Makefile %.pi.o files are built as prerequisites of other objects. There is no need to use extra-y, which is planned for deprecation. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20250602181023.528550-1-masahiroy@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:20:56 -06:00
Ignacio Encinas	cc2294d3f9	riscv: introduce asm/swab.h Implement endianness swap macros for RISC-V. Use the rev8 instruction when Zbb is available. Otherwise, rely on the default mask-and-shift implementation. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Ignacio Encinas <ignacio@iencinas.com> Link: https://lore.kernel.org/r/20250723-riscv-swab-v6-1-fc11e9a2efc9@iencinas.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:20:25 -06:00
Jessica Liu	316b60b984	riscv: mmap(): use unsigned offset type in riscv_sys_mmap The variable type of offset should be consistent with the relevant interfaces of mmap, as described in commit `295f10061a` ("syscalls: mmap(): use unsigned offset type consistently"). Otherwise, a user input with the top bit set would result in a negative page offset rather than a large one. Signed-off-by: Jessica Liu <liu.xuemei1@zte.com.cn> Tested-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250801104948133AaMr5S6E382PbNNhoJgHA@zte.com.cn [pjw@kernel.org: hand-applied mangled patch; fixed checkpatch error] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:19:47 -06:00
Junhui Liu	17e9521044	riscv: mm: Use mmu-type from FDT to limit SATP mode Some RISC-V implementations may hang when attempting to write an unsupported SATP mode, even though the latest RISC-V specification states such writes should have no effect. To avoid this issue, the logic for selecting SATP mode has been refined: The kernel now determines the SATP mode limit by taking the minimum of the value specified by the kernel command line (noXlvl) and the "mmu-type" property in the device tree (FDT). If only one is specified, use that. - If the resulting limit is sv48 or higher, the kernel will probe SATP modes from this limit downward until a supported mode is found. - If the limit is sv39, the kernel will directly use sv39 without probing. This ensures SATP mode selection is safe and compatible with both hardware and user configuration, minimizing the risk of hangs. Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250722-satp-from-fdt-v1-2-5ba22218fa5f@pigmoral.tech Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-18 08:18:14 -06:00
James Clark	00d7a1af5a	arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS SPE data source filtering (optional from Armv8.8) requires that traps to the filter register PMSDSFR be disabled. Document the requirements and disable the traps if the feature is present. Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 14:17:02 +01:00
James Clark	510a8fa49d	arm64/boot: Factor out a macro to check SPE version We check the version of SPE twice, and we'll add one more check in the next commit so factor out a macro to do this. Change the #3 magic number to the actual SPE version define (V1p2) to make it more readable. No functional changes intended. Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 14:17:02 +01:00
James Clark	b4401403af	perf: arm_spe: Support FEAT_SPEv1p4 filters FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits and also makes some previously available bits unavailable again e.g: E[30], bit [30] When FEAT_SPEv1p4 is _not_ implemented ... Continuing to hard code the valid filter bits for each version isn't scalable, and it also doesn't work for filter bits that aren't related to SPE version. For example most bits have a further condition: E[15], bit [15] When ... and filtering on event 15 is supported: Whether "filtering on event 15" is implemented or not is only discoverable from the TRM of that specific CPU or by probing PMSEVFR_EL1. Instead of hard coding them, write all 1s to the PMSEVFR_EL1 register and read it back to discover the RES0 bits. Unsupported bits are RAZ/WI so should read as 0s. For any hardware that doesn't strictly follow RAZ/WI for unsupported filters: Any bits that should have been supported in a specific SPE version but now incorrectly appear to be RES0 wouldn't have worked anyway, so it's better to fail to open events that request them rather than behaving unexpectedly. Bits that aren't implemented but also aren't RAZ/WI will be incorrectly reported as supported, but allowing them to be used is harmless. Testing on N1SDP shows the probed RES0 bits to be the same as the hard coded ones. The FVP with SPEv1p4 shows only additional new RES0 bits, i.e. no previously hard coded RES0 bits are missing. Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 14:17:02 +01:00
James Clark	a7005ff2d0	arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register Add new fields and register that are introduced for the features FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source filtering). Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 14:17:02 +01:00
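
The SATP mode selection described in commit 17e9521044 ("riscv: mm: Use mmu-type from FDT to limit SATP mode") can be sketched in plain C as below. This is a standalone illustration of the rule stated in the commit message, not the kernel's code: the `satp_mode` enum, `satp_limit()`, `satp_select()`, the probe callback, and the sample probe `probe_up_to_sv48()` are all hypothetical names. Two behaviors are assumptions that the message does not spell out: when neither source sets a limit, the sketch starts probing from sv57, and when every probed mode is rejected it falls back to sv39.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: SATP_NONE means "no limit given by this source". */
enum satp_mode { SATP_NONE = 0, SATP_SV39 = 39, SATP_SV48 = 48, SATP_SV57 = 57 };

/* Hypothetical probe hook: returns true if the hardware accepts the mode. */
typedef bool (*satp_probe_fn)(enum satp_mode mode);

/* Minimum of the command-line limit and the FDT "mmu-type" limit.
 * If only one of them is specified, that one is used. */
static enum satp_mode satp_limit(enum satp_mode cmdline, enum satp_mode fdt)
{
    if (cmdline == SATP_NONE)
        return fdt;
    if (fdt == SATP_NONE)
        return cmdline;
    return cmdline < fdt ? cmdline : fdt;
}

static enum satp_mode satp_select(enum satp_mode cmdline, enum satp_mode fdt,
                                  satp_probe_fn probe)
{
    /* Modes that are probed, highest first. sv39 is never probed. */
    static const enum satp_mode order[] = { SATP_SV57, SATP_SV48 };
    enum satp_mode limit = satp_limit(cmdline, fdt);
    size_t i;

    assert(probe != NULL);

    /* Assumption: with no limit from either source, start from sv57. */
    if (limit == SATP_NONE)
        limit = SATP_SV57;

    /* Probe from the limit downward; modes above the limit are never written. */
    for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        if (order[i] > limit)
            continue;
        if (probe(order[i]))
            return order[i];
    }

    /* Limit is sv39, or (assumption) nothing higher was accepted:
     * use sv39 directly, without probing. */
    return SATP_SV39;
}

/* Sample probe for illustration: hypothetical hardware supporting up to sv48. */
static int probe_calls;

static bool probe_up_to_sv48(enum satp_mode mode)
{
    probe_calls++;
    return mode <= SATP_SV48;
}
```

With the sample probe, `satp_select(SATP_SV39, SATP_SV57, probe_up_to_sv48)` returns `SATP_SV39` without calling the probe at all. That is the point of the change: a limit of sv39 avoids writing a possibly unsupported mode on implementations that may hang.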

238443 Commits