* for-next/pauth:
arm64: Add support of PAuth QARMA3 architected algorithm
arm64: cpufeature: Mark existing PAuth architected algorithm as QARMA5
arm64: cpufeature: Account min_field_value when cheking secondaries for PAuth
* for-next/mte:
docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
arm64/mte: Remove asymmetric mode from the prctl() interface
kasan: fix a missing header include of static_keys.h
arm64/mte: Add userspace interface for enabling asymmetric mode
arm64/mte: Add hwcap for asymmetric mode
arm64/mte: Add a little bit of documentation for mte_update_sctlr_user()
arm64/mte: Document ABI for asymmetric mode
arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary
kasan: split kasan_*enabled() functions into a separate header
* for-next/misc:
arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
arm64: clean up tools Makefile
arm64: drop unused includes of <linux/personality.h>
arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
arm64: prevent instrumentation of bp hardening callbacks
arm64: cpufeature: Remove cpu_has_fwb() check
arm64: atomics: remove redundant static branch
arm64: entry: Save some nops when CONFIG_ARM64_PSEUDO_NMI is not set
* for-next/linkage:
arm64: module: remove (NOLOAD) from linker script
linkage: remove SYM_FUNC_{START,END}_ALIAS()
x86: clean up symbol aliasing
arm64: clean up symbol aliasing
linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
As pointed out by Evgenii Stepanov one potential issue with the new ABI for
enabling asymmetric is that if there are multiple places where MTE is
configured in a process, some of which were compiled with the old prctl.h
and some of which were compiled with the new prctl.h, there may be problems
keeping track of which MTE modes are requested. For example some code may
disable only sync and async modes leaving asymmetric mode enabled when it
intended to fully disable MTE.
In order to avoid such mishaps remove asymmetric mode from the prctl(),
instead implicitly allowing it if both sync and async modes are requested.
This should not disrupt userspace since a process requesting both may
already see a mix of sync and async modes due to differing defaults between
CPUs or changes in default while the process is running but it does mean
that userspace is unable to explicitly request asymmetric mode without
changing the system default for CPUs.
Reported-by: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Branislav Rankov <branislav.rankov@arm.com>
Link: https://lore.kernel.org/r/20220309131200.112637-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 031495635b ("arm64: Do not defer reserve_crashkernel() for
platforms with no DMA memory zones") introduced different definitions
for 'arm64_dma_phys_limit' depending on CONFIG_ZONE_DMA{,32} based on
a late suggestion from Pasha. Sadly, this results in a build error when
passing W=1:
| arch/arm64/mm/init.c:90:19: error: conflicting type qualifiers for 'arm64_dma_phys_limit'
Drop the 'const' for now and use '__ro_after_init' consistently.
Link: https://lore.kernel.org/r/202203090241.aj7paWeX-lkp@intel.com
Link: https://lore.kernel.org/r/CA+CK2bDbbx=8R=UthkMesWOST8eJMtOGJdfMRTFSwVmo0Vn0EA@mail.gmail.com
Fixes: 031495635b ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
Signed-off-by: Will Deacon <will@kernel.org>
Remove unused gen-y.
Remove redundant $(shell ...) because 'mkdir' is done in cmd_gen_cpucaps.
Replace $(filter-out $(PHONY), $^) with the $(real-prereqs) shorthand.
The '&&' in cmd_gen_cpucaps should be replaced with ';' because it is
run under 'set -e' environment.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220227085232.206529-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The following patches resulted in deferring crash kernel reservation to
mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU),
in particular Raspberry Pi 4.
commit 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
commit 8424ecdde7 ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges")
commit 0a30c53573 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
commit 2687275a58 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required")
Above changes introduced boot slowdown due to linear map creation for
all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1]. The proposed
changes restore crash kernel reservation to earlier behavior thus avoids
slow boot, particularly for platforms with IOMMU (no DMA memory zones).
Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU
and 8GB memory. Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm
no regression to deferring scheme of crash kernel memory reservation.
In both cases successfully collected kernel crash dump.
[1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com
[will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build]
Signed-off-by: Will Deacon <will@kernel.org>
When a IAR register read races with a GIC interrupt RELEASE event,
GIC-CPU interface could wrongly return a valid INTID to the CPU
for an interrupt that is already released(non activated) instead of 0x3ff.
As a side effect, an interrupt handler could run twice, once with
interrupt priority and then with idle priority.
As a workaround, gic_read_iar is updated so that it will return a
valid interrupt ID only if there is a change in the active priority list
after the IAR read on all the affected Silicons.
Since there are silicon variants where both 23154 and 38545 are applicable,
workaround for erratum 23154 has been extended to address both of them.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220307143014.22758-1-lcherian@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
When a contiguous HugeTLB page is mapped, set_pte_at() will be called
CONT_PTES/CONT_PMDS times. Therefore, __sync_icache_dcache() will
flush cache multiple times if the page is executable (to ensure
the I-D cache coherency). However, the first flushing cache already
covers subsequent cache flush operations. So only flusing cache
for the head page if it is a HugeTLB page to avoid redundant cache
flushing. In the next patch, it is also depends on this change
since the tail vmemmap pages of HugeTLB is mapped with read-only
meanning only head page struct can be modified.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220302084624.33340-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
We may call arm64_apply_bp_hardening() early during entry (e.g. in
el0_ia()) before it is safe to run instrumented code. Unfortunately this
may result in running instrumented code in two cases:
* The hardening callbacks called by arm64_apply_bp_hardening() are not
marked as `noinstr`, and have been observed to be instrumented when
compiled with either GCC or LLVM.
* Since arm64_apply_bp_hardening() itself is only marked as `inline`
rather than `__always_inline`, it is possible that the compiler
decides to place it out-of-line, whereupon it may be instrumented.
For example, with defconfig built with clang 13.0.0,
call_hvc_arch_workaround_1() is compiled as:
| <call_hvc_arch_workaround_1>:
| d503233f paciasp
| f81f0ffe str x30, [sp, #-16]!
| 320183e0 mov w0, #0x80008000
| d503201f nop
| d4000002 hvc #0x0
| f84107fe ldr x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
... but when CONFIG_FTRACE=y and CONFIG_KCOV=y this is compiled as:
| <call_hvc_arch_workaround_1>:
| d503245f bti c
| d503201f nop
| d503201f nop
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| 910003fd mov x29, sp
| 94000000 bl 0 <__sanitizer_cov_trace_pc>
| 320183e0 mov w0, #0x80008000
| d503201f nop
| d4000002 hvc #0x0
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
... with a patchable function entry registered with ftrace, and a direct
call to __sanitizer_cov_trace_pc(). Neither of these are safe early
during entry sequences.
This patch avoids the unsafe instrumentation by marking
arm64_apply_bp_hardening() as `__always_inline` and by marking the
hardening functions as `noinstr`. This avoids the potential for
instrumentation, and causes clang to consistently generate the function
as with the defconfig sample.
Note: in the defconfig compilation, when CONFIG_SVE=y, x30 is spilled to
the stack without being placed in a frame record, which will result in a
missing entry if call_hvc_arch_workaround_1() is backtraced. Similar is
true of qcom_link_stack_sanitisation(), where inline asm spills the LR
to a GPR prior to corrupting it. This is not a significant issue
presently as we will only backtrace here if an exception is taken, and
in such cases we may omit entries for other reasons today.
The relevant hardening functions were introduced in commits:
ec82b567a7 ("arm64: Implement branch predictor hardening for Falkor")
b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
... and these were subsequently moved in commit:
d4647f0a2a ("arm64: Rewrite Spectre-v2 mitigation code")
The arm64_apply_bp_hardening() function was introduced in commit:
0f15adbb28 ("arm64: Add skeleton to harden the branch predictor against aliasing attacks")
... and was subsequently moved and reworked in commit:
6279017e80 ("KVM: arm64: Move BP hardening helpers into spectre.h")
Fixes: ec82b567a7 ("arm64: Implement branch predictor hardening for Falkor")
Fixes: b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
Fixes: d4647f0a2a ("arm64: Rewrite Spectre-v2 mitigation code")
Fixes: 0f15adbb28 ("arm64: Add skeleton to harden the branch predictor against aliasing attacks")
Fixes: 6279017e80 ("KVM: arm64: Move BP hardening helpers into spectre.h")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220224181028.512873-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The architecture provides an asymmetric mode for MTE where tag mismatches
are checked asynchronously for stores but synchronously for loads. Allow
userspace processes to select this and make it available as a default mode
via the existing per-CPU sysfs interface.
Since there PR_MTE_TCF_ values are a bitmask (allowing the kernel to choose
between the multiple modes) and there are no free bits adjacent to the
existing PR_MTE_TCF_ bits the set of bits used to specify the mode becomes
disjoint. Programs using the new interface should be aware of this and
programs that do not use it will not see any change in behaviour.
When userspace requests two possible modes but the system default for the
CPU is the third mode (eg, default is synchronous but userspace requests
either asynchronous or asymmetric) the preference order is:
ASYMM > ASYNC > SYNC
This situation is not currently possible since there are only two modes and
it is mandatory to have a system default so there could be no ambiguity and
there is no ABI change. The chosen order is basically arbitrary as we do not
have a clear metric for what is better here.
If userspace requests specifically asymmetric mode via the prctl() and the
system does not support it then we will return an error, this mirrors
how we handle the case where userspace enables MTE on a system that does
not support MTE at all and the behaviour that will be seen if running on
an older kernel that does not support userspace use of asymmetric mode.
Attempts to set asymmetric mode as the default mode will result in an error
if the system does not support it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Tested-by: Branislav Rankov <branislav.rankov@arm.com>
Link: https://lore.kernel.org/r/20220216173224.2342152-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
cpu_has_fwb() is supposed to warn user is following architectural
requirement is not valid:
LoUU, bits [29:27] - Level of Unification Uniprocessor for the cache
hierarchy.
Note
When FEAT_S2FWB is implemented, the architecture requires that
this field is zero so that no levels of data cache need to be
cleaned in order to manage coherency with instruction fetches.
LoUIS, bits [23:21] - Level of Unification Inner Shareable for the
cache hierarchy.
Note
When FEAT_S2FWB is implemented, the architecture requires that
this field is zero so that no levels of data cache need to be
cleaned in order to manage coherency with instruction fetches.
It is not really clear what user have to do if assertion fires. Having
assertions about the CPU design like this inspire even more assertions
to be added and the kernel definitely is not the right place for that,
so let's remove cpu_has_fwb() altogether.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Link: https://lore.kernel.org/r/20220224164739.119168-1-vladimir.murzin@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
QARMA3 is relaxed version of the QARMA5 algorithm which expected to
reduce the latency of calculation while still delivering a suitable
level of security.
Support for QARMA3 can be discovered via ID_AA64ISAR2_EL1
APA3, bits [15:12] Indicates whether the QARMA3 algorithm is
implemented in the PE for address
authentication in AArch64 state.
GPA3, bits [11:8] Indicates whether the QARMA3 algorithm is
implemented in the PE for generic code
authentication in AArch64 state.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220224124952.119612-4-vladimir.murzin@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
On some microarchitectures, clearing PSTATE.TCO is expensive. Clearing
TCO is only necessary if in-kernel MTE is enabled, or if MTE is
enabled in the userspace process in synchronous (or, soon, asymmetric)
mode, because we do not report uaccess faults to userspace in none
or asynchronous modes. Therefore, adjust the kernel entry code to
clear TCO only if necessary.
Because it is now possible to switch to a task in which TCO needs to
be clear from a task in which TCO is set, we also need to do the same
thing on task switch.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I52d82a580bd0500d420be501af2c35fa8c90729e
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220219012945.894950-2-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
It is a preparation patch for eBPF atomic supports under arm64. eBPF
needs support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and
atomic[64]_{xchg|cmpxchg}. The ordering semantics of eBPF atomics are
the same with the implementations in linux kernel.
Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB
instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for
LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra
helper is added. atomic_fetch_add() and other atomic ops needs support for
STLXR instruction, so extend enum aarch64_insn_ldst_type to do that.
LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE
atomics is enabled, so just return AARCH64_BREAK_FAULT directly in
these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
If CONFIG_ARM64_LSE_ATOMICS is off, encoders for LSE-related instructions
can return AARCH64_BREAK_FAULT directly in insn.h. In order to access
AARCH64_BREAK_FAULT in insn.h, we can not include debug-monitors.h in
insn.h, because debug-monitors.h has already depends on insn.h, so just
move AARCH64_BREAK_FAULT into insn-def.h.
It will be used by the following patch to eliminate unnecessary LSE-related
encoders when CONFIG_ARM64_LSE_ATOMICS is off.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-2-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those
to simplify the definition of function aliases across arch/x86.
For clarity, where there are multiple annotations such as
EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For
example, where a function has a name and an alias which are both
exported, this is organised as:
SYM_FUNC_START(func)
... asm insns ...
SYM_FUNC_END(func)
EXPORT_SYMBOL(func)
SYM_FUNC_ALIAS(alias, func)
EXPORT_SYMBOL(alias)
Where there are only aliases and no exports or other annotations, I have
not bothered with line spacing, e.g.
SYM_FUNC_START(func)
... asm insns ...
SYM_FUNC_END(func)
SYM_FUNC_ALIAS(alias, func)
The tools/perf/ copies of memset_64.S and memset_64.S are updated
likewise to avoid the build system complaining these are mismatched:
| Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
| diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
| Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
| diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those
to simplify and more consistently define function aliases across
arch/arm64.
Aliases are now defined in terms of a canonical function name. For
position-independent functions I've made the __pi_<func> name the
canonical name, and defined other alises in terms of this.
The SYM_FUNC_{START,END}_PI(func) macros obscure the __pi_<func> name,
and make this hard to seatch for. The SYM_FUNC_START_WEAK_PI() macro
also obscures the fact that the __pi_<func> fymbol is global and the
<func> symbol is weak. For clarity, I have removed these macros and used
SYM_FUNC_{START,END}() directly with the __pi_<func> name.
For example:
SYM_FUNC_START_WEAK_PI(func)
... asm insns ...
SYM_FUNC_END_PI(func)
EXPORT_SYMBOL(func)
... becomes:
SYM_FUNC_START(__pi_func)
... asm insns ...
SYM_FUNC_END(__pi_func)
SYM_FUNC_ALIAS_WEAK(func, __pi_func)
EXPORT_SYMBOL(func)
For clarity, where there are multiple annotations such as
EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For
example, where a function has a name and an alias which are both
exported, this is organised as:
SYM_FUNC_START(func)
... asm insns ...
SYM_FUNC_END(func)
EXPORT_SYMBOL(func)
SYM_FUNC_ALIAS(alias, func)
EXPORT_SYMBOL(alias)
For consistency with the other string functions, I've defined strrchr as
a position-independent function, as it can safely be used as such even
though we have no users today.
As we no longer use SYM_FUNC_{START,END}_ALIAS(), our local copies are
removed. The common versions will be removed by a subsequent patch.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
For each vma mapped with PROT_MTE (the VM_MTE flag set), generate a
PT_ARM_MEMTAG_MTE segment in the core file and dump the corresponding
tags. The in-file size for such segments is 128 bytes per page.
For pages in a VM_MTE vma which are not present in the user page tables
or don't have the PG_mte_tagged flag set (e.g. execute-only), just write
zeros in the core file.
An example of program headers for two vmas, one 2-page, the other 4-page
long:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
...
LOAD 0x030000 0x0000ffff80034000 0x0000000000000000 0x000000 0x002000 RW 0x1000
LOAD 0x030000 0x0000ffff80036000 0x0000000000000000 0x004000 0x004000 RW 0x1000
...
LOPROC+0x1 0x05b000 0x0000ffff80034000 0x0000000000000000 0x000100 0x002000 0
LOPROC+0x1 0x05b100 0x0000ffff80036000 0x0000000000000000 0x000200 0x004000 0
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Luis Machado <luis.machado@linaro.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220131165456.2160675-5-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Due to a historical oversight, we emit a redundant static branch for
each atomic/atomic64 operation when CONFIG_ARM64_LSE_ATOMICS is
selected. We can safely remove this, making the kernel Image reasonably
smaller.
When CONFIG_ARM64_LSE_ATOMICS is selected, every LSE atomic operation
has two preceding static branches with the same target, e.g.
b f7c <kernel_init_freeable+0xa4>
b f7c <kernel_init_freeable+0xa4>
mov w0, #0x1 // #1
ldadd w0, w0, [x19]
This is because the __lse_ll_sc_body() wrapper uses
system_uses_lse_atomics(), which checks both `arm64_const_caps_ready`
and `cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]`, each of which emits a
static branch. This has been the case since commit:
addfc38672 ("arm64: atomics: avoid out-of-line ll/sc atomics")
However, there was never a need to check `arm64_const_caps_ready`, which
was itself introduced in commit:
63a1e1c95e ("arm64/cpufeature: don't use mutex in bringup path")
... so that cpus_have_const_cap() could fall back to checking the
`cpu_hwcaps` bitmap prior to the static keys for individual caps
becoming enabled. As system_uses_lse_atomics() doesn't check
`cpu_hwcaps`, and doesn't need to as we can safely use the LL/SC atomics
prior to enabling the `ARM64_HAS_LSE_ATOMICS` static key, it doesn't
need to check `arm64_const_caps_ready`.
This patch removes the `arm64_const_caps_ready` check from
system_uses_lse_atomics(). As the arch_atomic_* routines are meant to be
safely usable in noinstr code, I've also marked
system_uses_lse_atomics() as __always_inline.
This results in one fewer static branch per atomic operation, with the
prior example becoming:
b f78 <kernel_init_freeable+0xa0>
mov w0, #0x1 // #1
ldadd w0, w0, [x19]
Each static branch consists of the branch itself and an associated
__jump_table entry. Removing these has a reasonable impact on the Image
size, with a GCC 11.1.0 defconfig v5.17-rc2 Image being reduced by
128KiB:
| [mark@lakrids:~/src/linux]% ls -al Image*
| -rw-r--r-- 1 mark mark 34619904 Feb 3 18:24 Image.baseline
| -rw-r--r-- 1 mark mark 34488832 Feb 3 18:33 Image.onebranch
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220204104439.270567-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When the insn framework is used to encode an AND/ORR/EOR instruction,
aarch64_encode_immediate() is used to pick the immr imms values.
If the immediate is a 64bit mask, with bit 63 set, and zeros in any
of the upper 32 bits, the immr value is incorrectly calculated meaning
the wrong mask is generated.
For example, 0x8000000000000001 should have an immr of 1, but 32 is used,
meaning the resulting mask is 0x0000000300000000.
It would appear eBPF is unable to hit these cases, as build_insn()'s
imm value is a s32, so when used with BPF_ALU64, the sign-extended
u64 immediate would always have all-1s or all-0s in the upper 32 bits.
KVM does not generate a va_mask with any of the top bits set as these
VA wouldn't be usable with TTBR0_EL2.
This happens because the rotation is calculated from fls(~imm), which
takes an unsigned int, but the immediate may be 64bit.
Use fls64() so the 64bit mask doesn't get truncated to a u32.
Signed-off-by: James Morse <james.morse@arm.com>
Brown-paper-bag-for: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127162127.2391947-4-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The 'fixmap' is a global resource and is used recursively by
create pud mapping(), leading to a potential race condition in the
presence of a concurrent call to alloc_init_pud():
kernel_init thread virtio-mem workqueue thread
================== ===========================
alloc_init_pud(...) alloc_init_pud(...)
pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...)
READ_ONCE(*pudp)
pud_clear_fixmap(...)
READ_ONCE(*pudp) // CRASH!
As kernel may sleep during creating pud mapping, introduce a mutex lock to
serialise use of the fixmap entries by alloc_init_pud(). However, there is
no need for locking in early boot stage and it doesn't work well with
KASLR enabled when early boot. So, enable lock when system_state doesn't
equal to "SYSTEM_BOOTING".
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f471044545 ("arm64: mm: use fixmap when creating page tables")
Link: https://lore.kernel.org/r/20220201114400.56885-1-jianyong.wu@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Pull perf fixes from Borislav Petkov:
- Intel/PT: filters could crash the kernel
- Intel: default disable the PMU for SMM, some new-ish EFI firmware has
started using CPL3 and the PMU CPL filters don't discriminate against
SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
cycles.
- Fixup for perf_event_attr::sig_data
* tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix crash with stop filters in single-range mode
perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
selftests/perf_events: Test modification of perf_event_attr::sig_data
perf: Copy perf_event_attr::sig_data on modification
x86/perf: Default set FREEZE_ON_SMI for all
Pull xen fixes from Juergen Gross:
- documentation fixes related to Xen
- enable x2apic mode when available when running as hardware
virtualized guest under Xen
- cleanup and fix a corner case of vcpu enumeration when running a
paravirtualized Xen guest
* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/Xen: streamline (and fix) PV CPU enumeration
xen: update missing ioctl magic numers documentation
Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
xen: xenbus_dev.h: delete incorrect file name
xen/x2apic: enable x2apic mode when supported for HVM
Pull kvm fixes from Paolo Bonzini:
"ARM:
- A couple of fixes when handling an exception while a SError has
been delivered
- Workaround for Cortex-A510's single-step erratum
RISC-V:
- Make CY, TM, and IR counters accessible in VU mode
- Fix SBI implementation version
x86:
- Report deprecation of x87 features in supported CPUID
- Preparation for fixing an interrupt delivery race on AMD hardware
- Sparse fix
All except POWER and s390:
- Rework guest entry code to correctly mark noinstr areas and fix
vtime' accounting (for x86, this was already mostly correct but not
entirely; for ARM, MIPS and RISC-V it wasn't)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
KVM: x86: Report deprecated x87 features in supported CPUID
KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
KVM: arm64: Avoid consuming a stale esr value when SError occur
RISC-V: KVM: Fix SBI implementation version
RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
kvm/riscv: rework guest entry logic
kvm/arm64: rework guest entry logic
kvm/x86: rework guest entry logic
kvm/mips: rework guest entry logic
kvm: add guest_state_{enter,exit}_irqoff()
KVM: x86: Move delivery of non-APICv interrupt into vendor code
kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been
delivered
- Workaround for Cortex-A510's single-step[ erratum
Pull random number generator fixes from Jason Donenfeld:
"For this week, we have:
- A fix to make more frequent use of hwgenerator randomness, from
Dominik.
- More cleanups to the boot initialization sequence, from Dominik.
- A fix for an old shortcoming with the ZAP ioctl, from me.
- A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
from me. On one hand, it's a bummer to commit workarounds for
experimental compiler features that have bugs. But on the other, I
think this actually improves the code somewhat, independent of the
bug. So a win-win"
* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: only call crng_finalize_init() for primary_crng
random: access primary_pool directly rather than through pointer
random: wake up /dev/random writers after zap
random: continually use hwgenerator randomness
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
[ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[ 0.000000][ T0] Hardware name: MT6873 (DT)
[ 0.000000][ T0] Call trace:
[ 0.000000][ T0] dump_backtrace+0xfc/0x1dc
[ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c
[ 0.000000][ T0] panic+0x194/0x464
[ 0.000000][ T0] __cfi_check_fail+0x54/0x58
[ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0
[ 0.000000][ T0] blake2s_update+0x14c/0x178
[ 0.000000][ T0] _extract_entropy+0xf4/0x29c
[ 0.000000][ T0] crng_initialize_primary+0x24/0x94
[ 0.000000][ T0] rand_initialize+0x2c/0x6c
[ 0.000000][ T0] start_kernel+0x2f8/0x65c
[ 0.000000][ T0] __primary_switched+0xc4/0x7be4
[ 0.000000][ T0] Rebooting in 5 seconds..
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
Fixes: 6048fdcc5f ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and
CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature"
bits. Unlike most of the other CPUID feature bits, these bits are
clear if the features are present and set if the features are not
present. These bits should be reported in KVM_GET_SUPPORTED_CPUID,
because if these bits are set on hardware, they cannot be cleared in
the guest CPUID. Doing so would claim guest support for a feature that
the hardware doesn't support and that can't be efficiently emulated.
Of course, any software (e.g WIN87EM.DLL) expecting these features to
be present likely predates these CPUID feature bits and therefore
doesn't know to check for them anyway.
Aaron Lewis added the corresponding X86_FEATURE macros in
commit cbb99c0f58 ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and
ZERO_FCS_FDS"), with the intention of reporting these bits in
KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on
the kvm list.
Opportunistically reordered the CPUID_7_0_EBX capability bits from
least to most significant.
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220204001348.2844660-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>