When a PCI error is encountered for the 6th time in an hour, we
set the channel state to perm_failure and notify the
driver about the permanent failure.
However, after upstream commit 38ddc01147 ("powerpc/eeh:
Make permanently failed devices non-actionable"), the EEH handler
stops calling any routine once the device is marked as
permanently failed. This can lead to fatal consequences
such as a kernel hang with certain PCI devices.
The following log is observed with the lpfc driver, with and without
this change. Without this change the kernel hangs if a PCI error
is encountered 6 times for a device in an hour.
Without the change
EEH: Beginning: 'error_detected(permanent failure)'
PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1)
PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1)
EEH: Finished:'error_detected(permanent failure)'
With the change
EEH: Beginning: 'error_detected(permanent failure)'
EEH: Invoking lpfc->error_detected(permanent failure)
EEH: lpfc driver reports: 'disconnect'
EEH: Invoking lpfc->error_detected(permanent failure)
EEH: lpfc driver reports: 'disconnect'
EEH: Finished:'error_detected(permanent failure)'
To fix the issue, set the channel state to permanent failure only after
notifying the drivers.
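The effect of the ordering can be shown with a small user-space model. This is only a sketch and not the kernel's EEH code: struct model_pe, model_report_error() and the notify_first switch are hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an EEH PE; not the kernel's struct eeh_pe. */
struct model_pe {
	bool perm_failure;     /* channel state: permanently failed */
	int  drivers_notified; /* number of driver callbacks that ran */
};

/* Models the rule that a permanently failed device is non-actionable. */
static bool model_actionable(const struct model_pe *pe)
{
	return !pe->perm_failure;
}

/* Models invoking error_detected(permanent failure) for each device. */
static void model_report_error(struct model_pe *pe, int ndevs)
{
	for (int i = 0; i < ndevs; i++) {
		if (!model_actionable(pe))
			continue; /* corresponds to "EEH: not actionable" */
		pe->drivers_notified++;
	}
}

/*
 * notify_first == false: state is set before the drivers are called.
 * notify_first == true:  drivers are called first, then the state is set.
 */
static void model_handle_permanent_failure(struct model_pe *pe, int ndevs,
					   bool notify_first)
{
	if (!notify_first)
		pe->perm_failure = true;
	model_report_error(pe, ndevs);
	pe->perm_failure = true;
}
```

With notify_first set to false no driver is notified, which matches the "not actionable" log; with it set to true both devices get their callback before the state changes.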
Fixes: 38ddc01147 ("powerpc/eeh: Make permanently failed devices non-actionable")
Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230209105649.127707-1-ganeshgr@linux.ibm.com
Use the per-cpu CIF_ENABLED_WAIT flag to decide if an interrupt
occurred while a cpu was idle, instead of checking two conditions
within the old psw.
Also move clearing of the CIF_ENABLED_WAIT bit to the early interrupt
handler, which in turn makes arch_vcpu_is_preempted() also a bit more
precise, since the flag is now cleared before interrupt handlers have
been called.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Let cpu helper functions return boolean values. This also allows
making the code a bit simpler by getting rid of the "!!" construct.
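A minimal sketch of why the "!!" becomes unnecessary, assuming a generic flag word. MODEL_FLAG_ENABLED_WAIT and both helpers are hypothetical and do not mirror the s390 helpers touched by this patch.

```c
#include <stdbool.h>

#define MODEL_FLAG_ENABLED_WAIT (1UL << 5) /* hypothetical bit position */

/* int return type: "!!" is needed to normalise the masked value to 0 or 1. */
static int model_test_flag_int(unsigned long flags, unsigned long mask)
{
	return !!(flags & mask);
}

/* bool return type: conversion to bool already yields 0 or 1. */
static bool model_test_flag_bool(unsigned long flags, unsigned long mask)
{
	return flags & mask;
}
```

Both helpers give the same answer for every input; the bool variant just lets the language do the normalisation.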
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Baoquan He reported lots of KFENCE reports when /proc/kcore is read,
e.g. with crash or even simpler with dd:
BUG: KFENCE: invalid read in copy_from_kernel_nofault+0x5e/0x120
Invalid read at 0x00000000f4f5149f:
copy_from_kernel_nofault+0x5e/0x120
read_kcore+0x6b2/0x870
proc_reg_read+0x9a/0xf0
vfs_read+0x94/0x270
ksys_read+0x70/0x100
__do_syscall+0x1d0/0x200
system_call+0x82/0xb0
The reason for this is that read_kcore() simply reads memory that might
have been unmapped by KFENCE with copy_from_kernel_nofault(). Any fault due
to pages being unmapped by KFENCE would be handled gracefully by the fault
handler (exception table fixup).
However, the s390 fault handler first reports the fault, and only afterwards
would perform the exception table fixup. Most architectures have this in
reversed order, which also avoids the false positive KFENCE reports when an
unmapped page is accessed.
Therefore change the s390 fault handler so it handles exception table
fixups before KFENCE page faults are reported.
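The reordering can be sketched as a decision function. This is a simplified model under stated assumptions, not the s390 fault handler: the types and the fixup_first switch are hypothetical, and the model only records which action is taken first.

```c
#include <stdbool.h>

enum model_result {
	MODEL_FIXED_UP,        /* exception table fixup applied, no report */
	MODEL_KFENCE_REPORTED, /* KFENCE report emitted */
	MODEL_UNHANDLED,
};

/* Hypothetical description of a kernel-mode fault. */
struct model_fault {
	bool has_extable_entry; /* faulting instruction has a fixup entry */
	bool kfence_page;       /* address lies in a KFENCE-managed page */
};

/*
 * fixup_first == false: KFENCE gets to report before the fixup is tried.
 * fixup_first == true:  the exception table is consulted first.
 */
static enum model_result
model_handle_kernel_fault(const struct model_fault *f, bool fixup_first)
{
	if (fixup_first && f->has_extable_entry)
		return MODEL_FIXED_UP;
	if (f->kfence_page)
		return MODEL_KFENCE_REPORTED;
	if (f->has_extable_entry)
		return MODEL_FIXED_UP;
	return MODEL_UNHANDLED;
}
```

An access like the one from copy_from_kernel_nofault() has a fixup entry, so with fixup_first set it no longer produces a report, while a genuine KFENCE fault without a fixup entry is still reported.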
Reported-by: Baoquan He <bhe@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Acked-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20230213183858.1473681-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Modify the CPRBX struct to expose a new field ctfm for use with hardware
command filtering within a CEX8 crypto card in CCA coprocessor mode.
The field replaces a reserved byte padding field so that the layout of the
struct and the size does not change.
The new field is used only by user space applications which may use this to
expose the HW filtering facilities in the crypto firmware layers.
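The layout argument can be checked at compile time. The structs below are a hypothetical, heavily shortened stand-in and not the real CPRBX definition; only the field name ctfm comes from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout with a reserved padding byte. */
struct model_cprb_old {
	uint16_t len;
	uint8_t  func_id[2];
	uint8_t  reserved;   /* padding byte about to be reused */
	uint8_t  flags[3];
	uint32_t req_len;
};

/* Hypothetical "after" layout: the padding byte becomes ctfm. */
struct model_cprb_new {
	uint16_t len;
	uint8_t  func_id[2];
	uint8_t  ctfm;       /* command type filtering mask */
	uint8_t  flags[3];
	uint32_t req_len;
};

/* Reusing the byte must change neither the size nor later offsets. */
_Static_assert(sizeof(struct model_cprb_old) == sizeof(struct model_cprb_new),
	       "struct size must not change");
_Static_assert(offsetof(struct model_cprb_old, req_len) ==
	       offsetof(struct model_cprb_new, req_len),
	       "offsets of following fields must not change");
```

A check of this kind fails the build if a later edit accidentally changes the uapi layout.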
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
I've yoinked patch 1 from Drew's series adding support for Zicboz &
attached two more patches here that remove the need for, and then drop
the toolchain support checks for Zicbom. The goal is to remove the need
for checking the presence of toolchain Zicbom support in the work being
done to support non instruction based CMOs [1].
I've tested compilation on a number of different configurations with
the Zicbom config option enabled. The important ones to call out I
guess are:
- clang/llvm 14 w/ LLVM=1 which doesn't support Zicbom atm.
- gcc 11 w/ binutils 2.37 which doesn't support Zicbom atm either.
- clang/llvm 15 w/ LLVM=1 BUT with binutils 2.37's ld. This is the
configuration that prompted adding the LD checks as cc/as supports
Zicbom, but ld doesn't [2].
- gcc 12 w/ binutils 2.39 & clang 15 w/ LLVM=1, both of these supported
Zicbom before and still do.
I also checked building the THEAD errata etc with
CONFIG_RISCV_ISA_ZICBOM disabled, and there were no build issues there
either.
* b4-shazam-merge:
RISC-V: remove toolchain version checks for Zicbom
RISC-V: replace cbom instructions with an insn-def
RISC-V: insn-def: Add I-type insn-def
Link: https://lore.kernel.org/r/20230108163356.3063839-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Commit b8c86872d1 ("riscv: fix detection of toolchain Zicbom
support") fixed building on systems where Zicbom was supported by the
compiler/assembler but not by the linker in an easily backportable
manner.
Now that we have insn-defs for the 3 instructions, toolchain support
is no longer required for Zicbom.
Stop emitting "_zicbom" in -march when Zicbom is enabled & drop the
version checks entirely.
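For illustration, the encoding that an I-type insn-def has to produce can be computed by hand. This is a sketch, not the kernel's insn-def macros: encode_itype() and encode_cbo() are hypothetical helpers, and the opcode, funct3 and immediate values are taken as assumptions from the RISC-V CMO specification and should be checked against it.

```c
#include <stdint.h>

/* I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode */
static uint32_t encode_itype(uint32_t opcode, uint32_t rd, uint32_t funct3,
			     uint32_t rs1, uint32_t imm12)
{
	return ((imm12 & 0xfff) << 20) | ((rs1 & 0x1f) << 15) |
	       ((funct3 & 0x7) << 12) | ((rd & 0x1f) << 7) | (opcode & 0x7f);
}

#define OPC_MISC_MEM 0x0f /* assumed major opcode for CBO instructions */
#define FUNCT3_CBO   2    /* assumed funct3 for CBO instructions */

/* Assumed immediate values selecting the cache block operation. */
enum cbo_op { CBO_INVAL = 0, CBO_CLEAN = 1, CBO_FLUSH = 2 };

/* rd is x0; base_reg is the register number holding the address. */
static uint32_t encode_cbo(enum cbo_op op, uint32_t base_reg)
{
	return encode_itype(OPC_MISC_MEM, 0, FUNCT3_CBO, base_reg, op);
}
```

Because the instruction word is emitted from fixed numeric fields, neither the assembler nor the linker needs to know the Zicbom mnemonics.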
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230108163356.3063839-4-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
Ever since RISC-V started using generic arch topology code, the code
paths for cpu-capacity have been there but there's no binding defined to
actually convey the information. Defining the same property as used on
arm seems to be the only logical thing to do, so do it.
[Palmer: This is on top of the fix required to make it work, which
itself wasn't merged until late in the 6.2 cycle and thus pulls in
various other fixes.]
* b4-shazam-merge:
dt-bindings: riscv: add a capacity-dmips-mhz cpu property
dt-bindings: arm: move cpu-capacity to a shared loation
riscv: Move call to init_cpu_topology() to later initialization stage
riscv/kprobe: Fix instruction simulation of JALR
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
MAINTAINERS: add an IRC entry for RISC-V
RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
Link: https://lore.kernel.org/r/20230104180513.1379453-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The comment that says mwait_play_dead() returns only on failure is a bit
misleading because mwait_play_dead() could actually return for valid
reasons (such as mwait not being supported by the platform) that do not
indicate a failure of the CPU offline operation. So, remove the comment.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230128003751.141317-1-srivatsa@csail.mit.edu
Previously, R_ALPHA_LITERAL relocations would overflow for large kernel
modules.
This was because the Alpha's apply_relocate_add was relying on the kernel's
module loader to have sorted the GOT towards the very end of the module as it
was mapped into memory in order to correctly assign the global pointer. While
this behavior would mostly work fine for small kernel modules, this approach
would overflow on kernel modules with large GOTs since the global pointer
would be very far away from the GOT, and thus, certain entries would be out of
range.
This patch fixes this by instead using the Tru64 behavior of assigning the
global pointer to be 32KB away from the start of the GOT. The change made
in this patch won't work for multi-GOT kernel modules as it makes the
assumption the module only has one GOT located at the beginning of .got,
although for the vast majority of kernel modules, this should be fine. Of the
kernel modules that would previously result in a relocation error, none of
them, even modules like nouveau, have even come close to filling up a single
GOT, and they've all worked fine under this patch.
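The range arithmetic behind this can be sketched as follows, assuming that an R_ALPHA_LITERAL relocation is resolved as a signed 16-bit displacement from the global pointer. The helper names are hypothetical and this is not the code in apply_relocate_add.

```c
#include <stdbool.h>
#include <stdint.h>

#define GP_BIAS 0x8000 /* 32KB past the start of the GOT */

/* Place the global pointer 32KB after the start of the GOT. */
static uint64_t model_gp_for_got(uint64_t got_start)
{
	return got_start + GP_BIAS;
}

/* A GOT entry is reachable if its offset from gp fits in 16 signed bits. */
static bool model_literal_in_range(uint64_t entry_addr, uint64_t gp)
{
	int64_t disp = (int64_t)(entry_addr - gp);

	return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

With this placement the displacement covers the first 64KB of the GOT, from offset 0 at -32768 up to offset 0xfff8 at +32760. A global pointer placed far beyond the GOT puts even the first entry out of range, which is the overflow described above.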
Signed-off-by: Edward Humes <aurxenon@lunos.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Add a space after ','.
Add spaces around the '=', '>' and '=='.
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Remove a number of asm headers locally redirected to the respective
generic or generated versions.
For asm-offsets.h all that is needed is a Kbuild entry for the generic
version, and for div64.h, irq_regs.h and kdebug.h nothing is needed as
in their absence they will be redirected automatically according to
include/asm-generic/Kbuild.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Calling the osf_mount system call with an invalid typenr value will
spam the kernel log with error messages. Reduce the spamming by making
it a ratelimited printk. Issue found when exercising with the stress-ng
enosys system call stressor.
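The idea of rate limiting a message can be modelled in user space as an interval plus burst counter. This is a sketch only: struct model_ratelimit and model_ratelimit_ok() are hypothetical and do not reproduce the kernel's ratelimit implementation.

```c
#include <stdbool.h>

/* Hypothetical rate limit state: at most 'burst' messages per 'interval'. */
struct model_ratelimit {
	long interval; /* window length in ticks */
	int  burst;    /* messages allowed per window */
	long begin;    /* start tick of the current window */
	int  printed;  /* messages emitted in the current window */
	int  missed;   /* messages suppressed so far */
};

/* Returns true if a message may be printed at tick 'now'. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A caller that hammers the failing path within one window gets only the first 'burst' messages; the rest are counted as missed instead of flooding the log.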
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Pull kvm fixes from Paolo Bonzini:
"Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling
threads transitions out of C0 state, the other thread gets access to
twice as many entries in the RSB, but unfortunately the predictions of
the now-halted logical processor are not purged. Therefore, the
executing processor could speculatively execute from locations that
the now-halted processor had trained the RSB on.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM
to prevent exiting guest mode when transitioning out of C0: the
KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change
this behavior. To mitigate the cross-thread return address predictions
bug, a VMM must not be allowed to override the default behavior to
intercept C0 transitions.
These patches introduce a KVM module parameter that, if set, will
prevent the user from disabling the HLT, MWAIT and CSTATE exits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
KVM: x86: Mitigate the cross-thread return address predictions bug
x86/speculation: Identify processors vulnerable to SMT RSB predictions
The comment for addr_t doesn't make much sense. Given that the
formatting is also incorrect, just remove it.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Get rid of CONFIG_AS_IS_LLVM in entry.S to make the code a bit more
readable. This removes a micro-optimization, but given that the llvm IAS
limitation will likely stay, just use the version that works with llvm.
See commit 4c25f0ff63 ("s390/entry: workaround llvm's IAS limitations")
for further details.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit bf64f0517e ("s390/mem_detect: handle online memory limit
just once") introduced truncation of mem_detect online ranges
based on identity mapping size. For the kdump case, however, the full
set of online memory ranges has to be fed into memblock_physmem_add
so that crashed system memory can be extracted.
Instead of truncating, introduce a "usable limit" which is respected by the
mem_detect API. Also add an extra online memory range iterator which still
provides the full set of online memory ranges, disregarding the "usable limit".
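The difference between the two iterators can be sketched like this. All names are hypothetical and the model is not the s390 mem_detect code; it only shows one accessor clamping to the limit and one ignoring it.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_range { uint64_t start, end; }; /* half-open: [start, end) */

struct model_mem_detect {
	const struct model_range *ranges;
	int count;
	uint64_t usable_limit; /* 0 means no limit */
};

/* Full view: returns range n as detected, ignoring the usable limit. */
static bool model_get_online_range(const struct model_mem_detect *md, int n,
				   uint64_t *start, uint64_t *end)
{
	if (n < 0 || n >= md->count)
		return false;
	*start = md->ranges[n].start;
	*end = md->ranges[n].end;
	return true;
}

/* Limited view: clamps to the usable limit, drops ranges beyond it. */
static bool model_get_usable_range(const struct model_mem_detect *md, int n,
				   uint64_t *start, uint64_t *end)
{
	if (!model_get_online_range(md, n, start, end))
		return false;
	if (md->usable_limit) {
		if (*start >= md->usable_limit)
			return false;
		if (*end > md->usable_limit)
			*end = md->usable_limit;
	}
	return true;
}
```

Regular users would go through the limited view, while a kdump-style consumer that needs all of the crashed system's memory would use the full one.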
Fixes: bf64f0517e ("s390/mem_detect: handle online memory limit just once")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The __uint128_t member was only added to the __vector128 struct for
future convenience. However, this is a uapi header file, and the 31/32 bit
(aka compat) layer is still supported but doesn't know anything about this
type:
/usr/include/asm/types.h:27:17: error: unknown type name __uint128_t
27 | __uint128_t v;
Therefore remove it again.
Fixes: b0b7b43fcc ("s390/vx: add 64 and 128 bit members to __vector128 struct")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The RDP instruction allows resetting the DAT-protection bit in a PTE with
less CPU synchronization overhead than the IPTE instruction. In particular,
IPTE can cause machine-wide synchronization overhead, and excessive IPTE usage
can negatively impact machine performance.
RDP can be used instead of IPTE if the new PTE only differs in SW bits
and the _PAGE_PROTECT HW bit, for PTE protection changes from RO to RW.
SW PTE bit changes are allowed, e.g. for dirty and young tracking, but no
other HW-defined part of the PTE may change. This is because the
architecture forbids such changes to an active and valid PTE, which
is why invalidation with IPTE is always used first, before writing a new
entry.
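The eligibility rule can be written down as a predicate over the old and new PTE values. This is a sketch with a made-up bit layout: MODEL_PAGE_PROTECT, MODEL_SW_BITS and model_can_use_rdp() are hypothetical and not the s390 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define MODEL_PAGE_PROTECT 0x200ULL /* HW protection bit */
#define MODEL_SW_BITS      0x0ffULL /* software-defined bits */

/* True if the change old_pte -> new_pte qualifies for RDP. */
static bool model_can_use_rdp(uint64_t old_pte, uint64_t new_pte)
{
	uint64_t may_differ = MODEL_SW_BITS | MODEL_PAGE_PROTECT;

	/* Any other HW-defined bit changing requires IPTE. */
	if ((old_pte ^ new_pte) & ~may_differ)
		return false;
	/* Only a protection change from RO to RW qualifies. */
	return (old_pte & MODEL_PAGE_PROTECT) &&
	       !(new_pte & MODEL_PAGE_PROTECT);
}
```

A write-protected PTE that becomes writable and dirty at the same address passes the check; a change of the page frame or a switch from RW to RO does not and still needs the IPTE path.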
The RDP optimization helps mainly for fault-driven SW dirty-bit tracking.
Writable PTEs are initially always mapped with the HW _PAGE_PROTECT bit set,
to allow SW dirty-bit accounting on the first write protection fault, where
the DAT-protection would then be reset. The reset is now done with RDP
instead of IPTE, if the RDP instruction is available.
RDP cannot always guarantee that the DAT-protection reset is propagated
to all CPUs immediately. This means that spurious TLB protection faults
on other CPUs can now occur. For this, common code provides a
flush_tlb_fix_spurious_fault() handler, which will now be used to do a
CPU-local TLB flush. However, this will clear the whole TLB of a CPU, and
not just the affected entry. For more fine-grained flushing, by simply
doing a (local) RDP again, flush_tlb_fix_spurious_fault() would need to
also provide the PTE pointer.
Note that spurious TLB protection faults cannot really be distinguished
from racing pagetable updates, where another thread already installed the
correct PTE. In such a case, the local TLB flush would be unnecessary
overhead, but overall reduction of CPU synchronization overhead by not
using IPTE is still expected to be beneficial.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit
90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
broke the use case of running Xen dom0 kernels on machines with an
external disk enclosure attached via USB, see Link tag.
What this commit was originally fixing - SEV-SNP guests on Hyper-V - is
a more specialized situation which has other issues at the moment anyway
so reverting this now and addressing the issue properly later is the
prudent thing to do.
So revert it in time for the 6.2 proper release.
[ bp: Rewrite commit message. ]
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de
vpbroadcastb and vpbroadcastd are not AVX instructions,
but the aria-avx assembly code contains them.
So a kernel panic will occur if aria-avx runs on a CPU without AVX2
support.
vbroadcastss and vpshufb are used to avoid using vpbroadcastb.
Unfortunately, this change reduces performance by about 5%.
Also, vpbroadcastd is simply replaced by vmovdqa.
Fixes: ba3579e6e4 ("crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher")
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* kvm-arm64/nv-prefix:
: Preamble to NV support, courtesy of Marc Zyngier.
:
: This brings in a set of prerequisite patches for supporting nested
: virtualization in KVM/arm64. Of course, there is a long way to go until
: NV is actually enabled in KVM.
:
: - Introduce cpucap / vCPU feature flag to pivot the NV code on
:
: - Add support for EL2 vCPU register state
:
: - Basic nested exception handling
:
: - Hide unsupported features from the ID registers for NV-capable VMs
KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
KVM: arm64: nv: Filter out unsupported features from ID regs
KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: nv: Handle SMCs taken from virtual EL2
KVM: arm64: nv: Handle trapped ERET from virtual EL2
KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
KVM: arm64: nv: Support virtual EL2 exceptions
KVM: arm64: nv: Handle HCR_EL2.NV system register traps
KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
KVM: arm64: nv: Add EL2 system registers to vcpu context
KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
KVM: arm64: nv: Introduce nested virtualization VCPU feature
KVM: arm64: Use the S2 MMU context to iterate over S2 table
arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Convert CPACR_EL1_TTA to the new, generated system register
: definitions.
:
: - Serialize toggling CPACR_EL1.SMEN to avoid unexpected exceptions when
: accessing SVCR in the host.
:
: - Avoid quiescing the guest if a vCPU accesses its own redistributor's
: SGIs/PPIs, eliminating the need to IPI. Largely an optimization for
: nested virtualization, as the L1 accesses the affected registers
: rather often.
:
: - Conversion to kstrtobool()
:
: - Common definition of INVALID_GPA across architectures
:
: - Enable CONFIG_USERFAULTFD for CI runs of KVM selftests
KVM: arm64: Fix non-kerneldoc comments
KVM: selftests: Enable USERFAULTFD
KVM: selftests: Remove redundant setbuf()
arm64/sysreg: clean up some inconsistent indenting
KVM: MMU: Make the definition of 'INVALID_GPA' common
KVM: arm64: vgic-v3: Use kstrtobool() instead of strtobool()
KVM: arm64: vgic-v3: Limit IPI-ing when accessing GICR_{C,S}ACTIVER0
KVM: arm64: Synchronize SMEN on vcpu schedule out
KVM: arm64: Kill CPACR_EL1_TTA definition
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/apple-vgic-mi:
: VGIC maintenance interrupt support for the AIC, courtesy of Marc Zyngier.
:
: The AIC provides a non-maskable VGIC maintenance interrupt, which until
: now was not supported by KVM. This series (1) allows the registration of
: a non-maskable maintenance interrupt and (2) wires in support for this
: with the AIC driver.
irqchip/apple-aic: Correctly map the vgic maintenance interrupt
irqchip/apple-aic: Register vgic maintenance interrupt with KVM
KVM: arm64: vgic: Allow registration of a non-maskable maintenance interrupt
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/psci-relay-fixes:
: Fixes for CPU on/resume with pKVM, courtesy of Quentin Perret.
:
: A consequence of deprivileging the host is that pKVM relays PSCI calls
: on behalf of the host. pKVM's CPU initialization failed to fully
: initialize the CPU's EL2 state, which notably led to unexpected SVE
: traps resulting in a hyp panic.
:
: The issue is addressed by reusing parts of __finalise_el2 to restore CPU
: state in the PSCI relay.
KVM: arm64: Finalise EL2 state from pKVM PSCI relay
KVM: arm64: Use sanitized values in __check_override in nVHE
KVM: arm64: Introduce finalise_el2_state macro
KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHE
* kvm-arm64/nv-timer-improvements:
: Timer emulation improvements, courtesy of Marc Zyngier.
:
: - Avoid re-arming an hrtimer for a guest timer that is already pending
:
: - Only reload the affected timer context when emulating a sysreg access
: instead of both the virtual/physical timers.
KVM: arm64: timers: Don't BUG() on unhandled timer trap
KVM: arm64: Reduce overhead of trapped timer sysreg accesses
KVM: arm64: Don't arm a hrtimer for an already pending timer
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/parallel-access-faults:
: Parallel stage-2 access fault handling
:
: The parallel faults changes that went into 6.2 covered most stage-2
: aborts, with the exception of stage-2 access faults. Building on top of
: the new infrastructure, this series adds support for handling access
: faults (i.e. updating the access flag) in parallel.
:
: This is expected to provide a performance uplift for cores that do not
: implement FEAT_HAFDBS, such as those from the fruit company.
KVM: arm64: Condition HW AF updates on config option
KVM: arm64: Handle access faults behind the read lock
KVM: arm64: Don't serialize if the access flag isn't set
KVM: arm64: Return EAGAIN for invalid PTE in attr walker
KVM: arm64: Ignore EAGAIN for walks outside of a fault
KVM: arm64: Use KVM's pte type/helpers in handle_access_fault()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/virtual-cache-geometry:
: Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki.
:
: KVM/arm64 has always exposed the host cache geometry directly to the
: guest, even though non-secure software should never perform CMOs by
: Set/Way. This was slightly wrong, as the cache geometry was derived from
: the PE on which the vCPU thread was running and not a sanitized value.
:
: Altogether this leads to issues migrating VMs on heterogeneous
: systems, as the cache geometry saved/restored could be inconsistent.
:
: KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache
: geometry is entirely controlled by userspace, such that migrations from
: older kernels continue to work.
KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT
KVM: arm64: Normalize cache configuration
KVM: arm64: Mask FEAT_CCIDX
KVM: arm64: Always set HCR_TID2
arm64/cache: Move CLIDR macro definitions
arm64/sysreg: Add CCSIDR2_EL1
arm64/sysreg: Convert CCSIDR_EL1 to automatic generation
arm64: Allow the definition of UNKNOWN system register fields
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>