Add tests for SError injection considering KVM is more directly involved
in delivery:
- Pending SErrors are taken at the first CSE after SErrors are unmasked
- Pending SErrors aren't taken and remain pending if SErrors are masked
- Unmasked SErrors are taken immediately when injected (implementation
detail)
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-25-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM might have an emulated SError queued for the guest if userspace
returned an abort for MMIO. Better yet, it could actually be a
*synchronous* exception in disguise if SCTLR2_ELx.EASE is set.
Don't advance PC if KVM owes an emulated SError, just like the handling
of emulated SEA injection.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-24-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
As the name might imply, when NMEA is set SErrors are non-maskable and
can be taken regardless of PSTATE.A. As is the recurring theme with
DoubleFault2, the effects on SError routing are entirely backwards to
this.
If at EL1, NMEA is *not* considered for SError routing when TMEA is set
and the exception is taken to EL2 when PSTATE.A is set.
Link: https://lore.kernel.org/r/20250708172532.1699409-20-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
HCRX_EL2.TMEA further modifies the external abort behavior where
unmasked aborts are taken to EL1 and masked aborts are taken to EL2.
It's rather weird when you consider that SEAs are, well, *synchronous*
and therefore not actually maskable. However, for the purposes of
exception routing, they're considered "masked" if the A flag is set.
This gets a bit hairier when considering the fact that TMEA
also enables vSErrors, i.e. KVM has delegated the HW vSError context to
the guest hypervisor. We can keep the vSError context delegation as-is
by taking advantage of a couple properties:
- If SErrors are unmasked, the 'physical' SError can be taken
in-context immediately. In other words, KVM can emulate the EL1
SError while preserving vEL2's ownership of the vSError context.
- If SErrors are masked, the 'physical' SError is taken to EL2
immediately and needs the usual nested exception entry.
Note that the new in-context handling has the benign effect where
unmasked SError injections are emulated even for non-nested VMs.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-19-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
One of the finest additions of FEAT_DoubleFault2 is the ability for
software to request *synchronous* external aborts be taken to the
SError vector, which of coure are *asynchronous* in nature.
Opinions be damned, implement the architecture and send SEAs to the
SError vector if EASE is set for the target context.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-18-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When HCR_EL2.AMO is set, physical SErrors are routed to EL2 and virtual
SError injection is enabled for EL1. Conceptually treating
host-initiated SErrors as 'physical', this means we can delegate control
of the vSError injection context to the guest hypervisor when nesting &&
AMO is set.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.
This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:
- Using a flag (instead of HCR_EL2.VSE) to track the pending SError
state when SErrors are unconditionally masked for the current context
- Resampling the routing / masking of a pending SError on every guest
entry/exit
- Emulating exception entry when SError routing implies a translation
regime change
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.
This is usually expressed as:
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )
which is a mouthful and a bit hard to read, specially when followed
by additional conditions.
Introduce a new helper that encapsulate these two terms, allowing
the above to be written as
if (is_nested_context(vcpu) ... )
which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Pull i2c fixes from Wolfram Sang:
- subsystem: convert drivers to use recent callbacks of struct
i2c_algorithm A typical after-rc1 cleanup, which I couldn't send in
time for rc2
- tegra: fix YAML conversion of device tree bindings
- k1: re-add a check which got lost during upstreaming
* tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: k1: check for transfer error
i2c: use inclusive callbacks in struct i2c_algorithm
dt-bindings: i2c: nvidia,tegra20-i2c: Specify the required properties
Pull x86 fixes from Borislav Petkov:
- Make sure the array tracking which kernel text positions need to be
alternatives-patched doesn't get mishandled by out-of-order
modifications, leading to it overflowing and causing page faults when
patching
- Avoid an infinite loop when early code does a ranged TLB invalidation
before the broadcast TLB invalidation count of how many pages it can
flush, has been read from CPUID
- Fix a CONFIG_MODULES typo
- Disable broadcast TLB invalidation when PTI is enabled to avoid an
overflow of the bitmap tracking dynamic ASIDs which need to be
flushed when the kernel switches between the user and kernel address
space
- Handle the case of a CPU going offline and thus reporting zeroes when
reading top-level events in the resctrl code
* tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Fix int3 handling failure from broken text_poke array
x86/mm: Fix early boot use of INVPLGB
x86/its: Fix an ifdef typo in its_alloc()
x86/mm: Disable INVLPGB when PTI is enabled
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
Pull irq fixes from Borislav Petkov:
- Fix missing prototypes warnings
- Properly initialize work context when allocating it
- Remove a method tracking when managed interrupts are suspended during
hotplug, in favor of the code using a IRQ disable depth tracking now,
and have interrupts get properly enabled again on restore
- Make sure multiple CPUs getting hotplugged don't cause wrong tracking
of the managed IRQ disable depth
* tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ath79-misc: Fix missing prototypes warnings
genirq/irq_sim: Initialize work context pointers properly
genirq/cpuhotplug: Restore affinity even for suspended IRQ
genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug
Pull perf fixes from Borislav Petkov:
- Avoid a crash on a heterogeneous machine where not all cores support
the same hw events features
- Avoid a deadlock when throttling events
- Document the perf event states more
- Make sure a number of perf paths switching off or rescheduling events
call perf_cgroup_event_disable()
- Make sure perf does task sampling before its userspace mapping is
torn down, and not after
* tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix crash in icl_update_topdown_event()
perf: Fix the throttle error of some clock events
perf: Add comment to enum perf_event_state
perf/core: Fix WARN in perf_cgroup_switch()
perf: Fix dangling cgroup pointer in cpuctx
perf: Fix cgroup state vs ERROR
perf: Fix sample vs do_exit()
Pull locking fixes from Borislav Petkov:
- Make sure the switch to the global hash is requested always under a
lock so that two threads requesting that simultaneously cannot get to
inconsistent state
- Reject negative NUMA nodes earlier in the futex NUMA interface
handling code
- Selftests fixes
* tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Verify under the lock if hash can be replaced
futex: Handle invalid node numbers supplied by user
selftests/futex: Set the home_node in futex_numa_mpol
selftests/futex: getopt() requires int as return value.
Pull EDAC fixes from Borislav Petkov:
- amd64: Correct the number of memory controllers on some AMD Zen
clients
- igen6: Handle firmware-disabled memory controllers properly
* tag 'edac_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/igen6: Fix NULL pointer dereference
EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
missing synchronisation
- A small fix for the irqbypass hook fixes, tightening the check and
ensuring that we only deal with MSI for both the old and the new
route entry
- Rework the way the shadow LRs are addressed in a nesting
configuration, plugging an embarrassing bug as well as simplifying
the whole process
- Add yet another fix for the dreaded arch_timer_edge_cases selftest
RISC-V:
- Fix the size parameter check in SBI SFENCE calls
- Don't treat SBI HFENCE calls as NOPs
x86 TDX:
- Complete API for handling complex TDVMCALLs in userspace.
This was delayed because the spec lacked a way for userspace to
deny supporting these calls; the new exit code is now approved"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: TDX: Exit to userspace for GetTdVmCallInfo
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
KVM: arm64: VHE: Centralize ISBs when returning to host
KVM: arm64: Remove cpacr_clear_set()
KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd()
KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()
KVM: arm64: Reorganise CPTR trap manipulation
KVM: arm64: VHE: Synchronize CPTR trap deactivation
KVM: arm64: VHE: Synchronize restore of host debug registers
KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases
KVM: arm64: Explicitly treat routing entry type changes as changes
KVM: arm64: nv: Fix tracking of shadow list registers
RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs
RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls
Pull smb client fixes from Steve French:
- Multichannel channel allocation fix for Kerberos mounts
- Two reconnect fixes
- Fix netfs_writepages crash with smbdirect/RDMA
- Directory caching fix
- Three minor cleanup fixes
- Log error when close cached dirs fails
* tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6:
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
smb: minor fix to use sizeof to initialize flags_string buffer
smb: Use loff_t for directory position in cached_dirents
smb: Log an error when close_all_cached_dirs fails
cifs: Fix prepare_write to negotiate wsize if needed
smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
smb: client: fix first command failure during re-negotiation
cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function
smb: fix secondary channel creation issue with kerberos by populating hostname when adding channels
If spacemit_i2c_xfer_msg() times out waiting for a message transfer to
complete, or if the hardware reports an error, it returns a negative
error code (-ETIMEDOUT, -EAGAIN, -ENXIO. or -EIO).
The sole caller of spacemit_i2c_xfer_msg() is spacemit_i2c_xfer(),
which is the i2c_algorithm->xfer callback function. It currently
does not save the value returned by spacemit_i2c_xfer_msg().
The result is that transfer errors go unreported, and a caller
has no indication anything is wrong.
When this code was out for review, the return value *was* checked
in early versions. But for some reason, that assignment got dropped
between versions 5 and 6 of the series, perhaps related to reworking
the code to merge spacemit_i2c_xfer_core() into spacemit_i2c_xfer().
Simply assigning the value returned to "ret" fixes the problem.
Fixes: 5ea558473f ("i2c: spacemit: add support for SpacemiT K1 SoC")
Signed-off-by: Alex Elder <elder@riscstar.com>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: Troy Mitchell <troymitchell988@gmail.com>
Link: https://lore.kernel.org/r/20250616125137.1555453-1-elder@riscstar.com
Signed-off-by: Andi Shyti <andi@smida.it>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull nfsd fixes from Chuck Lever:
- Two fixes for commits in the nfsd-6.16 merge
- One fix for the recently-added NFSD netlink facility
- One fix for a remote SunRPC crasher
* tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
nfsd: use threads array as-is in netlink interface
SUNRPC: Cleanup/fix initial rq_pages allocation
NFSD: Avoid corruption of a referring call list
Pull erofs fixes from Gao Xiang:
- Use the mounter’s credentials for file-backed mounts to resolve
Android SELinux permission issues
- Remove the unused trace event `erofs_destroy_inode`
- Error out on crafted out-of-file-range encoded extents
- Remove an incorrect check for encoded extents
* tag 'erofs-for-6.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: remove a superfluous check for encoded extents
erofs: refuse crafted out-of-file-range encoded extents
erofs: remove unused trace event erofs_destroy_inode
erofs: impersonate the opener's credentials when accessing backing file
Replaced hardcoded value 16 with SMB2_NTLMV2_SESSKEY_SIZE
in the auth_key definition and memcpy call.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change the pos field in struct cached_dirents from int to loff_t
to support large directory offsets. This avoids overflow and
matches kernel conventions for directory positions.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Under low-memory conditions, close_all_cached_dirs() can't move the
dentries to a separate list to dput() them once the locks are dropped.
This will result in a "Dentry still in use" error, so add an error
message that makes it clear this is what happened:
[ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries
[ 495.281595] ------------[ cut here ]------------
[ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs]
[ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0
Also, bail out of looping through all tcons as soon as a single
allocation fails, since we're already in trouble, and kmalloc() attempts
for subseqeuent tcons are likely to fail just like the first one did.
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Acked-by: Bharath SM <bharathsm@microsoft.com>
Suggested-by: Ruben Devos <rdevos@oxya.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
after fabc4ed200f9, server_unresponsive add a condition to check whether client
need to reconnect depending on server->lstrp. When client failed to reconnect
for some time and abort connection, server->lstrp is updated for the last time.
In the following scene, server->lstrp is too old. This cause next command
failure in re-negotiation rather than waiting for re-negotiation done.
1. mount -t cifs -o username=Everyone,echo_internal=10 //$server_ip/export /mnt
2. ssh $server_ip "echo b > /proc/sysrq-trigger &"
3. ls /mnt
4. sleep 21s
5. ssh $server_ip "service firewalld stop"
6. ls # return EHOSTDOWN
If the interval between 5 and 6 is too small, 6 may trigger sending negotiation
request. Before backgrounding cifsd thread try to receive negotiation response
from server in cifs_readv_from_socket, server_unresponsive may trigger
cifs_reconnect which cause 6 to be failed:
ls thread
----------------
smb2_negotiate
server->tcpStatus = CifsInNegotiate
compound_send_recv
wait_for_compound_request
cifsd thread
----------------
cifs_readv_from_socket
server_unresponsive
server->tcpStatus == CifsInNegotiate && jiffies > server->lstrp + 20s
cifs_reconnect
cifs_abort_connection: mid_state = MID_RETRY_NEEDED
ls thread
----------------
cifs_sync_mid_result return EAGAIN
smb2_negotiate return EHOSTDOWN
Though server->lstrp means last server response time, it is updated in
cifs_abort_connection and cifs_get_tcp_session. We can also update server->lstrp
before switching into CifsInNegotiate state to avoid failure in 6.
Fixes: 7ccc146546 ("smb: client: fix hang in wait_for_response() for negproto")
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: zhangjian <zhangjian496@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull io_uring fix from Jens Axboe:
"A single fix to hopefully wrap up the saga of receive bundles"
* tag 'io_uring-6.16-20250621' of git://git.kernel.dk/linux:
io_uring/net: always use current transfer count for buffer put
Pull ACPI fix from Rafael Wysocki:
"Fix a crash in ACPICA while attempting to evaluate a control method
that expects more arguments than are being passed to it, which was
exposed by a defective firmware update from a prominent OEM on
multiple systems (Rafael Wysocki)"
* tag 'acpi-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Refuse to evaluate a method if arguments are missing
Pull PCI fixes from Bjorn Helgaas:
- Set up runtime PM even for devices that lack a PM Capability as we
did before 4d4c10f763 ("PCI: Explicitly put devices into D0 when
initializing"), which broke resume in some VFIO scenarios (Mario
Limonciello)
- Ignore pciehp Presence Detect Changed events caused by DPC, even if
they occur after a Data Link Layer State Changed event, to fix a VFIO
GPU passthrough regression in v6.13 (Lukas Wunner)
* tag 'pci-v6.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: pciehp: Ignore belated Presence Detect Changed caused by DPC
PCI/PM: Set up runtime PM even for devices without PCI PM
Pull RCU fix from Joel Fernandes:
"We recently got a report of a crash [1] with misuse of call_rcu().
Instead of crashing the kernel, a warning and graceful return is
better:
- rcu: Return early if callback is not specified (Uladzislau Rezki)"
Link: https://lore.kernel.org/all/aEnVuzK7VhGSizWj@pc636/ [1]
* tag 'rcu/fixes-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
rcu: Return early if callback is not specified
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix some file descriptor leaks that stand out with recent changes to
'perf list'
- Fix prctl include to fix building 'perf bench futex' hash with musl
libc
- Restrict 'perf test' uniquifying entry to machines with 'uncore_imc'
PMUs
- Document new output fields (op, cache, mem, dtlb, snoop) used with
'perf mem'
- Synchronize kernel header copies
* tag 'perf-tools-fixes-for-v6.16-1-2025-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools headers x86 cpufeatures: Sync with the kernel sources
perf bench futex: Fix prctl include in musl libc
perf test: Directory file descriptor leak
perf evsel: Missed close() when probing hybrid core PMUs
tools headers: Synchronize linux/bits.h with the kernel sources
tools arch amd ibs: Sync ibs.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers: Syncronize linux/build_bug.h with the kernel sources
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync the drm/drm.h with the kernel sources
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers UAPI: Sync kvm header with the kernel sources
tools headers x86 svm: Sync svm headers with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
tools kvm headers arm64: Update KVM header from the kernel sources
tools headers UAPI: Sync linux/prctl.h with the kernel sources to pick FUTEX knob
perf mem: Document new output fields (op, cache, mem, dtlb, snoop)
tools headers: Update the fs headers with the kernel sources
perf test: Restrict uniquifying test to machines with 'uncore_imc'
Pull mtd fixes from Miquel Raynal:
"The main fix that really needs to get in is the revert of the patch
adding the new mtd_master class, because it entirely fails the
partitioning if a specific Kconfig option is set. We need to think how
to handle that differently, so let's revert it as we need to get back
to the pen and paper situation again.
Otherwise the definition of some Winbond SPI NAND chips are receiving
some fixes (geometry and maximum frequency, mostly).
And finally a small memory leak gets also fixed"
* tag 'mtd/fixes-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: fix memory leak of ECC engine conf
mtd: spinand: winbond: Prevent unsupported frequencies on dual/quad I/O variants
mtd: spinand: winbond: Increase maximum frequency on an octal operation
mtd: spinand: winbond: Fix W35N number of planes/LUN
Revert "mtd: core: always create master device"
Pull SCSI fixes from James Bottomley:
"Two small and obvious driver fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: elx: efct: Fix memory leak in efct_hw_parse_filter()
scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port()