Commit Graph

1265388 Commits

Author SHA1 Message Date
Paolo Bonzini
44ecfa3e5f Merge branch 'svm' of https://github.com/kvm-x86/linux into HEAD
Clean up SVM's enter/exit assembly code so that it can be compiled
without OBJECT_FILES_NON_STANDARD.  The "standard" __svm_vcpu_run() can't
be made 100% bulletproof, as RBP isn't restored on #VMEXIT, but that's
also the case for __vmx_vcpu_run(), and getting "close enough" is better
than not even trying.

As for SEV-ES, after yet another refresher on swap types, I realized
KVM can simply let the hardware restore registers after #VMEXIT, all
that's missing is storing the current values to the host save area
(they are swap type B).  This should provide 100% accuracy when using
stack frames for unwinding, and requires less assembly.

In between, build the SEV-ES code iff CONFIG_KVM_AMD_SEV=y, and yank out
"support" for 32-bit kernels in __svm_sev_es_vcpu_run, which was
unnecessarily polluting the code for a configuration that is disabled
at build time.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-17 11:44:37 -04:00
Paolo Bonzini
1c3bed8006 Merge tag 'kvm-x86-fixes-6.9-rcN' of https://github.com/kvm-x86/linux into HEAD
- Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure where KVM
  would allow userspace to refresh the cache with a bogus GPA.  The bug has
  existed for quite some time, but was exposed by a new sanity check added in
  6.9 (to ensure a cache is either GPA-based or HVA-based).

- Drop an unused param from gfn_to_pfn_cache_invalidate_start() that got left
  behind during a 6.9 cleanup.

- Disable support for virtualizing adaptive PEBS, as KVM's implementation is
  architecturally broken and can leak host LBRs to the guest.

- Fix a bug where KVM neglects to set the enable bits for general purpose
  counters in PERF_GLOBAL_CTRL when initializing the virtual PMU.  Both Intel
  and AMD architectures require the bits to be set at RESET in order for v2
  PMUs to be backwards compatible with software that was written for v1 PMUs,
  i.e. for software that will never manually set the global enables.

- Disable LBR virtualization on CPUs that don't support LBR callstacks, as
  KVM unconditionally uses PERF_SAMPLE_BRANCH_CALL_STACK when creating the
  virtual LBR perf event, i.e. KVM will always fail to create LBR events on
  such CPUs.

- Fix a math goof in x86's hugepage logic for KVM_SET_MEMORY_ATTRIBUTES that
  results in an array overflow (detected by KASAN).

- Fix a flaw in the max_guest_memory selftest that results in it exhausting
  the supply of ucall structures when run with more than 256 vCPUs.

- Mark KVM_MEM_READONLY as supported for RISC-V in set_memory_region_test.

- Fix a bug where KVM incorrectly thinks a TDP MMU root is an indirect shadow
  root due KVM unnecessarily clobbering root_role.direct when userspace sets
  guest CPUID.

- Fix a dirty logging bug in the where KVM fails to write-protect TDP MMU
  SPTEs used for L2 if Page-Modification Logging is enabled for L1 and the L1
  hypervisor is NOT using EPT (if nEPT is enabled, KVM doesn't use the TDP MMU
  to run L2).  For simplicity, KVM always disables PML when running L2, but
  the TDP MMU wasn't accounting for root-specific conditions that force write-
  protect based dirty logging.
2024-04-16 12:50:21 -04:00
Sean Christopherson
eefb85b3f0 KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start()
Remove gfn_to_pfn_cache_invalidate_start()'s unused @may_block parameter,
which was leftover from KVM's abandoned (for now) attempt to support guest
usage of gfn_to_pfn caches.

Fixes: a4bff3df51 ("KVM: pfncache: remove KVM_GUEST_USES_PFN usage")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Cc: Paul Durrant <paul@xen.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240305003742.245767-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:53 -07:00
David Matlack
40e0ee6338 KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test
Extend vmx_dirty_log_test to include accesses made by L2 when EPT is
disabled.

This commit adds explicit coverage of a bug caught by syzkaller, where
the TDP MMU would clear D-bits instead of write-protecting SPTEs being
used to map an L2, which only happens when L1 does not enable EPT,
causing writes made by L2 to not be reflected in the dirty log when PML
is enabled:

  $ ./vmx_dirty_log_test
  Nested EPT: disabled
  ==== Test Assertion Failure ====
    x86_64/vmx_dirty_log_test.c:151: test_bit(0, bmap)
    pid=72052 tid=72052 errno=4 - Interrupted system call
    (stack trace empty)
    Page 0 incorrectly reported clean

Opportunistically replace the volatile casts with {READ,WRITE}_ONCE().

Link: https://lore.kernel.org/kvm/000000000000c6526f06137f18cc@google.com/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20240315230541.1635322-5-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:52 -07:00
David Matlack
b1a8d2b02b KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protecting
Drop the "If AD bits are enabled/disabled" verbiage from the comments
above kvm_tdp_mmu_clear_dirty_{slot,pt_masked}() since TDP MMU SPTEs may
need to be write-protected even when A/D bits are enabled. i.e. These
comments aren't technically correct.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20240315230541.1635322-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:51 -07:00
David Matlack
feac19aa4c KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}()
Drop the comments above clear_dirty_gfn_range() and
clear_dirty_pt_masked(), since each is word-for-word identical to the
comment above their parent function.

Leave the comment on the parent functions since they are APIs called by
the KVM/x86 MMU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20240315230541.1635322-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:50 -07:00
David Matlack
2673dfb591 KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status
Check kvm_mmu_page_ad_need_write_protect() when deciding whether to
write-protect or clear D-bits on TDP MMU SPTEs, so that the TDP MMU
accounts for any role-specific reasons for disabling D-bit dirty logging.

Specifically, TDP MMU SPTEs must be write-protected when the TDP MMU is
being used to run an L2 (i.e. L1 has disabled EPT) and PML is enabled.
KVM always disables PML when running L2, even when L1 and L2 GPAs are in
the some domain, so failing to write-protect TDP MMU SPTEs will cause
writes made by L2 to not be reflected in the dirty log.

Reported-by: syzbot+900d58a45dcaab9e4821@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=900d58a45dcaab9e4821
Fixes: 5982a53926 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprot")
Cc: stable@vger.kernel.org
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20240315230541.1635322-2-dmatlack@google.com
[sean: massage shortlog and changelog, tweak ternary op formatting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:50 -07:00
Sean Christopherson
1bc26cb909 KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update
Set kvm_mmu_page_role.invalid to mark the various MMU root_roles invalid
during CPUID update in order to force a refresh, instead of zeroing out
the entire role.  This fixes a bug where kvm_mmu_free_roots() incorrectly
thinks a root is indirect, i.e. not a TDP MMU, due to "direct" being
zeroed, which in turn causes KVM to take mmu_lock for write instead of
read.

Note, paving over the entire role was largely unintentional, commit
7a458f0e1b ("KVM: x86/mmu: remove extended bits from mmu_role, rename
field") simply missed that "invalid" could be set.

Fixes: 576a15de8d ("KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read")
Reported-by: syzbot+dc308fcfcd53f987de73@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000009b38080614c49bdb@google.com
Cc: Phi Nguyen <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20240408231115.1387279-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:49 -07:00
Sean Christopherson
bb9dc85908 KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacks
Disable LBR virtualization if the CPU doesn't support callstacks, which
were introduced in HSW (see commit e9d7f7cd97 ("perf/x86/intel: Add
basic Haswell LBR call stack support"), as KVM unconditionally configures
the perf LBR event with PERF_SAMPLE_BRANCH_CALL_STACK, i.e. LBR
virtualization always fails on pre-HSW CPUs.

Simply disable LBR support on such CPUs, as it has never worked, i.e.
there is no risk of breaking an existing setup, and figuring out a way
to performantly context switch LBRs on old CPUs is not worth the effort.

Fixes: be635e34c2 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES")
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20240307011344.835640-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:48 -07:00
Sean Christopherson
0d0b608650 perf/x86/intel: Expose existence of callback support to KVM
Add a "has_callstack" field to the x86_pmu_lbr structure used to pass
information to KVM, and set it accordingly in x86_perf_get_lbr().  KVM
will use has_callstack to avoid trying to create perf LBR events with
PERF_SAMPLE_BRANCH_CALL_STACK on CPUs that don't support callstacks.

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20240307011344.835640-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:47 -07:00
Sean Christopherson
447112d7ed KVM: VMX: Snapshot LBR capabilities during module initialization
Snapshot VMX's LBR capabilities once during module initialization instead
of calling into perf every time a vCPU reconfigures its vPMU.  This will
allow massaging the LBR capabilities, e.g. if the CPU doesn't support
callstacks, without having to remember to update multiple locations.

Opportunistically tag vmx_get_perf_capabilities() with __init, as it's
only called from vmx_set_cpu_caps().

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20240307011344.835640-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:46 -07:00
Sandipan Das
49ff3b4aec KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms
On AMD and Hygon platforms, the local APIC does not automatically set
the mask bit of the LVTPC register when handling a PMI and there is
no need to clear it in the kernel's PMI handler.

For guests, the mask bit is currently set by kvm_apic_local_deliver()
and unless it is cleared by the guest kernel's PMI handler, PMIs stop
arriving and break use-cases like sampling with perf record.

This does not affect non-PerfMonV2 guests because PMIs are handled in
the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
mask bit irrespective of the vendor.

Before:

  $ perf record -e cycles:u true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]

After:

  $ perf record -e cycles:u true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]

Fixes: a16eb25b09 ("KVM: x86: Mask LVTPC when handling a PMI")
Cc: stable@vger.kernel.org
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[sean: use is_intel_compatible instead of !is_amd_or_hygon()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240405235603.1173076-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 12:58:59 -04:00
Sean Christopherson
fd706c9b16 KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
Add kvm_vcpu_arch.is_amd_compatible to cache if a vCPU's vendor model is
compatible with AMD, i.e. if the vCPU vendor is AMD or Hygon, along with
helpers to check if a vCPU is compatible AMD vs. Intel.  To handle Intel
vs. AMD behavior related to masking the LVTPC entry, KVM will need to
check for vendor compatibility on every PMI injection, i.e. querying for
AMD will soon be a moderately hot path.

Note!  This subtly (or maybe not-so-subtly) makes "Intel compatible" KVM's
default behavior, both if userspace omits (or never sets) CPUID 0x0 and if
userspace sets a completely unknown vendor.  One could argue that KVM
should treat such vCPUs as not being compatible with Intel *or* AMD, but
that would add useless complexity to KVM.

KVM needs to do *something* in the face of vendor specific behavior, and
so unless KVM conjured up a magic third option, choosing to treat unknown
vendors as neither Intel nor AMD means that checks on AMD compatibility
would yield Intel behavior, and checks for Intel compatibility would yield
AMD behavior.  And that's far worse as it would effectively yield random
behavior depending on whether KVM checked for AMD vs. Intel vs. !AMD vs.
!Intel.  And practically speaking, all x86 CPUs follow either Intel or AMD
architecture, i.e. "supporting" an unknown third architecture adds no
value.

Deliberately don't convert any of the existing guest_cpuid_is_intel()
checks, as the Intel side of things is messier due to some flows explicitly
checking for exactly vendor==Intel, versus some flows assuming anything
that isn't "AMD compatible" gets Intel behavior.  The Intel code will be
cleaned up in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240405235603.1173076-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 12:58:56 -04:00
Sean Christopherson
27ca867042 KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD
Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD to skip objtool's
stack validation now that __svm_vcpu_run() and __svm_sev_es_vcpu_run()
create stack frames (though the former's effectiveness is dubious).

Note, due to a quirk in how OBJECT_FILES_NON_STANDARD was handled by the
build system prior to commit bf48d9b756 ("kbuild: change tool coverage
variables to take the path relative to $(obj)"), vmx/vmenter.S got lumped
in with svm/vmenter.S.  __vmx_vcpu_run() already plays nice with frame
pointers, i.e. it was collateral damage when commit 7f4b5cde24 ("kvm:
Disable objtool frame pointer checking for vmenter.S") added the
OBJECT_FILES_NON_STANDARD hack-a-fix.

Link: https://lore.kernel.org/all/20240217055504.2059803-1-masahiroy@kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:21:44 -07:00
Sean Christopherson
4367a75887 KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run()
Now that KVM uses the host save area to context switch RBP, i.e.
preserves RBP for the entirety of __svm_sev_es_vcpu_run(), create a stack
frame using the standared FRAME_{BEGIN,END} macros.

Note, __svm_sev_es_vcpu_run() is subtly not a leaf function as it can call
into ibpb_feature() via UNTRAIN_RET_VM.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:21:10 -07:00
Sean Christopherson
adac42bf42 KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area
Use the host save area to preserve volatile registers that are used in
__svm_sev_es_vcpu_run() to access function parameters after #VMEXIT.
Like saving/restoring non-volatile registers, there's no reason not to
take advantage of hardware restoring registers on #VMEXIT, as doing so
shaves a few instructions and the save area is going to be accessed no
matter what.

Converting all register save/restore code to use the host save area also
make it easier to follow the SEV-ES VMRUN flow in its entirety, as
opposed to having a mix of stack-based versus host save area save/restore.

Add a parameter to RESTORE_HOST_SPEC_CTRL_BODY so that the SEV-ES path
doesn't need to write @spec_ctrl_intercepted to memory just to play nice
with the common macro.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:21:10 -07:00
Sean Christopherson
c92be2fd8e KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save area
Use the host save area to save/restore non-volatile (callee-saved)
registers in __svm_sev_es_vcpu_run() to take advantage of hardware loading
all registers from the save area on #VMEXIT.  KVM still needs to save the
registers it wants restored, but the loads are handled automatically by
hardware.

Aside from less assembly code, letting hardware do the restoration means
stack frames are preserved for the entirety of __svm_sev_es_vcpu_run().

Opportunistically add a comment to call out why @svm needs to be saved
across VMRUN->#VMEXIT, as it's not easy to decipher that from the macro
hell.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:20:29 -07:00
Sean Christopherson
87e8e360a0 KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted
POP @spec_ctrl_intercepted into RAX instead of RBX when discarding it from
the stack so that __svm_sev_es_vcpu_run() doesn't modify any non-volatile
registers.  __svm_sev_es_vcpu_run() doesn't return a value, and RAX is
already are clobbered multiple times in the #VMEXIT path.

This will allowing using the host save area to save/restore non-volatile
registers in __svm_sev_es_vcpu_run().

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:20:29 -07:00
Sean Christopherson
331282fdb1 KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run()
Drop 32-bit "support" from __svm_sev_es_vcpu_run(), as SEV/SEV-ES firmly
64-bit only.  The "support" was purely the result of bad copy+paste from
__svm_vcpu_run(), which in turn was slightly less bad copy+paste from
__vmx_vcpu_run().

Opportunistically convert to unadulterated register accesses so that it's
easier (but still not easy) to follow which registers hold what arguments,
and when.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:20:29 -07:00
Sean Christopherson
7774c8f32e KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV
Compile (and link) __svm_sev_es_vcpu_run() if and only if SEV support is
actually enabled.  This will allow dropping non-existent 32-bit "support"
from __svm_sev_es_vcpu_run() without causing undue confusion.

Intentionally don't provide a stub (but keep the declaration), as any sane
compiler, even with things like KASAN enabled, should eliminate the call
to __svm_sev_es_vcpu_run() since sev_es_guest() unconditionally returns
"false" if CONFIG_KVM_AMD_SEV=n.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:20:28 -07:00
Sean Christopherson
19597a71a0 KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding
Unconditionally create a stack frame in __svm_vcpu_run() to play nice with
unwinding via frame pointers, at least until the point where RBP is loaded
with the guest's value.  Don't bother conditioning the code on
CONFIG_FRAME_POINTER=y, as RBP needs to be saved and restored anyways (due
to it being clobbered with the guest's value); omitting the "MOV RSP, RBP"
is not worth the extra #ifdef.

Creating a stack frame will allow removing the OBJECT_FILES_NON_STANDARD
tag from vmenter.S once __svm_sev_es_vcpu_run() is fixed to not stomp all
over RBP for no reason.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240223204233.3337324-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:20:28 -07:00
Christophe JAILLET
4710e4fc3e KVM: SVM: Remove a useless zeroing of allocated memory
Remove KVM's unnecessary zeroing of memory when allocating the pages array
in sev_pin_memory() via __vmalloc(), as the array is only used to hold
kernel pointers.  The kmalloc() path for "small" regions doesn't zero the
array, and if KVM leaks state and/or accesses uninitialized data, then the
kernel has bigger problems.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/c7619a3d3cbb36463531a7c73ccbde9db587986c.1710004509.git.christophe.jaillet@wanadoo.fr
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 10:15:30 -07:00
Tao Su
7f2817ef52 KVM: VMX: Ignore MKTME KeyID bits when intercepting #PF for allow_smaller_maxphyaddr
Use the raw/true host.MAXPHYADDR when deciding whether or not KVM must
intercept #PFs when allow_smaller_maxphyaddr is enabled, as any adjustments
the kernel makes to boot_cpu_data.x86_phys_bits to account for MKTME KeyID
bits do not apply to the guest physical address space.  I.e. the KeyID are
off-limits for host physical addresses, but are not reserved for GPAs as
far as hardware is concerned.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20240319031111.495006-1-tao1.su@linux.intel.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 14:22:10 -07:00
Andrew Jones
449c0811d8 KVM: selftests: fix supported_flags for riscv
commit 849c181643 ("KVM: selftests: fix supported_flags for aarch64")
fixed the set-memory-region test for aarch64 by declaring the read-only
flag is supported. riscv also supports the read-only flag. Fix it too.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240403123300.63923-2-ajones@ventanamicro.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:22:08 -07:00
Maxim Levitsky
0ef2dd1f41 KVM: selftests: fix max_guest_memory_test with more that 256 vCPUs
max_guest_memory_test uses ucalls to sync with the host, but
it also resets the guest RIP back to its initial value in between
tests stages.

This makes the guest never reach the code which frees the ucall struct
and since a fixed pool of 512 ucall structs is used, the test starts
to fail when more that 256 vCPUs are used.

Fix that by replacing the manual register reset with a loop in
the guest code.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20240315143507.102629-1-mlevitsk@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:21:40 -07:00
Sean Christopherson
08a828249b KVM: selftests: Verify post-RESET value of PERF_GLOBAL_CTRL in PMCs test
Add a guest assert in the PMU counters test to verify that KVM stuffs
the vCPU's post-RESET value to globally enable all general purpose
counters.  Per Intel's SDM,

  IA32_PERF_GLOBAL_CTRL:  Sets bits n-1:0 and clears the upper bits.

and

  Where "n" is the number of general-purpose counters available in
  the processor.

For the edge case where there are zero GP counters, follow the spirit
of the architecture, not the SDM's literal wording, which doesn't account
for this possibility and would require the CPU to set _all_ bits in
PERF_GLOBAL_CTRL.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:28 -07:00
Sean Christopherson
de120e1d69 KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior.  Per Intel's SDM:

  IA32_PERF_GLOBAL_CTRL:  Sets bits n-1:0 and clears the upper bits.

and

  Where "n" is the number of general-purpose counters available in the processor.

AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.

Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0.  The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.

Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).

Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).

The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs.  And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.

Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.

Initial vPMU support in commit f5132b0138 ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:

        if (pmu->version == 1) {
                pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
                return;
        }

Commit f19a0c2c2e ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.

        pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
                (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
        pmu->global_ctrl_mask = ~pmu->global_ctrl;

That was KVM's behavior up until commit c49467a45f ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*.  However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.

But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:

  IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow.  But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.

Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b0138 ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.

Reported-by: Babu Moger <babu.moger@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:27 -07:00
Rick Edgecombe
992b54bd08 KVM: x86/mmu: x86: Don't overflow lpage_info when checking attributes
Fix KVM_SET_MEMORY_ATTRIBUTES to not overflow lpage_info array and trigger
KASAN splat, as seen in the private_mem_conversions_test selftest.

When memory attributes are set on a GFN range, that range will have
specific properties applied to the TDP. A huge page cannot be used when
the attributes are inconsistent, so they are disabled for those the
specific huge pages. For internal KVM reasons, huge pages are also not
allowed to span adjacent memslots regardless of whether the backing memory
could be mapped as huge.

What GFNs support which huge page sizes is tracked by an array of arrays
'lpage_info' on the memslot, of ‘kvm_lpage_info’ structs. Each index of
lpage_info contains a vmalloc allocated array of these for a specific
supported page size. The kvm_lpage_info denotes whether a specific huge
page (GFN and page size) on the memslot is supported. These arrays include
indices for unaligned head and tail huge pages.

Preventing huge pages from spanning adjacent memslot is covered by
incrementing the count in head and tail kvm_lpage_info when the memslot is
allocated, but disallowing huge pages for memory that has mixed attributes
has to be done in a more complicated way. During the
KVM_SET_MEMORY_ATTRIBUTES ioctl KVM updates lpage_info for each memslot in
the range that has mismatched attributes. KVM does this a memslot at a
time, and marks a special bit, KVM_LPAGE_MIXED_FLAG, in the kvm_lpage_info
for any huge page. This bit is essentially a permanently elevated count.
So huge pages will not be mapped for the GFN at that page size if the
count is elevated in either case: a huge head or tail page unaligned to
the memslot or if KVM_LPAGE_MIXED_FLAG is set because it has mixed
attributes.

To determine whether a huge page has consistent attributes, the
KVM_SET_MEMORY_ATTRIBUTES operation checks an xarray to make sure it
consistently has the incoming attribute. Since level - 1 huge pages are
aligned to level huge pages, it employs an optimization. As long as the
level - 1 huge pages are checked first, it can just check these and assume
that if each level - 1 huge page contained within the level sized huge
page is not mixed, then the level size huge page is not mixed. This
optimization happens in the helper hugepage_has_attrs().

Unfortunately, although the kvm_lpage_info array representing page size
'level' will contain an entry for an unaligned tail page of size level,
the array for level - 1  will not contain an entry for each GFN at page
size level. The level - 1 array will only contain an index for any
unaligned region covered by level - 1 huge page size, which can be a
smaller region. So this causes the optimization to overflow the level - 1
kvm_lpage_info and perform a vmalloc out of bounds read.

In some cases of head and tail pages where an overflow could happen,
callers skip the operation completely as KVM_LPAGE_MIXED_FLAG is not
required to prevent huge pages as discussed earlier. But for memslots that
are smaller than the 1GB page size, it does call hugepage_has_attrs(). In
this case the huge page is both the head and tail page. The issue can be
observed simply by compiling the kernel with CONFIG_KASAN_VMALLOC and
running the selftest “private_mem_conversions_test”, which produces the
output like the following:

BUG: KASAN: vmalloc-out-of-bounds in hugepage_has_attrs+0x7e/0x110
Read of size 4 at addr ffffc900000a3008 by task private_mem_con/169
Call Trace:
  dump_stack_lvl
  print_report
  ? __virt_addr_valid
  ? hugepage_has_attrs
  ? hugepage_has_attrs
  kasan_report
  ? hugepage_has_attrs
  hugepage_has_attrs
  kvm_arch_post_set_memory_attributes
  kvm_vm_ioctl

It is a little ambiguous whether the unaligned head page (in the bug case
also the tail page) should be expected to have KVM_LPAGE_MIXED_FLAG set.
It is not functionally required, as the unaligned head/tail pages will
already have their kvm_lpage_info count incremented. The comments imply
not setting it on unaligned head pages is intentional, so fix the callers
to skip trying to set KVM_LPAGE_MIXED_FLAG in this case, and in doing so
not call hugepage_has_attrs().

Cc: stable@vger.kernel.org
Fixes: 90b4fe1798 ("KVM: x86: Disallow hugepages when memory attributes are mixed")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>
Link: https://lore.kernel.org/r/20240314212902.2762507-1-rick.p.edgecombe@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:26 -07:00
Sean Christopherson
9e985cbf29 KVM: x86/pmu: Disable support for adaptive PEBS
Drop support for virtualizing adaptive PEBS, as KVM's implementation is
architecturally broken without an obvious/easy path forward, and because
exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
host kernel addresses to the guest.

Bug #1 is that KVM doesn't account for the upper 32 bits of
IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g
fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters()
stores local variables as u8s and truncates the upper bits too, etc.

Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value
for PEBS events, perf will _always_ generate an adaptive record, even if
the guest requested a basic record.  Note, KVM will also enable adaptive
PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the
guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero,
i.e. the guest will only ever see Basic records.

Bug #3 is in perf.  intel_pmu_disable_fixed() doesn't clear the upper
bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and
intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE
either.  I.e. perf _always_ enables ADAPTIVE counters, regardless of what
KVM requests.

Bug #4 is that adaptive PEBS *might* effectively bypass event filters set
by the host, as "Updated Memory Access Info Group" records information
that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.

Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least
zeros) when entering a vCPU with adaptive PEBS, which allows the guest
to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries"
records.

Disable adaptive PEBS support as an immediate fix due to the severity of
the LBR leak in particular, and because fixing all of the bugs will be
non-trivial, e.g. not suitable for backporting to stable kernels.

Note!  This will break live migration, but trying to make KVM play nice
with live migration would be quite complicated, wouldn't be guaranteed to
work (i.e. KVM might still kill/confuse the guest), and it's not clear
that there are any publicly available VMMs that support adaptive PEBS,
let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't
support PEBS in any capacity.

Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com
Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com
Fixes: c59a1f106f ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: stable@vger.kernel.org
Cc: Like Xu <like.xu.linux@gmail.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhang Xiong <xiong.y.zhang@intel.com>
Cc: Lv Zhiyuan <zhiyuan.lv@intel.com>
Cc: Dapeng Mi <dapeng1.mi@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Acked-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:25 -07:00
Sean Christopherson
fc62a4e8de KVM: Explicitly disallow activatating a gfn_to_pfn_cache with INVALID_GPA
Explicit disallow activating a gfn_to_pfn_cache with an error gpa, i.e.
INVALID_GPA, to ensure that KVM doesn't mistake a GPA-based cache for an
HVA-based cache (KVM uses INVALID_GPA as a magic value to differentiate
between GPA-based and HVA-based caches).

WARN if KVM attempts to activate a cache with INVALID_GPA, purely so that
new caches need to at least consider what to do with a "bad" GPA, as all
existing usage of kvm_gpc_activate() guarantees gpa != INVALID_GPA.  I.e.
removing the WARN in the future is completely reasonable if doing so would
yield cleaner/better code overall.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240320001542.3203871-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:24 -07:00
Sean Christopherson
5c9ca4ed89 KVM: Check validity of offset+length of gfn_to_pfn_cache prior to activation
When activating a gfn_to_pfn_cache, verify that the offset+length is sane
and usable before marking the cache active.  Letting __kvm_gpc_refresh()
detect the problem results in a cache being marked active without setting
the GPA (or any other fields), which in turn results in KVM trying to
refresh a cache with INVALID_GPA.

Attempting to refresh a cache with INVALID_GPA isn't functionally
problematic, but it runs afoul of the sanity check that exactly one of
GPA or userspace HVA is valid, i.e. that a cache is either GPA-based or
HVA-based.

Reported-by: syzbot+106a4f72b0474e1d1b33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000005fa5cc0613f1cebd@google.com
Fixes: 721f5b0dda ("KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA")
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paul Durrant <paul@xen.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240320001542.3203871-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:24 -07:00
Sean Christopherson
18f06e9769 KVM: Add helpers to consolidate gfn_to_pfn_cache's page split check
Add a helper to check that the incoming length for a gfn_to_pfn_cache is
valid with respect to the cache's GPA and/or HVA.  To avoid activating a
cache with a bogus GPA, a future fix will fork the page split check in
the inner refresh path into activate() and the public rerfresh() APIs, at
which point KVM will check the length in three separate places.

Deliberately keep the "page offset" logic open coded, as the only other
path that consumes the offset, __kvm_gpc_refresh(), already needs to
differentiate between GPA-based and HVA-based caches, and it's not obvious
that using a helper is a net positive in overall code readability.

Note, for GPA-based caches, this has a subtle side effect of using the GPA
instead of the resolved HVA in the check() path, but that should be a nop
as the HVA offset is derived from the GPA, i.e. the two offsets are
identical, barring a KVM bug.

Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240320001542.3203871-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:23 -07:00
Linus Torvalds
fec50db703 Linux 6.9-rc3 v6.9-rc3 2024-04-07 13:22:46 -07:00
Linus Torvalds
9fe30842a9 Merge tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix MCE timer reinit locking

 - Fix/improve CoCo guest random entropy pool init

 - Fix SEV-SNP late disable bugs

 - Fix false positive objtool build warning

 - Fix header dependency bug

 - Fix resctrl CPU offlining bug

* tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
  x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
  x86/CPU/AMD: Track SNP host status with cc_platform_*()
  x86/cc: Add cc_platform_set/_clear() helpers
  x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
  x86/coco: Require seeding RNG with RDRAND on CoCo systems
  x86/numa/32: Include missing <asm/pgtable_areas.h>
  x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline
2024-04-07 09:33:21 -07:00
Linus Torvalds
3520c35e5f Merge tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Fix various timer bugs:

   - Fix a timer migration bug that may result in missed events

   - Fix timer migration group hierarchy event updates

   - Fix a PowerPC64 build warning

   - Fix a handful of DocBook annotation bugs"

* tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Return early on deactivation
  timers/migration: Fix ignored event due to missing CPU update
  vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
  timers: Fix text inconsistencies and spelling
  tick/sched: Fix struct tick_sched doc warnings
  tick/sched: Fix various kernel-doc warnings
  timers: Fix kernel-doc format and add Return values
  time/timekeeping: Fix kernel-doc warnings and typos
  time/timecounter: Fix inline documentation
2024-04-07 09:20:50 -07:00
Linus Torvalds
e2948effa9 Merge tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Ingo Molnar:
 "Fix a combined PEBS events bug on x86 Intel CPUs"

* tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
2024-04-07 09:14:46 -07:00
Linus Torvalds
f2f80ac809 Merge tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Address a slow memory leak with RPC-over-TCP

 - Prevent another NFS4ERR_DELAY loop during CREATE_SESSION

* tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
  SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
2024-04-06 09:37:50 -07:00
Linus Torvalds
cf17b9503f Merge tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "A host driver build fix"

* tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pxa: hide unused icr_bits[] variable
2024-04-06 09:27:36 -07:00
Linus Torvalds
9520c192e8 Merge tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:

 - Allow creating new links to special files which were not associated
   with a project quota

* tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: allow cross-linking special files without project quota
2024-04-06 09:14:18 -07:00
Linus Torvalds
119c289409 Merge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - fix to retry close to avoid potential handle leaks when server
   returns EBUSY

 - DFS fixes including a fix for potential use after free

 - fscache fix

 - minor strncpy cleanup

 - reconnect race fix

 - deal with various possible UAF race conditions tearing sessions down

* tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
  smb: client: fix potential UAF in smb2_is_network_name_deleted()
  smb: client: fix potential UAF in is_valid_oplock_break()
  smb: client: fix potential UAF in smb2_is_valid_oplock_break()
  smb: client: fix potential UAF in smb2_is_valid_lease_break()
  smb: client: fix potential UAF in cifs_stats_proc_show()
  smb: client: fix potential UAF in cifs_stats_proc_write()
  smb: client: fix potential UAF in cifs_dump_full_key()
  smb: client: fix potential UAF in cifs_debug_files_proc_show()
  smb3: retrying on failed server close
  smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
  smb: client: handle DFS tcons in cifs_construct_tcon()
  smb: client: refresh referral without acquiring refpath_lock
  smb: client: guarantee refcounted children from parent session
  cifs: Fix caching to try to do open O_WRONLY as rdwr on server
  smb: client: fix UAF in smb2_reconnect_server()
  smb: client: replace deprecated strncpy with strscpy
2024-04-06 09:06:17 -07:00
Borislav Petkov (AMD)
b377c66ae3 x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
srso_alias_untrain_ret() is special code, even if it is a dummy
which is called in the !SRSO case, so annotate it like its real
counterpart, to address the following objtool splat:

  vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0

Fixes: 4535e1a417 ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
2024-04-06 13:01:50 +02:00
Ingo Molnar
5f2ca44ed2 Merge branch 'linus' into x86/urgent, to pick up dependent commit
We want to fix:

  0e11073247 ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO")

So merge in Linus's latest into x86/urgent to have it available.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-06 13:00:32 +02:00
Wolfram Sang
5ceeabb0eb Merge tag 'i2c-host-fixes-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
An unused const variable kind of error has been fixed by placing
the definition of icr_bits[] inside the ifdef block where it is
used.
2024-04-06 11:29:15 +02:00
Linus Torvalds
6c6e47d69d Merge tag 'firewire-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
 "The firewire-ohci kernel module has a parameter for verbose kernel
  logging. It is well-known that it logs the spurious IRQ for bus-reset
  event due to the unmasked register for IRQ event. This update fixes
  the issue"

* tag 'firewire-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: mask bus reset interrupts between ISR and bottom half
2024-04-05 21:25:31 -07:00
Adam Goldman
752e3c53de firewire: ohci: mask bus reset interrupts between ISR and bottom half
In the FireWire OHCI interrupt handler, if a bus reset interrupt has
occurred, mask bus reset interrupts until bus_reset_work has serviced and
cleared the interrupt.

Normally, we always leave bus reset interrupts masked. We infer the bus
reset from the self-ID interrupt that happens shortly thereafter. A
scenario where we unmask bus reset interrupts was introduced in 2008 in
a007bb857e: If
OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
will unmask bus reset interrupts so we can log them.

irq_handler logs the bus reset interrupt. However, we can't clear the bus
reset event flag in irq_handler, because we won't service the event until
later. irq_handler exits with the event flag still set. If the
corresponding interrupt is still unmasked, the first bus reset will
usually freeze the system due to irq_handler being called again each
time it exits. This freeze can be reproduced by loading firewire_ohci
with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
Apparently there are also some cases where bus_reset_work will get called
soon enough to clear the event, and operation will continue normally.

This freeze was first reported a few months after a007bb85 was committed,
but until now it was never fixed. The debug level could safely be set
to -1 through sysfs after the module was loaded, but this would be
ineffectual in logging bus reset interrupts since they were only
unmasked during initialization.

irq_handler will now leave the event flag set but mask bus reset
interrupts, so irq_handler won't be called again and there will be no
freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
unmask the interrupt after servicing the event, so future interrupts
will be caught as desired.

As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
enabled through sysfs in addition to during initial module loading.
However, when enabled through sysfs, logging of bus reset interrupts will
be effective only starting with the second bus reset, after
bus_reset_work has executed.

Signed-off-by: Adam Goldman <adamg@pobox.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-04-06 09:36:46 +09:00
Linus Torvalds
104db052b6 Merge tag 'spi-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A few small driver specific fixes, the most important being the
  s3c64xx change which is likely to be hit during normal operation"

* tag 'spi-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probe
  spi: spi-fsl-lpspi: remove redundant spi_controller_put call
  spi: s3c64xx: Use DMA mode from fifo size
2024-04-05 17:26:43 -07:00
Linus Torvalds
20668408ab Merge tag 'regulator-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
 "One simple regualtor fix, fixing module autoloading on tps65132"

* tag 'regulator-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps65132: Add of_match table
2024-04-05 17:24:04 -07:00
Linus Torvalds
a6bec447a8 Merge tag 'regmap-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
 "Richard found a nasty corner case in the maple tree code which he
  fixed, and also fixed a compiler warning which was showing up with the
  toolchain he uses and helpfully identified a possible incorrect error
  code which could have runtime impacts"

* tag 'regmap-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: maple: Fix uninitialized symbol 'ret' warnings
  regmap: maple: Fix cache corruption in regcache_maple_drop()
2024-04-05 17:21:16 -07:00
Linus Torvalds
8a05ef7087 Merge tag 'block-6.9-20240405' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Atomic queue limits fixes (Christoph)
      - Fabrics fixes (Hannes, Daniel)

 - Discard overflow fix (Li)

 - Cleanup fix for null_blk (Damien)

* tag 'block-6.9-20240405' of git://git.kernel.dk/linux:
  nvme-fc: rename free_ctrl callback to match name pattern
  nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists
  nvmet: implement unique discovery NQN
  nvme: don't create a multipath node for zero capacity devices
  nvme: split nvme_update_zone_info
  nvme-multipath: don't inherit LBA-related fields for the multipath node
  block: fix overflow in blk_ioctl_discard()
  nullblk: Fix cleanup order in null_add_dev() error path
2024-04-05 17:04:11 -07:00
Linus Torvalds
4f72ed492d Merge tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:

 - Backport of some fixes that came up during development of the 6.10
   io_uring patches. This includes some kbuf cleanups and reference
   fixes.

 - Disable multishot read if we don't have NOWAIT support on the target

 - Fix for a dependency issue with workqueue flushing

* tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linux:
  io_uring/kbuf: hold io_buffer_list reference over mmap
  io_uring/kbuf: protect io_buffer_list teardown with a reference
  io_uring/kbuf: get rid of bl->is_ready
  io_uring/kbuf: get rid of lower BGID lists
  io_uring: use private workqueue for exit work
  io_uring: disable io-wq execution of multishot NOWAIT requests
  io_uring/rw: don't allow multishot reads without NOWAIT support
2024-04-05 16:58:52 -07:00