linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-02 08:11:32 -04:00

Author	SHA1	Message	Date
Jakub Kicinski	1127170d45	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 18:40:56 -08:00
Song Liu	0f350231b5	bpf: Fix leftover header->pages in sparc and powerpc code. Replace header->pages * PAGE_SIZE with new header->size. Fixes: `ed2d9e1a26` ("bpf: Use size instead of pages in bpf_binary_header") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208220509.4180389-2-song@kernel.org	2022-02-08 14:52:05 -08:00
Song Liu	f95f768f0a	bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures Instead of BUG_ON(), fail gracefully and return orig_prog. Fixes: `1022a5498f` ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208062533.3802081-1-song@kernel.org	2022-02-08 09:23:18 -08:00
Song Liu	1022a5498f	bpf, x86_64: Use bpf_jit_binary_pack_alloc Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes the program to the rw buffer. When the jit is done, the program is copied to the final location with bpf_jit_binary_pack_finalize. Note that we need to do bpf_tail_call_direct_fixup after finalize. Therefore, the text_live = false logic in __bpf_arch_text_poke is no longer needed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-10-song@kernel.org	2022-02-07 18:13:01 -08:00
Song Liu	ebc1415d9b	bpf: Introduce bpf_arch_text_copy This will be used to copy JITed text to RO protected module memory. On x86, bpf_arch_text_copy is implemented with text_poke_copy. bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno) on errors. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-7-song@kernel.org	2022-02-07 18:13:01 -08:00
Song Liu	0e06b40371	x86/alternative: Introduce text_poke_copy This will be used by BPF jit compiler to dump JITed binary to a RX huge page, and thus allow multiple BPF programs sharing the a huge (2MB) page. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-6-song@kernel.org	2022-02-07 18:13:01 -08:00
Song Liu	fac54e2bfb	x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP This enables module_alloc() to allocate huge page for 2MB+ requests. To check the difference of this change, we need enable config CONFIG_PTDUMP_DEBUGFS, and call module_alloc(2MB). Before the change, /sys/kernel/debug/page_tables/kernel shows pte for this map. With the change, /sys/kernel/debug/page_tables/ show pmd for thie map. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-2-song@kernel.org	2022-02-07 18:13:01 -08:00
Hou Tao	b5e975d256	bpf, arm64: Enable kfunc call Since commit `b2eed9b588` ("arm64/kernel: kaslr: reduce module randomization range to 2 GB"), for arm64 whether KASLR is enabled or not, the module is placed within 2GB of the kernel region, so s32 in bpf_kfunc_desc is sufficient to represente the offset of module function relative to __bpf_call_base. The only thing needed is to override bpf_jit_supports_kfunc_call(). Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220130092917.14544-2-hotforest@gmail.com	2022-02-04 16:37:13 +01:00
Jakub Kicinski	c59400a68c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-03 17:36:16 -08:00
Linus Torvalds	d394bb77dd	Merge tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fix missed change for PTR->PTR_WD conversion - kernel-doc fixes * tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: KVM: fix vz.c kernel-doc notation MIPS: octeon: Fix missed PTR->PTR_WD conversion	2022-02-03 06:45:34 -08:00
Randy Dunlap	2161ba0709	MIPS: KVM: fix vz.c kernel-doc notation Fix all kernel-doc warnings in mips/kvm/vz.c as reported by the kernel test robot: arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_compare' not described in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_cause' not described in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Excess function parameter 'compare' description in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Excess function parameter 'cause' description in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:1551: warning: No description found for return value of 'kvm_trap_vz_handle_cop_unusable' arch/mips/kvm/vz.c:1552: warning: expecting prototype for kvm_trap_vz_handle_cop_unusuable(). Prototype was for kvm_trap_vz_handle_cop_unusable() instead arch/mips/kvm/vz.c:1597: warning: No description found for return value of 'kvm_trap_vz_handle_msa_disabled' Fixes: `c992a4f6a9` ("KVM: MIPS: Implement VZ support") Fixes: `f4474d50c7` ("KVM: MIPS/VZ: Support hardware guest timer") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: James Hogan <jhogan@kernel.org> Cc: kvm@vger.kernel.org Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2022-02-01 08:36:27 +01:00
Thomas Bogendoerfer	50317b636e	MIPS: octeon: Fix missed PTR->PTR_WD conversion Fixes: `fa62f39dc7` ("MIPS: Fix build error due to PTR used in more places") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2022-02-01 08:35:12 +01:00
Akhmat Karakotov	26859240e4	txhash: Add socket option to control TX hash rethink behavior Add the SO_TXREHASH socket option to control hash rethink behavior per socket. When default mode is set, sockets disable rehash at initialization and use sysctl option when entering listen state. setsockopt() overrides default behavior. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Linus Torvalds	a96d3a5b15	Merge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add another Intel CPU model to the list of CPUs supporting the processor inventory unique number - Allow writing to MCE thresholding sysfs files again - a previous change had accidentally disabled it and no one noticed. Goes to show how much is this stuff used * tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN x86/MCE/AMD: Allow thresholding interface updates after init	2022-01-30 12:55:06 +02:00
Randy Dunlap	dbecf9b8b8	ia64: make IA64_MCA_RECOVERY bool instead of tristate In linux-next, IA64_MCA_RECOVERY uses the (new) function make_task_dead(), which is not exported for use by modules. Instead of exporting it for one user, convert IA64_MCA_RECOVERY to be a bool Kconfig symbol. In a config file from "kernel test robot <lkp@intel.com>" for a different problem, this linker error was exposed when CONFIG_IA64_MCA_RECOVERY=m. Fixes this build error: ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined! Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org Fixes: `0e25498f8c` ("exit: Add and use make_task_dead.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-30 09:56:58 +02:00
Robert Hancock	e461bd6f43	arm64: dts: zynqmp: Added GEM reset definitions The Cadence GEM/MACB driver now utilizes the platform-level reset on the ZynqMP platform. Add reset definitions to the ZynqMP platform device tree to allow this to be used. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-29 17:49:21 +00:00
Linus Torvalds	f8c7e4ede4	Merge tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fixes from Bjorn Helgaas: - Fix compilation warnings in new mt7621 driver (Sergio Paracuellos) - Restore the sysfs "rom" file for VGA shadow ROMs, which was broken when converting "rom" to be a static attribute (Bjorn Helgaas) * tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/sysfs: Find shadow ROM before static attribute initialization PCI: mt7621: Remove unused function pcie_rmw() PCI: mt7621: Drop of_match_ptr() to avoid unused variable	2022-01-29 19:05:47 +02:00
Linus Torvalds	d66c1e79b9	Merge tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix VM debug warnings on boot triggered via __set_fixmap(). - Fix a debug warning in the 64-bit Book3S PMU handling code. - Fix nested guest HFSCR handling with multiple vCPUs on Power9 or later. - Fix decrementer storm caused by a recent change, seen with some configs. Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy, Fabiano Rosas, Maxime Bizon, Nicholas Piggin, and Sachin Sant. * tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/interrupt: Fix decrementer storm KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending powerpc/fixmap: Fix VM debug warning on unmap	2022-01-29 14:46:19 +02:00
Linus Torvalds	216e2aede2	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Errata workarounds for Cortex-A510: broken hardware dirty bit management, detection code for the TRBE (tracing) bugs with the actual fixes going in via the CoreSight tree. - Cortex-X2 errata handling for TRBE (inheriting the workarounds from Cortex-A710). - Fix ex_handler_load_unaligned_zeropad() to use the correct struct members. - A couple of kselftest fixes for FPSIMD. - Silence the vdso "no previous prototype" warning. - Mark start_backtrace() notrace and NOKPROBE_SYMBOL. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: cpufeature: List early Cortex-A510 parts as having broken dbm kselftest/arm64: Correct logging of FPSIMD register read via ptrace kselftest/arm64: Skip VL_INHERIT tests for unsupported vector types arm64: errata: Add detection for TRBE trace data corruption arm64: errata: Add detection for TRBE invalid prohibited states arm64: errata: Add detection for TRBE ignored system register writes arm64: Add Cortex-A510 CPU part definition arm64: extable: fix load_unaligned_zeropad() reg indices arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL arm64: errata: Update ARM64_ERRATUM_[2119858\|2224489] with Cortex-X2 ranges arm64: Add Cortex-X2 CPU part definition arm64: vdso: Fix "no previous prototype" warning	2022-01-29 08:57:22 +02:00
Linus Torvalds	df0001545b	Merge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pulltracing fixes from Steven Rostedt: - Limit mcount build time sorting to only those archs that we know it works for. - Fix memory leak in error path of histogram setup - Fix and clean up rel_loc array out of bounds issue - tools/rtla documentation fixes - Fix issues with histogram logic * tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Don't inc err_log entry count if entry allocation fails tracing: Propagate is_signed to expression tracing: Fix smatch warning for do while check in event_hist_trigger_parse() tracing: Fix smatch warning for null glob in event_hist_trigger_parse() tools/tracing: Update Makefile to build rtla rtla: Make doc build optional tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro tracing: Avoid -Warray-bounds warning for __rel_loc macro tracing/histogram: Fix a potential memory leak for kstrdup() ftrace: Have architectures opt-in for mcount build time sorting	2022-01-28 19:30:35 +02:00
Linus Torvalds	3cd7cd8a62	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Two larger x86 series: - Redo incorrect fix for SEV/SMAP erratum - Windows 11 Hyper-V workaround Other x86 changes: - Various x86 cleanups - Re-enable access_tracking_perf_test - Fix for #GP handling on SVM - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID - Fix for ICEBP in interrupt shadow - Avoid false-positive RCU splat - Enable Enlightened MSR-Bitmap support for real ARM: - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations Generic code changes: - Dead code cleanup" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: eventfd: Fix false positive RCU usage warning KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() KVM: nVMX: Rename vmcs_to_field_offset{,_table} KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP KVM: x86: add system attribute to retrieve full set of supported xsave states KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr selftests: kvm: move vm_xsave_req_perm call to amx_test KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS KVM: x86: Keep MSR_IA32_XSS unchanged for INIT KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest KVM: x86: Check .flags in kvm_cpuid_check_equal() too KVM: x86: Forcibly leave nested virt when SMM state is toggled KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real ...	2022-01-28 19:00:26 +02:00
Linus Torvalds	e0152705e4	Merge tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS build fix from Thomas Bogendoerfer: "Fix for allmodconfig build" * tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Fix build error due to PTR used in more places	2022-01-28 18:53:45 +02:00
Linus Torvalds	7eb3625489	Merge tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix loading of modules with lots of relocations and add a regression test for it. - Fix machine check handling for vector validity and guarded storage validity failures in KVM guests. - Fix hypervisor performance data to include z/VM guests with access control group set. - Fix z900 build problem in uaccess code. - Update defconfigs. * tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/hypfs: include z/VM guests with access control group set s390: update defconfigs s390/module: test loading modules with a lot of relocations s390/module: fix loading modules with a lot of relocations s390/uaccess: fix compile error s390/nmi: handle vector validity failures for KVM guests s390/nmi: handle guarded storage validity failures for KVM guests	2022-01-28 18:50:05 +02:00
James Morse	297ae1eb23	arm64: cpufeature: List early Cortex-A510 parts as having broken dbm Versions of Cortex-A510 before r0p3 are affected by a hardware erratum where the hardware update of the dirty bit is not correctly ordered. Add these cpus to the cpu_has_broken_dbm list. Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20220125154040.549272-3-james.morse@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-01-28 16:15:46 +00:00
Catalin Marinas	df20597044	Merge tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into for-next/fixes coresight: trbe: Workaround Cortex-A510 erratas This pull request is providing arm64 definitions to support TRBE Cortex-A510 erratas. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> * tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux: arm64: errata: Add detection for TRBE trace data corruption arm64: errata: Add detection for TRBE invalid prohibited states arm64: errata: Add detection for TRBE ignored system register writes arm64: Add Cortex-A510 CPU part definition	2022-01-28 16:14:06 +00:00
Paolo Bonzini	17179d0068	Merge tag 'kvmarm-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations	2022-01-28 07:45:15 -05:00
Vitaly Kuznetsov	6cbbaab60f	KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when Enlightened VMCS interface is in use: "Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior."" Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field 0x4404 ("VM-exit interruption information") are observed. Failing these attempts with nested_vmx_failInvalid() makes such guests unbootable. Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed eventually but for the time being we need a workaround. (Temporary) allow VMREAD to get data from the currently loaded Enlightened VMCS. Note: VMWRITE instructions remain forbidden, it is not clear how to handle them properly and hopefully won't ever be needed. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	892a42c10d	KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() In preparation to allowing reads from Enlightened VMCS from handle_vmread(), implement evmcs_field_offset() to get the correct read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V, is almost what's needed but a few things need to be adjusted. First, WARN_ON() is unacceptable for handle_vmread() as any field can (in theory) be supplied by the guest and not all fields are defined in eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields). It also sounds like a good idea to WARN_ON() if such fields are ever accessed by KVM-on-Hyper-V. Implement dedicated evmcs_field_offset() helper. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	2423a4c0d1	KVM: nVMX: Rename vmcs_to_field_offset{,_table} vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque blob which is not supposed to be accessed directly. In fact, vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure. Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity. No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	7a601e2cf6	KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field, PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too. Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:25 -05:00
Vitaly Kuznetsov	f80ae0ef08	KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS, MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair, MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely on 'true' MSR data. Note, none of the currently existing Windows/Hyper-V versions are known to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:25 -05:00
Paolo Bonzini	dd6e631220	KVM: x86: add system attribute to retrieve full set of supported xsave states Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:33:32 -05:00
Sean Christopherson	56f289a8d2	KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr Add a helper to handle converting the u64 userspace address embedded in struct kvm_device_attr into a userspace pointer, it's all too easy to forget the intermediate "unsigned long" cast as well as the truncation check. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:32:00 -05:00
Steven Rostedt (Google)	4ed308c445	ftrace: Have architectures opt-in for mcount build time sorting First S390 complained that the sorting of the mcount sections at build time caused the kernel to crash on their architecture. Now PowerPC is complaining about it too. And also ARM64 appears to be having issues. It may be necessary to also update the relocation table for the values in the mcount table. Not only do we have to sort the table, but also update the relocations that may be applied to the items in the table. If the system is not relocatable, then it is fine to sort, but if it is, some architectures may have issues (although x86 does not as it shifts all addresses the same). Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is safe to do the sorting at build time. Also update the config to compile in build time sorting in the sorttable code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT. Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/ Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Yinan Liu <yinan@linux.alibaba.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: `72b3942a17` ("scripts: ftrace - move the sort-processing in ftrace_init") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-01-27 19:15:44 -05:00
Hou Tao	b6ec79518e	bpf, x86: Remove unnecessary handling of BPF_SUB atomic op According to the LLVM commit (https://reviews.llvm.org/D72184), sync_fetch_and_sub() is implemented as a negation followed by sync_fetch_and_add(), so there will be no BPF_SUB op, thus just remove it. BPF_SUB is also rejected by the verifier anyway. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20220127083240.1425481-1-houtao1@huawei.com	2022-01-27 22:47:05 +01:00
Anshuman Khandual	708e8af492	arm64: errata: Add detection for TRBE trace data corruption TRBE implementations affected by Arm erratum #1902691 might corrupt trace data or deadlock, when it's being written into the memory. So effectively TRBE is broken and hence cannot be used to capture trace data. This adds a new errata ARM64_ERRATUM_1902691 in arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>	2022-01-27 12:01:53 -07:00
Anshuman Khandual	3bd94a8759	arm64: errata: Add detection for TRBE invalid prohibited states TRBE implementations affected by Arm erratum #2038923 might get TRBE into an inconsistent view on whether trace is prohibited within the CPU. As a result, the trace buffer or trace buffer state might be corrupted. This happens after TRBE buffer has been enabled by setting TRBLIMITR_EL1.E, followed by just a single context synchronization event before execution changes from a context, in which trace is prohibited to one where it isn't, or vice versa. In these mentioned conditions, the view of whether trace is prohibited is inconsistent between parts of the CPU, and the trace buffer or the trace buffer state might be corrupted. This adds a new errata ARM64_ERRATUM_2038923 in arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>	2022-01-27 12:01:53 -07:00
Anshuman Khandual	607a9afaae	arm64: errata: Add detection for TRBE ignored system register writes TRBE implementations affected by Arm erratum #2064142 might fail to write into certain system registers after the TRBE has been disabled. Under some conditions after TRBE has been disabled, writes into certain TRBE registers TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1 and TRBTRG_EL1 will be ignored and not be effected. This adds a new errata ARM64_ERRATUM_2064142 in arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>	2022-01-27 12:01:53 -07:00
Anshuman Khandual	53960faf2b	arm64: Add Cortex-A510 CPU part definition Add the CPU Partnumbers for the new Arm designs. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>	2022-01-27 12:01:53 -07:00
Thomas Bogendoerfer	fa62f39dc7	MIPS: Fix build error due to PTR used in more places Use PTR_WD instead of PTR to avoid clashes with other parts. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2022-01-27 09:04:19 +01:00
Evgenii Stepanov	3758a6c74e	arm64: extable: fix load_unaligned_zeropad() reg indices In ex_handler_load_unaligned_zeropad() we erroneously extract the data and addr register indices from ex->type rather than ex->data. As ex->type will contain EX_TYPE_LOAD_UNALIGNED_ZEROPAD (i.e. 4): * We'll always treat X0 as the address register, since EX_DATA_REG_ADDR is extracted from bits [9:5]. Thus, we may attempt to dereference an arbitrary address as X0 may hold an arbitrary value. * We'll always treat X4 as the data register, since EX_DATA_REG_DATA is extracted from bits [4:0]. Thus we will corrupt X4 and cause arbitrary behaviour within load_unaligned_zeropad() and its caller. Fix this by extracting both values from ex->data as originally intended. On an MTE-enabled QEMU image we are hitting the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: fixup_exception+0xc4/0x108 __do_kernel_fault+0x3c/0x268 do_tag_check_fault+0x3c/0x104 do_mem_abort+0x44/0xf4 el1_abort+0x40/0x64 el1h_64_sync_handler+0x60/0xa0 el1h_64_sync+0x7c/0x80 link_path_walk+0x150/0x344 path_openat+0xa0/0x7dc do_filp_open+0xb8/0x168 do_sys_openat2+0x88/0x17c __arm64_sys_openat+0x74/0xa0 invoke_syscall+0x48/0x148 el0_svc_common+0xb8/0xf8 do_el0_svc+0x28/0x88 el0_svc+0x24/0x84 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 Code: f8695a69 71007d1f 540000e0 927df12a (f940014a) Fixes: `753b323687` ("arm64: extable: add load_unaligned_zeropad() handler") Cc: <stable@vger.kernel.org> # 5.16.x Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Evgenii Stepanov <eugenis@google.com> Link: https://lore.kernel.org/r/20220125182217.2605202-1-eugenis@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-01-26 18:58:12 +00:00
Like Xu	05a9e06505	KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS also needs to update kvm_update_cpuid_runtime(). In the above cases, the size in bytes of the XSAVE area containing all states enabled by XCR0 or (XCRO \| IA32_XSS) needs to be updated. For simplicity and consistency, existing helpers are used to write values and call kvm_update_cpuid_runtime(), and it's not exactly a fast path. Fixes: `a554d207dc` ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:45 -05:00
Like Xu	4c282e51e4	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the size in bytes of the XSAVE area is affected by the states enabled in XSS. Fixes: `203000993d` ("kvm: vmx: add MSR logic for XSAVES") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> [sean: split out as a separate patch, adjust Fixes tag] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:45 -05:00
Xiaoyao Li	be4f3b3f82	KVM: x86: Keep MSR_IA32_XSS unchanged for INIT It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to zero on Power up and Reset but keeps unchanged on INIT. Fixes: `a554d207dc` ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:44 -05:00
Vasily Gorbik	663d34c8df	s390/hypfs: include z/VM guests with access control group set Currently if z/VM guest is allowed to retrieve hypervisor performance data globally for all guests (privilege class B) the query is formed in a way to include all guests but the group name is left empty. This leads to that z/VM guests which have access control group set not being included in the results (even local vm). Change the query group identifier from empty to "any" to retrieve information about all guests from any groups (or without a group set). Cc: stable@vger.kernel.org Fixes: `31cb4bd31a` ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2022-01-26 18:42:39 +01:00
Sean Christopherson	811f95ff95	KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid() free the array only on failure. BUG: memory leak unreferenced object 0xffff88810963a800 (size 2048): comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00 ................ 47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00 GenuntelineI.... backtrace: [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline] [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580 [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline] [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199 [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423 [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251 [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066 [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline] [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline] [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `c6617c61e8` ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Cc: stable@vger.kernel.org Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125210445.2053429-1-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:31 -05:00
Sean Christopherson	d6e656cd26	KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 WARN if KVM attempts to allocate a shadow VMCS for vmcs02. KVM emulates VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate a "real" shadow VMCS for L2. The previous code WARNed but continued anyway with the allocation, presumably in an attempt to avoid NULL pointer dereference. However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and indeed the sole caller does: if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu)) goto out_shadow_vmcs; which makes it not a useful attempt. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220527.2093146-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:04 -05:00
Vitaly Kuznetsov	033a3ea59a	KVM: x86: Check .flags in kvm_cpuid_check_equal() too kvm_cpuid_check_equal() checks for the (full) equality of the supplied CPUID data so .flags need to be checked too. Reported-by: Sean Christopherson <seanjc@google.com> Fixes: `c6617c61e8` ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00
Sean Christopherson	f7e570780e	KVM: x86: Forcibly leave nested virt when SMM state is toggled Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00
Vitaly Kuznetsov	aa3b39f38c	KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() Commit `3fa5e8fd0a` ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in svm_create_vcpu(), thus making the comment about vmcb being NULL obsolete. Drop it. While on it, drop superfluous vmcb_is_clean() check: vmcb_mark_dirty() is a bit flip, an extra check is unlikely to bring any performance gain. Drop now-unneeded vmcb_is_clean() helper as well. Fixes: `3fa5e8fd0a` ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211220152139.418372-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00

1 2 3 4 5 ...

193129 Commits