linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-12-27 08:45:26 -05:00

Author	SHA1	Message	Date
Linus Torvalds	feb06d2690	Merge tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...	2025-12-09 06:10:17 +09:00
Linus Torvalds	208eed95fc	Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...	2025-12-05 17:29:04 -08:00
Praveen K Paladugu	615a6e7d83	mshv: Cleanly shutdown root partition with MSHV Root partitions running on MSHV currently attempt ACPI power-off, which MSHV intercepts and triggers a Machine Check Exception (MCE), leading to a kernel panic. Root partitions panic with a trace similar to: [ 81.306348] reboot: Power down [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001 [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9 [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt [ 81.314717] Kernel panic - not syncing: Fatal machine check To avoid this, configure the sleep state in the hypervisor and invoke the HVCALL_ENTER_SLEEP_STATE hypercall as the final step in the shutdown sequence. This ensures a clean and safe shutdown of the root partition. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:25:05 +00:00
Praveen K Paladugu	f0be2600ac	mshv: Use reboot notifier to configure sleep state Configure sleep state information from ACPI in MSHV hypervisor using a reboot notifier. This data allows the hypervisor to correctly power off the host during shutdown. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Stansialv Kinsburskii <skinsburskii@linux.miscrosoft.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:25:01 +00:00
Stanislav Kinsburskii	b9a66cd5cc	mshv: Add support for movable memory regions Introduce support for movable memory regions in the Hyper-V root partition driver to improve memory management flexibility and enable advanced use cases such as dynamic memory remapping. Mirror the address space between the Linux root partition and guest VMs using HMM. The root partition owns the memory, while guest VMs act as devices with page tables managed via hypercalls. MSHV handles VP intercepts by invoking hmm_range_fault() and updating SLAT entries. When memory is reclaimed, HMM invalidates the relevant regions, prompting MSHV to clear SLAT entries; guest VMs will fault again on access. Integrate mmu_interval_notifier for movable regions, implement handlers for HMM faults and memory invalidation, and update memory region mapping logic to support movable regions. While MMU notifiers are commonly used in virtualization drivers, this implementation leverages HMM (Heterogeneous Memory Management) for its specialized functionality. HMM provides a framework for mirroring, invalidation, and fault handling, reducing boilerplate and improving maintainability compared to generic MMU notifiers. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:20:49 +00:00
Stanislav Kinsburskii	c39dda0828	mshv: Add refcount and locking to mem regions Introduce kref-based reference counting and spinlock protection for memory regions in Hyper-V partition management. This change improves memory region lifecycle management and ensures thread-safe access to the region list. Previously, the regions list was protected by the partition mutex. However, this approach is too heavy for frequent fault and invalidation operations. Finer grained locking is now used to improve efficiency and concurrency. This is a precursor to supporting movable memory regions. Fault and invalidation handling for movable regions will require safe traversal of the region list and holding a region reference while performing invalidation or fault operations. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:20:28 +00:00
Stanislav Kinsburskii	abceb4297b	mshv: Fix huge page handling in memory region traversal The previous code assumed that if a region's first page was huge, the entire region consisted of huge pages and stored this in a large_pages flag. This premise is incorrect not only for movable regions (where pages can be split and merged on invalidate callbacks or page faults), but even for pinned regions: THPs can be split and merged during allocation, so a large, pinned region may contain a mix of huge and regular pages. This change removes the large_pages flag and replaces region-wide assumptions with per-chunk inspection of the actual page size when mapping, unmapping, sharing, and unsharing. This makes huge page handling correct for mixed-page regions and avoids relying on stale metadata that can easily become invalid as memory is remapped. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:20:25 +00:00
Stanislav Kinsburskii	e950c30a10	mshv: Move region management to mshv_regions.c Refactor memory region management functions from mshv_root_main.c into mshv_regions.c for better modularity and code organization. Adjust function calls and headers to use the new implementation. Improve maintainability and separation of concerns in the mshv_root module. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:20:20 +00:00
Stanislav Kinsburskii	6f6aed2c49	mshv: Centralize guest memory region destruction Centralize guest memory region destruction to prevent resource leaks and inconsistent cleanup across unmap and partition destruction paths. Unify region removal, encrypted partition access recovery, and region invalidation to improve maintainability and reliability. Reduce code duplication and make future updates less error-prone by encapsulating cleanup logic in a single helper. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:20:16 +00:00
Stanislav Kinsburskii	df4ff5f6cf	mshv: Refactor and rename memory region handling functions Simplify and unify memory region management to improve code clarity and reliability. Consolidate pinning and invalidation logic, adopt consistent naming, and remove redundant checks to reduce complexity. Enhance documentation and update call sites for maintainability. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:19:56 +00:00
Jinank Jain	9d70ef7a18	mshv: adjust interrupt control structure for ARM64 Interrupt control structure (union hv_interupt_control) has different fields when it comes to x86 vs ARM64. Bring in the correct structure from HyperV header files and adjust the existing interrupt routing code accordingly. Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:16:49 +00:00
Gongwei Li	b5110eaf67	Drivers: hv: use kmalloc_array() instead of kmalloc() Replace kmalloc() with kmalloc_array() to prevent potential overflow, as recommended in Documentation/process/deprecated.rst. Signed-off-by: Gongwei Li <ligongwei@kylinos.cn> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:16:49 +00:00
Anirudh Rayabharam (Microsoft)	c720e6a873	mshv: Add ioctl for self targeted passthrough hvcalls Allow MSHV_ROOT_HVCALL IOCTL on the /dev/mshv fd. This IOCTL would execute a passthrough hypercall targeting the root/parent partition i.e. HV_PARTITION_ID_SELF. This will be useful for the VMM to query things like supported synthetic processor features, supported VMM capabiliites etc. Since hypercalls targeting the host partition could potentially perform privileged operations, allow only a limited set of hypercalls. To begin with, allow only: HVCALL_GET_PARTITION_PROPERTY HVCALL_GET_PARTITION_PROPERTY_EX Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:16:49 +00:00
Naman Jain	7bfe3b8ea6	Drivers: hv: Introduce mshv_vtl driver Provide an interface for Virtual Machine Monitor like OpenVMM and its use as OpenHCL paravisor to control VTL0 (Virtual trust Level). Expose devices and support IOCTLs for features like VTL creation, VTL0 memory management, context switch, making hypercalls, mapping VTL0 address space to VTL2 userspace, getting new VMBus messages and channel events in VTL2 etc. Co-developed-by: Roman Kisel <romank@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-12-05 23:16:26 +00:00
Linus Torvalds	2b09f480f0	Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...	2025-12-02 08:48:53 -08:00
Christian Brauner	c99dc44562	hv: convert mshv_ioctl_create_partition() to FD_ADD() Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-39-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-28 12:42:35 +01:00
Naman Jain	cffe9f58de	Drivers: hv: Export some symbols for mshv_vtl MSHV_VTL driver is going to be introduced, which is supposed to provide interface for Virtual Machine Monitors (VMMs) to control Virtual Trust Level (VTL). Export the symbols needed to make it work (vmbus_isr, hv_context and hv_post_message). Co-developed-by: Roman Kisel <romank@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506110544.q0NDMQVc-lkp@intel.com/ Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Muminul Islam	c91fe5f162	mshv: Extend create partition ioctl to support cpu features The existing mshv create partition ioctl does not provide a way to specify which cpu features are enabled in the guest. Instead, it attempts to enable all features and those that are not supported are silently disabled by the hypervisor. This was done to reduce unnecessary complexity and is sufficient for many cases. However, new scenarios require fine-grained control over these features. Define a new mshv_create_partition_v2 structure which supports passing the disabled processor and xsave feature bits through to the create partition hypercall directly. Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables the new structure. If unset, the original mshv_create_partition struct is used, with the old behavior of enabling all features. Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Magnus Kulke	f91bc8f61a	mshv: Allow mappings that overlap in uaddr Currently the MSHV driver rejects mappings that would overlap in userspace. Some VMMs require the same memory to be mapped to different parts of the guest's address space, and so working around this restriction is difficult. The hypervisor itself doesn't prohibit mappings that overlap in uaddr, (really in SPA; system physical addresses), so supporting this in the driver doesn't require any extra work: only the checks need to be removed. Since no userspace code until now has been able to overlap regions in userspace, relaxing this constraint can't break any existing code. Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Nuno Das Neves	ba9eb9b86d	mshv: Fix create memory region overlap check The current check is incorrect; it only checks if the beginning or end of a region is within an existing region. This doesn't account for userspace specifying a region that begins before and ends after an existing region. Change the logic to a range intersection check against gfns and uaddrs for each region. Remove mshv_partition_region_by_uaddr() as it is no longer used. Fixes: `621191d709` ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs") Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB41575BE0406D3AB22E1D7DB5D4C2A@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Marco Crivellari	8ec6070fc8	mshv: add WQ_PERCPU to alloc_workqueue users Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Rahul Kumar	22cb2f06fa	Drivers: hv: Use kmalloc_array() instead of kmalloc() Documentation/process/deprecated.rst recommends against the use of kmalloc with dynamic size calculations due to the risk of overflow and smaller allocation being made than the caller was expecting. Replace kmalloc() with kmalloc_array() in hv_common.c to make the intended allocation size clearer and avoid potential overflow issues. The number of pages (pgcount) is bounded, so overflow is not a practical concern here. However, using kmalloc_array() better reflects the intent to allocate an array and improves consistency with other allocations. No functional change intended. Signed-off-by: Rahul Kumar <rk0006818@gmail.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Stanislav Kinsburskii	5f4b5edcb1	Drivers: hv: Resolve ambiguity in hypervisor version log Update the log message in hv_common_init to explicitly state that the reported version is for the Microsoft Hypervisor, not the host OS. Previously, this message was accurate for guests running on Windows hosts, where the host and hypervisor versions matched. With support for Linux hosts running the Hyper-V hypervisor, the host OS and hypervisor versions may differ. This change avoids confusion by making it clear that the version refers to the Microsoft Hypervisor regardless of the host operating system. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Kriish Sharma	6626f815a1	Drivers: hv: fix missing kernel-doc description for 'size' in request_arr_init() Add missing kernel-doc entry for the @size parameter in request_arr_init(), fixing the following documentation warning reported by the kernel test robot and detected via kernel-doc: Warning: drivers/hv/channel.c:595 function parameter 'size' not described in 'request_arr_init' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503021934.wH1BERla-lkp@intel.com Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:16 +00:00
Jinank Jain	d62313bdf5	mshv: Introduce new hypercall to map stats page for L1VH partitions Introduce HVCALL_MAP_STATS_PAGE2 which provides a map location (GPFN) to map the stats to. This hypercall is required for L1VH partitions, depending on the hypervisor version. This uses the same check as the state page map location; mshv_use_overlay_gpfn(). Add mshv_map_vp_state_page() helpers to use this new hypercall or the old one depending on availability. For unmapping, the original HVCALL_UNMAP_STATS_PAGE works for both cases. Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:16 +00:00
Jinank Jain	19c515c27c	mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Introduce mshv_use_overlay_gpfn() to check if a page needs to be allocated and passed to the hypervisor to map VP state pages. This is only needed on L1VH, and only on some (newer) versions of the hypervisor, hence the need to check vmm_capabilities. Introduce functions hv_map/unmap_vp_state_page() to handle the allocation and freeing. Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:16 +00:00
Purna Pavan Chandra Aekkaladevi	fd612d97a4	mshv: Get the vmm capabilities offered by the hypervisor Some hypervisor APIs are gated by feature bits in the "vmm capabilities" partition property. Store the capabilities on mshv_root module init, using HVCALL_GET_PARTITION_PROPERTY_EX. This is not supported on all hypervisors. In that case, just set the capabilities to 0 and proceed as normal. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:16 +00:00
Purna Pavan Chandra Aekkaladevi	59aeea1959	mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall This hypercall can be used to fetch extended properties of a partition. Extended properties are properties with values larger than a u64. Some of these also need additional input arguments. Add helper function for using the hypercall in the mshv_root driver. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:16 +00:00
Nuno Das Neves	9ebc528cfd	mshv: Only map vp->vp_stats_pages if on root scheduler This mapping is only used for checking if the dispatch thread is blocked. This is only relevant for the root scheduler, so check the scheduler type to determine whether to map/unmap these pages, instead of the current check, which is incorrect. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	2647c96649	Drivers: hv: Support establishing the confidential VMBus connection To establish the confidential VMBus connection the CoCo VM, the guest first checks on the confidential VMBus availability, and then proceeds to initializing the communication stack. Implement that in the VMBus driver initialization. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	b537794bc2	Drivers: hv: Set the default VMBus version to 6.0 The confidential VMBus is supported by the protocol version 6.0 onwards. Attempt to establish the VMBus 6.0 connection thus enabling the confidential VMBus features when available. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	bf35d298bb	Drivers: hv: Support confidential VMBus channels To make use of Confidential VMBus channels, initialize the co_ring_buffers and co_external_memory fields of the channel structure. Advertise support upon negotiating the version and compute values for those fields and initialize them. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	510164539f	Drivers: hv: Free msginfo when the buffer fails to decrypt The early failure path in __vmbus_establish_gpadl() doesn't deallocate msginfo if the buffer fails to decrypt. Fix the leak by breaking out the cleanup code into a separate function and calling it where required. Fixes: `d4dccf353d` ("Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM") Reported-by: Michael Kelley <mkhlinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB41573796F9787F67E0E97049D472A@SN6PR02MB4157.namprd02.prod.outlook.com Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	0a4534bdf2	Drivers: hv: Allocate encrypted buffers when requested Confidential VMBus is built around using buffers not shared with the host. Support allocating encrypted buffers when requested. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	e096fe2bd6	Drivers: hv: Functions for setting up and tearing down the paravisor SynIC The confidential VMBus runs with the paravisor SynIC and requires configuring it with the paravisor. Add the functions for configuring the paravisor SynIC. Update overall SynIC initialization logic to initialize the SynIC if it is present. Finally, break out SynIC interrupt enable/disable code into separate functions so that SynIC interrupts can be enabled or disabled via the paravisor instead of the hypervisor if the paravisor SynIC is present. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	74fa5d7e5f	Drivers: hv: Rename the SynIC enable and disable routines The confidential VMBus requires support for the both hypervisor facing SynIC and the paravisor one. Rename the functions that enable and disable SynIC with the hypervisor. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	09406f2f84	Drivers: hv: Check message and event pages for non-NULL before iounmap() It might happen that some hyp SynIC pages aren't allocated. Check for that and only then call iounmap(). Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	1bb15327d5	Drivers: hv: remove stale comment The comment about the x2v shim is ancient and long since incorrect. Remove the incorrect comment. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	25059d5e4c	Drivers: hv: Post messages through the confidential VMBus if available When the confidential VMBus is available, the guest should post messages to the paravisor. Update hv_post_message() to post messages to the paravisor rather than through GHCB or TD calls. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	226494e5ee	Drivers: hv: Allocate the paravisor SynIC pages when required Confidential VMBus requires interacting with two SynICs -- one provided by the host hypervisor, and one provided by the paravisor. Each SynIC requires its own message and event pages. Refactor and extend the existing code to add allocating and freeing the message and event pages for the paravisor SynIC when it is present. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:15 +00:00
Roman Kisel	163224c189	Drivers: hv: Rename fields for SynIC message and event pages Confidential VMBus requires interacting with two SynICs -- one provided by the host hypervisor, and one provided by the paravisor. Each SynIC requires its own message and event pages. Rename the existing host-accessible SynIC message and event pages with the "hyp_" prefix to clearly distinguish them from the paravisor ones. The field name is also changed in mshv_root.* for consistency. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Roman Kisel	a156ad8c50	arch/x86: mshyperv: Trap on access for some synthetic MSRs hv_set_non_nested_msr() has special handling for SINT MSRs when a paravisor is present. In addition to updating the MSR on the host, the mirror MSR in the paravisor is updated, including with the proxy bit. But with Confidential VMBus, the proxy bit must not be used, so add a special case to skip it. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Roman Kisel	e6eeb3c782	arch: hyperv: Get/set SynIC synth.registers via paravisor The existing Hyper-V wrappers for getting and setting MSRs are hv_get/set_msr(). Via hv_get/set_non_nested_msr(), they detect when running in a CoCo VM with a paravisor, and use the TDX or SNP guest-host communication protocol to bypass the paravisor and go directly to the host hypervisor for SynIC MSRs. The "set" function also implements the required special handling for the SINT MSRs. Provide functions that allow manipulating the SynIC registers through the paravisor. Move vmbus_signal_eom() to a more appropriate location (which also avoids breaking KVM). Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Roman Kisel	6802d8af47	Drivers: hv: VMBus protocol version 6.0 The confidential VMBus is supported starting from the protocol version 6.0 onwards. Provide the required definitions. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Tianyu Lan	3e1b611515	drivers: hv: Allow vmbus message synic interrupt injected from Hyper-V When Secure AVIC is enabled, VMBus driver should call x2apic Secure AVIC interface to allow Hyper-V to inject VMBus message interrupt. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Nuno Das Neves	4cc1aa469c	mshv: Fix deposit memory in MSHV_ROOT_HVCALL When the MSHV_ROOT_HVCALL ioctl is executing a hypercall, and gets HV_STATUS_INSUFFICIENT_MEMORY, it deposits memory and then returns -EAGAIN to userspace. The expectation is that the VMM will retry. However, some VMM code in the wild doesn't do this and simply fails. Rather than force the VMM to retry, change the ioctl to deposit memory on demand and immediately retry the hypercall as is done with all the other hypercall helper functions. In addition to making the ioctl easier to use, removing the need for multiple syscalls improves performance. There is a complication: unlike the other hypercall helper functions, in MSHV_ROOT_HVCALL the input is opaque to the kernel. This is problematic for rep hypercalls, because the next part of the input list can't be copied on each loop after depositing pages (this was the original reason for returning -EAGAIN in this case). Introduce hv_do_rep_hypercall_ex(), which adds a 'rep_start' parameter. This solves the issue, allowing the deposit loop in MSHV_ROOT_HVCALL to restart a rep hypercall after depositing pages partway through. Fixes: `621191d709` ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs") Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Nuno Das Neves	7563d021e2	mshv: Fix VpRootDispatchThreadBlocked value This value in the VP stats page is used to track if the VP can be dispatched for execution when there are no fast interrupts injected. The original value of 201 was used in a version of the hypervisor which did not ship. It was subsequently changed to 202 so that is the correct value. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Thierry Reding	a97fbc3ee3	syscore: Pass context data to callbacks Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>	2025-11-14 10:01:52 +01:00
Thomas Gleixner	83409986f4	rseq, virt: Retrigger RSEQ after vcpu_run() Hypervisors invoke resume_user_mode_work() before entering the guest, which clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user space context available to them, so the rseq notify handler skips inspecting the critical section, but updates the CPU/MM CID values unconditionally so that the eventual pending rseq event is not lost on the way to user space. This is a pointless exercise as the task might be rescheduled before actually returning to user space and it creates unnecessary work in the vcpu_run() loops. It's way more efficient to ignore that invocation based on @regs == NULL and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the vcpu_run() loop before returning from the ioctl(). This ensures that a pending RSEQ update is not lost and the IDs are updated before returning to user space. Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into a NOOP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251027084306.399495855@linutronix.de	2025-11-04 08:30:23 +01:00
Mukesh Rathor	e3ec97c3ab	Drivers: hv: Make CONFIG_HYPERV bool With CONFIG_HYPERV and CONFIG_HYPERV_VMBUS separated, change CONFIG_HYPERV to bool from tristate. CONFIG_HYPERV now becomes the core Hyper-V hypervisor support, such as hypercalls, clocks/timers, Confidential Computing setup, PCI passthru, etc. that doesn't involve VMBus or VMBus devices. Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-10-01 00:00:45 +00:00

1 2 3 4 5 ...

1132 Commits