linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 03:44:51 -04:00

Author	SHA1	Message	Date
Arnd Bergmann	8de446fd49	Merge tag 'apple-soc-dt-6.18-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into soc/dt Apple SoC DTS updates for 6.18, part 2 - New device trees for all M2 Pro, Max and Ultra models are added. This is responsible for most of the changed lines since we already need 2000+ lines just to describe all the power domains inside t602x-pmgr.dtsi for these SoCs. - Missing WiFi properties for t600x are added. - Bluetooth nodes are added for all t600x machines. - The PCIe ethernet iommu-map was fixed for the Apple M1 iMac to account for a disabled PCIe port. - SPMI, NVMe, SART and mailbox nodes for Apple's T2 and A11. * tag 'apple-soc-dt-6.18-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux: arm64: dts: apple: t8015: Add SPMI node arm64: dts: apple: t8012: Add SPMI node arm64: dts: apple: Add J180d (Mac Pro, M2 Ultra, 2023) device tree arm64: dts: apple: Add J474s, J475c and J475d device trees arm64: dts: apple: Add J414 and J416 Macbook Pro device trees arm64: dts: apple: Add initial t6020/t6021/t6022 DTs arm64: dts: apple: Add ethernet0 alias for J375 template dt-bindings: arm: apple: Add t6020x compatibles arm64: dts: apple: t8015: Add NVMe nodes arm64: dts: apple: t8015: Fix PCIE power domains dependencies arm64: dts: apple: Add devicetreee for t8112-j415 dt-bindings: arm: apple: Add t8112 j415 compatible arm64: dts: apple: t600x: Add bluetooth device nodes arm64: dts: apple: t600x: Add missing WiFi properties arm64: dts: apple: t8103-j457: Fix PCIe ethernet iommu-map Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:44:22 +02:00
Arnd Bergmann	26116b98d6	Merge tag 'omap-for-v6.18/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/dt ARM: dts: ti: omap updates for v6.18 These are all minor corrections to the dts files. * tag 'omap-for-v6.18/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap: ARM: dts: omap: am335x-cm-t335: Remove unused mcasp num-serializer property ARM: dts: ti: omap: omap3-devkit8000-lcd: Fix ti,keep-vref-on property to use correct boolean syntax in DTS ARM: dts: ti: omap: am335x-baltos: Fix ti,en-ck32k-xtal property in DTS to use correct boolean syntax ARM: dts: omap: Minor whitespace cleanup ARM: dts: omap: dm816x: Split 'reg' per entry ARM: dts: omap: dm814x: Split 'reg' per entry ARM: dts: am33xx-l4: fix UART compatible ARM: dts: ti: omap4: Use generic "ethernet" as node name Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:41:53 +02:00
Rob Herring (Arm)	345518c00b	arm64: dts: apm-shadowcat: Drop "apm,xgene2-pcie" compatible The "apm,xgene2-pcie" compatible is unused, undocumented, and in the wrong position in the compatible list. Given this is a mature and little used platform, just remove the compatible rather than fix the order and document it. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250919161529.1293151-1-robh@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:39:18 +02:00
Rob Herring (Arm)	676af08386	arm64: dts: apm-shadowcat: Move slimpro nodes out of "simple-bus" node The slimpro nodes are not MMIO devices, so they don't belong under a "simple-bus" node. Move them to the top level. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250919161509.1292227-1-robh@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:39:02 +02:00
Arnd Bergmann	6866b78566	Merge tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Another missing supply and a wrong headphone gpio level. * tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6	2025-09-23 22:32:48 +02:00
Arnd Bergmann	5eba504bb2	Merge tag 'sunxi-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes Allwinner fixes for 6.17 Two device tree style cleanups from the device tree maintainers. * tag 'sunxi-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: riscv: dts: allwinner: rename devterm i2c-gpio node to comply with binding ARM: dts: allwinner: Minor whitespace cleanup Link: https://lore.kernel.org/r/aMrsUfkTWx8g3bJ7@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:30:57 +02:00
Arnd Bergmann	abfbfb98ac	Merge tag 'amlogic-arm64-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/dt Amlogic ARM64 DT for v6.18: - Add cache information to the Amlogic SoCs - Add RTC node for Amlogic C3 SoC - Fix PWM node for Amlogic C3 SoC - Remove UHS capability for Odroid-C2 SDCard * tag 'amlogic-arm64-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux: arm64: dts: amlogic: gxbb-odroidc2: remove UHS capability for SD card dts: arm: amlogic: fix pwm node for c3 arm64: dts: amlogic: sm1-bananapi: lower SD card speed for stability arm64: dts: amlogic: Add cache information to the Amlogic T7 SoC arm64: dts: amlogic: Add cache information to the Amlogic S922X SoC arm64: dts: amlogic: Add cache information to the Amlogic S7 SoC arm64: dts: amlogic: Add cache information to the Amlogic C3 SoC arm64: dts: amlogic: Add cache information to the Amlogic A4 SoC arm64: dts: amlogic: Add cache information to the Amlogic A1 SoC arm64: dts: amlogic: Add cache information to the Amlogic GXM SoCS arm64: dts: amlogic: Add cache information to the Amlogic AXG SoCS arm64: dts: amlogic: Add cache information to the Amlogic G12A SoCS arm64: dts: amlogic: Add cache information to the Amlogic SM1 SoC arm64: dts: amlogic: Add cache information to the Amlogic GXBB and GXL SoC arm64: dts: amlogic: C3: Add RTC controller node Link: https://lore.kernel.org/r/d40e7e96-4a7c-4e4f-b36f-750c6525b95c@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:28:44 +02:00
Arnd Bergmann	9f1bbcc46e	Merge tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt Another missing supply and a wrong headphone gpio level. * tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6 arm64: dts: rockchip: fix second M.2 slot on ROCK 5T arm64: dts: rockchip: fix USB on RADXA ROCK 5T arm64: dts: rockchip: Add vcc-supply to SPI flash on Pinephone Pro arm64: dts: rockchip: fix es8388 address on rk3588s-roc-pc arm64: dts: rockchip: Fix Bluetooth interrupts flag on Neardi LBA3368 arm64: dts: rockchip: correct network description on Sige5 arm64: dts: rockchip: Minor whitespace cleanup ARM: dts: rockchip: Minor whitespace cleanup arm64: dts: rockchip: Add supplies for eMMC on rk3588-orangepi-5 arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 plus arm64: dts: rockchip: Add vcc-supply to SPI flash on rk3399-pinebook-pro arm64: dts: rockchip: mark eeprom as read-only for Radxa E52C	2025-09-23 22:26:59 +02:00
Arnd Bergmann	ec1ede181e	Merge tag 'spacemit-dt-for-6.18-1' of https://github.com/spacemit-com/linux into soc/dt RISC-V SpacemiT DT changes for 6.18 - Add OrangePi RV2 board support - Add reset support to UART driver - Add PDMA driver support - Remove sec_uart1 node * tag 'spacemit-dt-for-6.18-1' of https://github.com/spacemit-com/linux: riscv: dts: spacemit: uart: remove sec_uart1 device node riscv: dts: spacemit: Enable PDMA on Banana Pi F3 and Milkv Jupiter riscv: dts: spacemit: Add PDMA node for K1 SoC riscv: dts: spacemit: add UART resets for Soc K1 riscv: dts: spacemit: Add OrangePi RV2 board device tree dt-bindings: riscv: spacemit: Add OrangePi RV2 board Link: https://lore.kernel.org/r/20250919055525-GYC5766558@gentoo.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:22:24 +02:00
Arnd Bergmann	17752efeca	Merge tag 'sunxi-dt-for-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt Allwinner Device Tree changes for 6.18 This tag contains two DT binding header changes that are shared with the clk tree. In this cycle we gained support for the MCU PRCM clock and reset controller on the A523/A527/T527 family of SoCs, the NPU which is a Vivante GC9000 IP block, and the NPU clock that was missing. The other PRCM clock controller gained default bus clock rate settings. These were not configured in the upstream U-boot bootloader, leading to them running at slower rates. The assigned rates are from the user manual. There is also a new board, the NetCube Systems Nagami SoM and two of its carrier boards. The A523 family development boards now have their internal RTC clocks configured correctly, so that the RTC does not drift wildly. The missing functions for the AXP717 on these boards are added. Missing reset GPIOs and delays for Ethernet PHYs are added. Last, the Cubie A5E now has its LEDs described and usable. An overlay for the Orange Pi Zero interface (addon) board was added. This can be used with the Orange Pi Zero and Zero Plus 2. Default audio routing for these two boards (to be used with the addon) were added to complement the overlay. 
* tag 'sunxi-dt-for-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: arm64: dts: allwinner: sun55i: Complete AXP717A sub-functions arm64: dts: allwinner: t527: orangepi-4a: hook up external 32k crystal arm64: dts: allwinner: t527: avaota-a1: hook up external 32k crystal arm64: dts: allwinner: a527: cubie-a5e: Drop external 32.768 KHz crystal arm64: dts: sun55i: a523: Assign standard clock rates to PRCM bus clocks ARM: dts: sunxi: add support for NetCube Systems Nagami Keypad Carrier ARM: dts: sunxi: add support for NetCube Systems Nagami Basic Carrier ARM: dts: sunxi: add support for NetCube Systems Nagami SoM riscv: dts: allwinner: d1s-t113: Add pinctrl's required by NetCube Systems Nagami SoM dt-bindings: arm: sunxi: Add NetCube Systems Nagami SoM and carrier board bindings ARM: dts: allwinner: Add Orange Pi Zero Interface Board overlay ARM: dts: allwinner: orangepi-zero-plus2: Add default audio routing ARM: dts: allwinner: orangepi-zero: Add default audio routing arm64: dts: allwinner: a523: Add NPU device node arm64: dts: allwinner: a523: Add MCU PRCM CCU node dt-bindings: clock: sun55i-a523-ccu: Add A523 MCU CCU clock controller dt-bindings: clock: sun55i-a523-ccu: Add missing NPU module clock arm64: dts: allwinner: t527: avaota-a1: Add ethernet PHY reset setting arm64: dts: allwinner: a527: cubie-a5e: Add ethernet PHY reset setting arm64: dts: allwinner: a527: cubie-a5e: Add LEDs Link: https://lore.kernel.org/r/aMrtuZg8HlR--TAt@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:17:56 +02:00
Arnd Bergmann	83ae575d6f	Merge tag 'v6.17-next-dts64.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt mt8188: - change efuse compatible fallback to make GPU DVFS work - enable SCP core for video decoding and encoding mt8186: - add correct touchscreen compatible for tentacruel and krabby Fixes of DT warnings for many different SoCs and boards. * tag 'v6.17-next-dts64.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: (24 commits) arm64: dts: mediatek: mt8516-pumpkin: Fix machine compatible arm64: dts: mediatek: mt8395-kontron-i1200: Fix MT6360 regulator nodes arm64: dts: mediatek: mt8195-cherry: Add missing regulators to rt5682 arm64: dts: mediatek: mt8195-cherry: Move VBAT-supply to Tomato R1/R2 arm64: dts: mediatek: mt8195: Fix ranges for jpeg enc/decoder nodes arm64: dts: mediatek: mt8183-kukui: Move DSI panel node to machine dtsis arm64: dts: mediatek: mt8183: Migrate to display controller OF graph arm64: dts: mediatek: mt8183-pumpkin: Add power supply for CCI arm64: dts: mediatek: pumpkin-common: Fix pinctrl node names arm64: dts: mediatek: mt8183: Fix pinctrl node names arm64: dts: mediatek: acelink-ew-7886cax: Remove unnecessary cells in spi-nand arm64: dts: mediatek: mt7986a-bpi-r3: Set interrupt-parent to mdio switch arm64: dts: mediatek: mt7986a-bpi-r3: Fix SFP I2C node names arm64: dts: mediatek: mt7986a: Fix PCI-Express T-PHY node address arm64: dts: mediatek: Fix node name for SYSIRQ controller on all SoCs arm64: dts: mediatek: mt6795-sony-xperia-m5: Add pinctrl for mmc1/mmc2 arm64: dts: mediatek: mt6795-xperia-m5: Fix mmc0 latch-ck value arm64: dts: mediatek: mt6795: Add mediatek,infracfg to iommu node arm64: dts: mediatek: mt6797: Remove bogus id property in i2c nodes arm64: dts: mediatek: mt6797: Fix pinctrl node names ... Link: https://lore.kernel.org/r/c0e2e902-2a10-44a7-9592-491ba7382df0@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:15:57 +02:00
Arnd Bergmann	0651061855	Merge tag 'riscv-sophgo-dt-for-v6.18' of https://github.com/sophgo/linux into soc/dt RISC-V Devicetrees for v6.18 Sophgo: Minor changes here only for SG2042. Enable numa and we can see significant performance improvements, for example in the STREAM test. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> * tag 'riscv-sophgo-dt-for-v6.18' of https://github.com/sophgo/linux: dts: sophgo: sg2042: added numa id description Link: https://lore.kernel.org/r/MAUPR01MB11072ABA02A18CC7AA9B88874FE17A@MAUPR01MB11072.INDPRD01.PROD.OUTLOOK.COM Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:15:16 +02:00
Arnd Bergmann	57cff2159b	Merge tag 'ti-k3-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/dt TI K3 device tree updates for v6.18 Generic fixes and cleanups: * k3-pinctrl: Fix incorrect macro usage, add missing DeepSleep/drive strength macros * k3: Rename rproc reserved-mem nodes to 'memory@addr' and add labels for reserved-memory * Long time pending major remoteproc firmware refactoring to allow flexibility for downstream variants: - am62x/am62ax: Move Mailbox/Remoteproc nodes to board-level DTS files - am64/am65/j721e/j721s2/j784s4/j742s2/j7200: Move Remoteproc enablement to board-level DTS - am62a/am62/am62p/j722s: Similarly restructure Mailbox/Remoteproc configs - am65/am64: Refactor IPC firmware carveouts/mailboxes into new SoC family-specific dtsi files - j721e/j721s2/j784s4/j742s2/am62/am62p/am62a/am64/am65/j7200/j722s: Refactor IPC firmware configs into new board-independent dtsi files - Various boards: Add missing or corrected carveouts/timers/mailbox configs for IPC firmware alignment * Multiple-boards: Bootph-all property added for USB PHYs to support DFU boot. 
New Boards/SoM/SiP: * Variscite VAR-SOM-AM62P SoM and carrier boards * AM6254atl SiP package and SK SoC specific changes: AM62P: * Update eMMC HS400 STRB tuning value * Split HS400 support away from J722S due to errata * Add Variscite VAR-SOM-AM62P SoM and Symphony carrier board support AM62: * Remove unused DeepSleep USB1 pin config on SK * Add CSI2 interrupts property on main CSI2RX * Enable Mailbox & Remoteproc at board level * PocketBeagle2 + Verdin variants: Add missing IPC firmware carveouts, enable R5F/M4F AM62A: * Fix padcfg length in pad configuration registers * Remove unused DeepSleep USB1 pin config on SK * Add CSI2 interrupts property * Add 1.4GHz OPP entry for phyCORE-AM62Ax * Enable Mailbox & Remoteproc at board level * Add missing IPC firmware carveouts for PocketBeagle2 and other boards AM62D2: * Add Octal SPI NOR flash (OSPI) support for EVM * Enable USB0/USB1 interface on EVM AM625: * Introduce AM6254atl SiP base SoC support * Add SK-AM6254atl board AM64: * Refactor IPC firmware configs into new dtsi * Enable Remoteproc at board level * Add PA stats property for PEB-C-010 expansion Ethernet card * phyCORE SoM + SR SoM/Electra board: Add missing IPC firmware configs AM65: * Refactor IPC firmware configs into new dtsi * Enable Remoteproc at board level AM69: * Switch SERDES0 config to PCIe Multilink + USB mode, enabling independent PCIe1 & PCIe3 link speeds J7200: * Refactor IPC firmware configs into new dtsi * Enable R5F Remoteproc at board level J721E: * Add DSI + DPHY-TX nodes * Add CSI2 interrupts property * BeagleBone AI64: Switch R5 clusters to split mode, add timer reserves for IPC FW, Correct carveouts (revert mistaken reordering of C6x carveouts) * Refactor IPC firmware configs into new dtsi * Enable Remoteproc at board level J721S2: * Add DSI + DSI PHY nodes * Add USB0 Type-A overlay for EVM * Add CSI2 interrupts property * Ensure PCIe node has proper interrupt-controller #address-cells fixes dtbs_check warning. 
* Refactor IPC firmware configs into new dtsi * Enable Remoteproc at board level * Common processor board: Add DisplayPort-1 enable, I2C4 instance for display connector J722S: * Add bootph-all to usb0_phy_ctrl node (DFU) * Add JPEG Encoder node (E5010) * Add CSI2 interrupts properties on main/J722S/AM62P common main * Refactor IPC firmware configs into new dtsi * Enable Remoteproc at board level J784S4/J742S2: * Add CSI2 interrupts properties on main-common * Add DSI & PHY support * Enable DisplayPort-1 on EVM * Refactor IPC firmware configs into new dtsi (common & SoC-specific) * Enable Remoteproc at board level * J742S2: Override MCU R5 firmware names in dedicated dtsi Board specific changes: AM62P Variscite Symphony Board: * Add support with USB, Eth, Camera, CAN, GPIO expander AM642-phyBOARD-Electra * Add PEB-C-010 Ethernet expansion board overlay * Add PA stats handle AM642-sr/phyCORE * Add missing IPC carveouts for R5F/M4F AM62-Verdin/AM62P-Verdin * Add missing IPC carveouts for R5F/M4F, mailboxes * tag 'ti-k3-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: (78 commits) arm64: dts: ti: k3-j721s2-evm: Add overlay to enable USB0 Type-A arm64: dts: ti: k3-am642-phyboard-electra: Add PEB-C-010 Overlay arm64: dts: ti: var-som-am62p: Add support for Variscite Symphony Board arm64: dts: ti: Add support for Variscite VAR-SOM-AM62P dt-bindings: arm: ti: Add bindings for Variscite VAR-SOM-AM62P arm64: dts: ti: k3-j722s-evm: Add bootph-all tag to usb0_phy_ctrl node arm64: dts: ti: k3-am62x-sk-common: Add bootph-all tag to usb0_phy_ctrl node arm64: dts: ti: k3-am62p5-sk: Add bootph-all tag to usb0_phy_ctrl node arm64: dts: ti: k3-am62a7-sk: Add bootph-all tag to usb0_phy_ctrl node arm64: dts: ti: k3-j721e-main: Add DSI and DPHY-TX arm64: dts: ti: k3-pinctrl: Fix the bug in existing macros arm64: dts: ti: k3-pinctrl: Add the remaining macros arm64: dts: ti: k3-am62x-sk-common: Remove the unused cfg in USB1_DRVVBUS arm64: dts: ti: k3-am62p5-sk: 
Remove the unused cfg in USB1_DRVVBUS arm64: dts: ti: k3-am62d2-evm: Add support for OSPI flash arm64: dts: ti: k3-am62d2-evm: Enable USB support arm64: dts: ti: k3-am62a-main: Fix main padcfg length arm64: dts: ti: k3-am62p: Update eMMC HS400 STRB value arm64: dts: ti: k3-am62p/j722s: Remove HS400 support from common arm64: dts: ti: Add support for AM6254atl SiP SK ... Link: https://lore.kernel.org/r/20250916175349.pxg6gxd4vg5vfmhx@overvalue Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-09-23 22:11:51 +02:00
Chenghao Duan	d0bf7cd5df	riscv: bpf: Fix uninitialized symbol 'retval_off' In the __arch_prepare_bpf_trampoline() function, retval_off is only meaningful when save_ret is true, so the current logic is correct. However, in the original logic, retval_off is only initialized under certain conditions; for example, in the fmod_ret logic, the compiler is not aware that the flags of the fmod_ret program (prog) have set BPF_TRAMP_F_CALL_ORIG, which results in an uninitialized symbol compilation warning. So initialize retval_off unconditionally to fix it. Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20250922062244.822937-2-duanchenghao@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:29:03 -07:00
Puranjay Mohan	eab2a71f3a	bpf, arm64: Add support for signed arena loads Add support for signed loads from arena which are internally converted to loads with mode set BPF_PROBE_MEM32SX by the verifier. The implementation is similar to BPF_PROBE_MEMSX and BPF_MEMSX but for BPF_PROBE_MEM32SX, arena_vm_base is added to the src register to form the address. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:22 -07:00
Kumar Kartikeya Dwivedi	a91ae3c893	bpf, x86: Add support for signed arena loads Currently, signed load instructions into arena memory are unsupported. The compiler is free to generate these, and on GCC-14 we see a corresponding error when it happens. The hurdle in supporting them is deciding which unused opcode to use to mark them for the JIT's own consumption. After much thinking, it appears 0xc0 / BPF_NOSPEC can be combined with load instructions to identify signed arena loads. Use this to recognize and JIT them appropriately, and remove the verifier side limitation on the program if the JIT supports them. Co-developed-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:22 -07:00
Mathias Krause	d292035fb5	KVM: VMX: Make CR4.CET a guest owned bit Make CR4.CET a guest-owned bit under VMX by extending KVM_POSSIBLE_CR4_GUEST_BITS accordingly. There's no need to intercept changes to CR4.CET, as it's neither included in KVM's MMU role bits, nor does KVM specifically care about the actual value of a (nested) guest's CR4.CET value, besides enforcing architectural constraints, i.e. make sure that CR0.WP=1 if CR4.CET=1. Intercepting writes to CR4.CET is particularly bad for grsecurity kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features heavily make use of read-only kernel objects and use a cpu-local CR0.WP toggle to override it, when needed. Under a CET-enabled kernel, this also requires toggling CR4.CET, hence the motivation to make it guest-owned. Using the old test from [1] gives the following runtime numbers (perf stat -r 5 ssdd 10 50000): * grsec guest on linux-6.16-rc5 + cet patches: 2.4647 +- 0.0706 seconds time elapsed ( +- 2.86% ) * grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned: 1.5648 +- 0.0240 seconds time elapsed ( +- 1.53% ) Not only does not intercepting CR4.CET make the test run ~35% faster, it's also more stable with less fluctuation due to fewer VMEXITs. Therefore, make CR4.CET a guest-owned bit where possible. This change is VMX-specific, as SVM has no such fine-grained control register intercept control. If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early. Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1] Reviewed-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-52-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 10:03:09 -07:00
Sean Christopherson	fddd07626b	KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Add {HV,VC,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception, VMM Communication Exception, and SVM Security Exception vectors, along with human friendly formatting for trace_kvm_inj_exception(). Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is also likely to go unused. Add the architectural collateral mostly for completeness, and on the off chance that hardware goes off the rails. Link: https://lore.kernel.org/r/20250919223258.1604852-44-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:29:03 -07:00
Sean Christopherson	f2f5519aa4	KVM: x86: Define Control Protection Exception (#CP) vector Add a CP_VECTOR definition for CET's Control Protection Exception (#CP), along with human friendly formatting for trace_kvm_inj_exception(). Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-43-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:56 -07:00
Sean Christopherson	d37cc4819a	KVM: x86: Add human friendly formatting for #XM, and #VE Add XM_VECTOR and VE_VECTOR pretty-printing for trace_kvm_inj_exception(). Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-42-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:45 -07:00
John Allen	8db428fd52	KVM: SVM: Enable shadow stack virtualization for SVM Remove the explicit clearing of shadow stack CPU capabilities. Reviewed-by: Chao Gao <chao.gao@intel.com> Signed-off-by: John Allen <john.allen@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-41-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:37 -07:00
Sean Christopherson	b5fa221f7b	KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Synchronize XSS from the GHCB to KVM's internal tracking if the guest marks XSS as valid on a #VMGEXIT. Like XCR0, KVM needs an up-to-date copy of XSS in order to compute the required XSTATE size when emulating CPUID.0xD.0x1 for the guest. Treat the incoming XSS change as an emulated write, i.e. validate the guest-provided value, to avoid letting the guest load garbage into KVM's tracking. Simply ignore bad values, as either the guest managed to get an unsupported value into hardware, or the guest is misbehaving and providing pure garbage. In either case, KVM can't fix the broken guest. Explicitly allow access to XSS at all times, as KVM needs to ensure its copy of XSS stays up-to-date. E.g. KVM supports migration of SEV-ES guests and so needs to allow the host to save/restore XSS, otherwise a guest that knows its XSS hasn't changed could get stale/bad CPUID emulation if the guest doesn't provide XSS in the GHCB on every exit. This creates a hypothetical problem where a guest could request emulation of RDMSR or WRMSR on XSS, but arguably that's not even a problem, e.g. it would be entirely reasonable for a guest to request "emulation" as a way to inform the hypervisor that its XSS value has been modified. Note, emulating the change as an MSR write also takes care of side effects, e.g. marking dynamic CPUID bits as dirty. Suggested-by: John Allen <john.allen@amd.com> base-commit: 14298d819d5a6b7180a4089e7d2121ca3551dc6c Link: https://lore.kernel.org/r/20250919223258.1604852-40-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:31 -07:00
John Allen	38c46bdbf9	KVM: SVM: Pass through shadow stack MSRs as appropriate Pass through XSAVE managed CET MSRs on SVM when KVM supports shadow stack. These cannot be intercepted without also intercepting XSAVE which would likely cause unacceptable performance overhead. MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted. Reviewed-by: Chao Gao <chao.gao@intel.com> Signed-off-by: John Allen <john.allen@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-39-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:27 -07:00
John Allen	c7586aa3be	KVM: SVM: Update dump_vmcb with shadow stack save area additions Add shadow stack VMCB fields to dump_vmcb. PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP, and U_CET are part of the SEV-ES save area and are encrypted, but can be decrypted and dumped if the guest policy allows debugging. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: John Allen <john.allen@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-38-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:28:23 -07:00
Sean Christopherson	c5ba494585	KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state simply copies the entire save area). SVM doesn't provide a way to disallow L1 from enabling Shadow Stacks for L2, i.e. KVM must provide nested support before advertising SHSTK to userspace. Link: https://lore.kernel.org/r/20250919223258.1604852-37-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:27:06 -07:00
John Allen	48b2ec0d54	KVM: SVM: Emulate reads and writes to shadow stack MSRs Emulate shadow stack MSR access by reading and writing to the corresponding fields in the VMCB. Signed-off-by: John Allen <john.allen@amd.com> [sean: mark VMCB_CET dirty/clean as appropriate] Link: https://lore.kernel.org/r/20250919223258.1604852-36-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:26:51 -07:00
Chao Gao	42ae644853	KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Advertise the LOAD_CET_STATE VM-Entry/Exit control bits in the nested VMX MSRS, as all nested support for CET virtualization, including consistency checks, is in place. Advertise support if and only if KVM supports at least one of IBT or SHSTK. While it's userspace's responsibility to provide a consistent CPU model to the guest, that doesn't mean KVM should set userspace up to fail. Note, the existing {CLEAR,LOAD}_BNDCFGS behavior predates KVM_X86_QUIRK_STUFF_FEATURE_MSRS, i.e. KVM "solved" the inconsistent CPU model problem by overwriting the VMX MSRs provided by userspace. Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-35-seanjc@google.com Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:26:30 -07:00
Chao Gao	62f7533a6b	KVM: nVMX: Add consistency checks for CET states Introduce consistency checks for CET states during nested VM-entry. A VMCS contains both guest and host CET states, each comprising the IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various checks are applied to CET states during VM-entry as documented in SDM Vol3 Chapter "VM ENTRIES". Implement all these checks during nested VM-entry to emulate the architectural behavior. In summary, there are three kinds of checks on guest/host CET states during VM-entry: A. Checks applied to both guest states and host states: * The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS) and 11 (TRACKER) cannot both be set. * SSP should not have bits 1:0 set. * The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical. B. Checks applied to host states only * IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32 bits cleared. C. Checks applied to guest states only: * IA32_S_CET MSR and SSP are not required to be canonical (i.e., 63:N-1 are identical, where N is the CPU's maximum linear-address width). But, bits 63:N of SSP must be identical. Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-34-seanjc@google.com [sean: have common helper return 0/-EINVAL, not true/false] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:25:02 -07:00
Chao Gao	8060b2bd2d	KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Add consistency checks for CR4.CET and CR0.WP in guest-state or host-state area in the VMCS12. This ensures that configurations with CR4.CET set and CR0.WP not set result in VM-entry failure, aligning with architectural behavior. Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:24:35 -07:00
Yang Weijiang	625884996b	KVM: nVMX: Prepare for enabling CET support for nested guest Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting to enable CET for nested VM. vmcs12 and vmcs02 need to be synced when L2 exits to L1 or when L1 wants to resume L2, so that correct CET states can be observed by one another. Please note that consistency checks regarding CET state during VM-Entry will be added later to prevent this patch from becoming too large. Advertising the new CET VM_ENTRY/EXIT control bits is also deferred until after the consistency checks are added. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Tested-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250919223258.1604852-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:24:30 -07:00
Yang Weijiang	033cc166f0	KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Per SDM description(Vol.3D, Appendix A.1): "If bit 56 is read as 1, software can use VM entry to deliver a hardware exception with or without an error code, regardless of vector" Modify has_error_code check before inject events to nested guest. Only enforce the check when guest is in real mode, the exception is not hard exception and the platform doesn't enumerate bit56 in VMX_BASIC, in all other case ignore the check to make the logic consistent with SDM. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:24:11 -07:00
Sean Christopherson	f7336d47be	KVM: VMX: Configure nested capabilities after CPU capabilities Swap the order between configuring nested VMX capabilities and base CPU capabilities, so that nested VMX support can be conditioned on core KVM support, e.g. to allow conditioning support for LOAD_CET_STATE on the presence of IBT or SHSTK. Because the sanity checks on nested VMX config performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(), any use of kvm_cpu_cap_has() when configuring nested VMX support will lead to failures in vmx_check_processor_compat(). While swapping the order of two (or more) configuration flows can lead to a game of whack-a-mole, in this case nested support inarguably should be done after base support. KVM should never condition base support on nested support, because nested support is fully optional, while obviously it's desirable to condition nested support on base support. And there's zero evidence the current ordering was intentional, e.g. commit `66a6950f99` ("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking") likely placed the call to kvm_set_cpu_caps() after nested setup because it looked pretty. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:23:10 -07:00
Yang Weijiang	e140467bbd	KVM: x86: Enable CET virtualization for VMX and advertise to userspace Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to userspace. Explicitly clear IBT and SHSTK on SVM, as additional work is needed to enable CET on SVM, e.g. to context switch S_CET and other state. Disable KVM CET feature if unrestricted_guest is unsupported/disabled as KVM does not support emulating CET, as running without Unrestricted Guest can result in KVM emulating large swaths of guest code. While it's highly unlikely any guest will trigger emulation while also utilizing IBT or SHSTK, there's zero reason to allow CET without Unrestricted Guest as that combination should only be possible when explicitly disabling unrestricted_guest for testing purposes. Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces the presence of an Error Code based on exception vector, as attempting to inject a #CP with an Error Code (#CP architecturally has an Error Code) will fail due to the #CP vector historically not having an Error Code. Clear S_CET and SSP-related VMCS on "reset" to emulate the architectural behavior of CET MSRs and SSP being reset to 0 after RESET, power-up and INIT. Note, KVM already clears guest CET state that is managed via XSTATE in kvm_xstate_reset(). Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: move some bits to separate patches, massage changelog] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-29-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:22:32 -07:00
Sean Christopherson	343acdd158	KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Make IBT and SHSTK virtualization mutually exclusive with "officially" supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the allow_smaller_maxphyaddr module param is set. Running a guest with a smaller MAXPHYADDR requires intercepting #PF, and can also trigger emulation of arbitrary instructions. Intercepting and reacting to #PFs doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle Shadow Stack accesses, and emulating arbitrary instructions doesn't play nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side effects, e.g. doesn't enforce end-branch markers or model Shadow Stack updates. Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is overkill, as allow_smaller_maxphyaddr is only problematic if the guest is actually configured to have a smaller MAXPHYADDR. However, KVM's ABI doesn't provide a way to express that IBT and SHSTK may break if enabled in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the alternative is to do nothing in KVM and instead update documentation and hope KVM users are thorough readers. Go with the conservative-but-correct approach; worst case scenario, this restriction can be dropped if there's a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:21:34 -07:00
Sean Christopherson	f705de12a2	KVM: x86: Initialize allow_smaller_maxphyaddr earlier in setup Initialize allow_smaller_maxphyaddr during hardware setup as soon as KVM knows whether or not TDP will be utilized. To avoid having to teach KVM's emulator all about CET, KVM's upcoming CET virtualization support will be mutually exclusive with allow_smaller_maxphyaddr, i.e. will disable SHSTK and IBT if allow_smaller_maxphyaddr is enabled. In general, allow_smaller_maxphyaddr should be initialized as soon as possible since it's globally visible while its only input is whether or not EPT/NPT is enabled. I.e. there's effectively zero risk of setting allow_smaller_maxphyaddr too early, and substantial risk of setting it too late. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250922184743.1745778-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:21:29 -07:00
Sean Christopherson	1f6f68fcfe	KVM: x86: Disable support for Shadow Stacks if TDP is disabled Make TDP a hard requirement for Shadow Stacks, as there are no plans to add Shadow Stack support to the Shadow MMU. E.g. KVM hasn't been taught to understand the magic Writable=0,Dirty=1 combination that is required for Shadow Stack accesses, and so enabling Shadow Stacks when using shadow paging will put the guest into an infinite #PF loop (KVM thinks the shadow page tables have a valid mapping, hardware says otherwise). Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:19:29 -07:00
Yang Weijiang	69cc3e8865	KVM: x86: Add XSS support for CET_KERNEL and CET_USER Add CET_KERNEL and CET_USER to KVM's set of supported XSS bits when IBT or SHSTK is supported. Like CR4.CET, XFEATURE support for IBT and SHSTK is bundled together under the CET umbrella, and thus prone to virtualization holes if KVM or the guest supports only one of IBT or SHSTK, but hardware supports both. However, again like CR4.CET, such virtualization holes are benign from the host's perspective so long as KVM takes care to always honor the "or" logic. Require CET_KERNEL and CET_USER to come as a pair, and refuse to support IBT or SHSTK if one (or both) features is missing, as the (host) kernel expects them to come as a pair, i.e. may get confused and corrupt state if only one of CET_KERNEL or CET_USER is supported. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: split to separate patch, write changelog, add XFEATURE_MASK_CET_ALL] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:18:54 -07:00
Sean Christopherson	19e6e083f3	KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Unconditionally forward XSAVES/XRSTORS VM-Exits from L2 to L1, as KVM doesn't utilize the XSS-bitmap (KVM relies on controlling the XSS value in hardware to prevent unauthorized access to XSAVES state). KVM always loads vmcs02 with vmcs12's bitmap, and so any exit _must_ be due to vmcs12's XSS-bitmap. Drop the comment about XSS never being non-zero in anticipation of enabling CET_KERNEL and CET_USER support. Opportunistically WARN if XSAVES is not enabled for L2, as the CPU is supposed to generate #UD before checking the XSS-bitmap. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:18:28 -07:00
Yang Weijiang	b3744c59eb	KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved if and only if IBT and SHSTK are unsupported, i.e. allow CR4.CET to be set if IBT or SHSTK is supported. This creates a virtualization hole if the CPU supports both IBT and SHSTK, but the kernel or vCPU model only supports one of the features. However, it's entirely legal for a CPU to have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture, not in KVM. More importantly, so long as KVM is careful to initialize and context switch both IBT and SHSTK state (when supported in hardware) if either feature is exposed to the guest, a misbehaving guest can only harm itself. E.g. VMX initializes host CET VMCS fields based solely on hardware capabilities. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:17:48 -07:00
Sean Christopherson	843af0f2e4	KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard Extensions) to the set of #PF error flags handled via kvm_mmu_trace_pferr_flags. While KVM doesn't expect PK or SS #PFs in particular, pretty printing their names instead of the raw hex value saves the user from having to go spelunking in the SDM to figure out what's going on. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:17:32 -07:00
Sean Christopherson	296599346c	KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to check permissions for a Shadow Stack access as KVM hasn't been taught to understand the magic Writable=0,Dirty=1 combination that is required for Shadow Stack accesses, and likely will never learn. There are no plans to support Shadow Stacks with the Shadow MMU, and the emulator rejects all instructions that affect Shadow Stacks, i.e. it should be impossible for KVM to observe a #PF due to a shadow stack access. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:16:53 -07:00
Sean Christopherson	d4c03f6395	KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Emulate the Shadow Stack restriction that the current SSP must be a 32-bit value on a FAR JMP from 64-bit mode to compatibility mode. From the SDM's pseudocode for FAR JMP: IF ShadowStackEnabled(CPL) IF (IA32_EFER.LMA and DEST(segment selector).L) = 0 (* If target is legacy or compatibility mode then the SSP must be in low 4GB *) IF (SSP & 0xFFFFFFFF00000000 != 0); THEN #GP(0); FI; FI; FI; Note, only the current CPL needs to be considered, as FAR JMP can't be used for inter-privilege level transfers, and KVM rejects emulation of all other far branch instructions when Shadow Stacks are enabled. To give the emulator access to GUEST_SSP, special case handling MSR_KVM_INTERNAL_GUEST_SSP in emulator_get_msr() to treat the access as a host access (KVM doesn't allow guest accesses to internal "MSRs"). The ->get_msr() API is only used for implicit accesses from the emulator, i.e. is only used with hardcoded MSR indices, and so any access to MSR_KVM_INTERNAL_GUEST_SSP is guaranteed to be from KVM, i.e. not from the guest via RDMSR. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:16:25 -07:00
Sean Christopherson	82c0ec0282	KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers task switch emulation with Indirect Branch Tracking or Shadow Stacks enabled, as attempting to do the right thing would require non-trivial effort and complexity, KVM doesn't support emulating CET generally, and it's extremely unlikely that any guest will do task switches while also utilizing CET. Defer taking on the complexity until someone cares enough to put in the time and effort to add support. Per the SDM: If shadow stack is enabled, then the SSP of the task is located at the 4 bytes at offset 104 in the 32-bit TSS and is used by the processor to establish the SSP when a task switch occurs from a task associated with this TSS. Note that the processor does not write the SSP of the task initiating the task switch to the TSS of that task, and instead the SSP of the previous task is pushed onto the shadow stack of the new task. Note, per the SDM's pseudocode on TASK SWITCHING, IBT state for the new privilege level is updated. To keep things simple, check both S_CET and U_CET (again, anyone that wants more precise checking can have the honor of implementing support). Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Closes: https://lore.kernel.org/all/819bd98b-2a60-4107-8e13-41f1e4c706b1@linux.intel.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:15:49 -07:00
Sean Christopherson	57c3db7e2e	KVM: x86: Don't emulate instructions affected by CET features Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are affected by Shadow Stacks and/or Indirect Branch Tracking when said features are enabled in the guest, as fully emulating CET would require significant complexity for no practical benefit (KVM shouldn't need to emulate branch instructions on modern hosts). Simply doing nothing isn't an option as that would allow a malicious entity to subvert CET protections via the emulator. To detect instructions that are subject to IBT or affect IBT state, use the existing IsBranch flag along with the source operand type to detect indirect branches, and the existing NearBranch flag to detect far JMPs and CALLs, all of which are effectively indirect. Explicitly check for emulation of IRET, FAR RET (IMM), and SYSEXIT (the ret-like far branches) instead of adding another flag, e.g. IsRet, as it's unlikely the emulator will ever need to check for return-like instructions outside of this one specific flow. Use an allow-list instead of a deny-list because (a) it's a shorter list and (b) so that a missed entry gets a false positive, not a false negative (i.e. reject emulation instead of clobbering CET state). For Shadow Stacks, explicitly track instructions that directly affect the current SSP, as KVM's emulator doesn't have existing flags that can be used to precisely detect such instructions. Alternatively, the em_xxx() helpers could directly check for ShadowStack interactions, but using a dedicated flag is arguably easier to audit, and allows for handling both IBT and SHSTK in one fell swoop. Note! On far transfers, do NOT consult the current privilege level and instead treat SHSTK/IBT as being enabled if they're enabled for User or Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT can be in play for the target privilege level, i.e. checking the current privilege could get a false negative, and KVM doesn't know the target privilege level until emulation gets under way. Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP as SHSTK, which would be rather confusing and would result in FAR JMP being rejected unnecessarily the vast majority of the time (ignoring that it's unlikely to ever be emulated). A future commit will add the #GP(0) check for the specific FAR JMP scenario. Note #3, task switches also modify SSP and so need to be rejected. That too will be addressed in a future commit. Suggested-by: Chao Gao <chao.gao@intel.com> Originally-by: Yang Weijiang <weijiang.yang@intel.com> Cc: Mathias Krause <minipli@grsecurity.net> Cc: John Allen <john.allen@amd.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:14:33 -07:00
Yang Weijiang	584ba3ffb9	KVM: VMX: Set host constant supervisor states to VMCS fields Save constant values to HOST_{S_CET,SSP,INTR_SSP_TABLE} field explicitly. Kernel IBT is supported and the setting in MSR_IA32_S_CET is static after post-boot (the exception is the BIOS call case, but the vCPU thread never crosses it) and KVM doesn't need to refresh HOST_S_CET field before every VM-Enter/VM-Exit sequence. Host supervisor shadow stack is not enabled now and SSP is not accessible to kernel mode, thus it's safe to set host IA32_INT_SSP_TAB/SSP VMCS field to 0s. When shadow stack is enabled for CPL3, SSP is reloaded from PL3_SSP before it exits to userspace. Check SDM Vol 2A/B Chapter 3/4 for SYSCALL/SYSRET/SYSENTER/SYSEXIT/RDSSP/CALL etc. Prevent KVM module loading if host supervisor shadow stack SHSTK_EN is set in MSR_IA32_S_CET as KVM cannot co-exist with it correctly. Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: snapshot host S_CET if SHSTK or IBT is supported] Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:11:49 -07:00
Yang Weijiang	25f3840483	KVM: VMX: Set up interception for CET MSRs Disable interception for CET MSRs that can be accessed via XSAVES/XRSTORS, and exist according to CPUID, as accesses through XSTATE aren't subject to MSR interception checks, i.e. can't be intercepted without intercepting and emulating XSAVES/XRSTORS, and KVM doesn't support emulating XSAVE/XRSTOR instructions. Don't condition interception on the guest actually having XSAVES as there is no benefit to intercepting the accesses (when the MSRs exist). The MSRs in question are either context switched by the CPU on VM-Enter/VM-Exit or by KVM via XSAVES/XRSTORS (KVM requires XSAVES to virtualize SHSTK), i.e. KVM is going to load guest values into hardware irrespective of guest XSAVES support. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250919223258.1604852-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:11:26 -07:00
Yang Weijiang	1a61bd0d12	KVM: x86: Save and reload SSP to/from SMRAM Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates the architectural hardware behavior when the guest enters/leaves SMM, i.e., it saves registers to SMRAM on entry to SMM and reloads them on exit from SMM. Per the SDM, SSP is one such register on 64-bit architectures, so add support for SSP. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:11:22 -07:00
Yang Weijiang	8b59d0275c	KVM: VMX: Emulate read and write to CET MSRs Add emulation interface for CET MSR access. The emulation code is split into common part and vendor specific part. The former does common checks for MSRs, e.g., accessibility, data validity etc., then passes operation to either XSAVE-managed MSRs via the helpers or CET VMCS fields. SSP can only be read via RDSSP. Writing even requires destructive and potentially faulting operations such as SAVEPREVSSP/RSTORSSP or SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper for the GUEST_SSP field of the VMCS. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:10:47 -07:00
Yang Weijiang	9d6812d415	KVM: x86: Enable guest SSP read/write interface with new uAPIs Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace save and restore the guest's Shadow Stack Pointer (SSP). On both Intel and AMD, SSP is a hardware register that can only be accessed by software via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware to context switch SSP at entry/exit). As a result, SSP doesn't fit in any of KVM's existing interfaces for saving/restoring state. Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP MSRs. Use a translation layer to hide the KVM-internal MSR index so that the arbitrary index doesn't become ABI, e.g. so that KVM can rework its implementation as needed, so long as the ONE_REG ABI is maintained. Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack support to avoid running afoul of ignore_msrs, which unfortunately applies to host-initiated accesses (which is a discussion for another day). I.e. ensure consistent behavior for KVM-defined registers irrespective of ignore_msrs. Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-14-seanjc@google.com Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:10:33 -07:00
Yang Weijiang	d6c387fc39	KVM: VMX: Introduce CET VMCS fields and control bits Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features (SHSTK, IBT) to defend against ROP/COP/JOP style control-flow subversion attacks. Shadow Stack (SHSTK): A shadow stack is a second stack used exclusively for control transfer operations. The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP. Indirect Branch Tracking (IBT): IBT introduces an instruction (ENDBRANCH) to mark valid target addresses of indirect branches (CALL, JMP etc...). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. This instruction behaves as a NOP on platforms that have no CET. Several new CET MSRs are defined to support CET: MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively. MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}. MSR_IA32_INT_SSP_TAB: Linear address of SHSTK pointer table, whose entry is indexed by IST of interrupt gate desc. Two XSAVES state bits are introduced for CET: IA32_XSS:[bit 11]: Control saving/restoring user mode CET states IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states. Six VMCS fields are introduced for CET: {HOST,GUEST}_S_CET: Stores CET settings for kernel mode. {HOST,GUEST}_SSP: Stores current active SSP. {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB. On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY control fields: If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following VMCS fields at VM-Exit: HOST_S_CET HOST_SSP HOST_INTR_SSP_TABLE If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following VMCS fields at VM-Entry: GUEST_S_CET GUEST_SSP GUEST_INTR_SSP_TABLE Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:49 -07:00
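
Illustrative note on commit 62f7533a6b ("KVM: nVMX: Add consistency checks for CET states"): the three groups of checks listed in that changelog (A: guest and host, B: host only, C: guest only) can be sketched in standalone C roughly as follows. This is not the kernel's code. Every function and macro name here is hypothetical, the reserved-bit mask (bits 9:6 of IA32_S_CET) is an assumption to be verified against the SDM, and only the 0/-EINVAL return convention is taken from the changelog's "[sean: ...]" note.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_S_CET layout; verify against the SDM before relying on it. */
#define S_CET_RESERVED  (0xFULL << 6)   /* assumption: bits 9:6 are reserved */
#define S_CET_SUPPRESS  (1ULL << 10)
#define S_CET_TRACKER   (1ULL << 11)

/* True if bits 63:low of v are all zeros or all ones. */
static bool upper_bits_identical(uint64_t v, unsigned int low)
{
    assert(low >= 1 && low <= 63);      /* keeps the shifts well defined */
    uint64_t upper = v >> low;
    return upper == 0 || upper == (UINT64_MAX >> low);
}

/* Canonical for an n-bit linear-address width: bits 63:n-1 are identical. */
static bool is_canonical(uint64_t v, unsigned int n)
{
    return upper_bits_identical(v, n - 1);
}

/* Group A: checks applied to both guest and host state. */
static int check_cet_common(uint64_t s_cet, uint64_t ssp,
                            uint64_t ssp_tbl, unsigned int n)
{
    if (s_cet & S_CET_RESERVED)
        return -EINVAL;
    if ((s_cet & S_CET_SUPPRESS) && (s_cet & S_CET_TRACKER))
        return -EINVAL;
    if (ssp & 0x3)                      /* SSP bits 1:0 must be clear */
        return -EINVAL;
    if (!is_canonical(ssp_tbl, n))
        return -EINVAL;
    return 0;
}

/* Group B: host state, which depends on the mode entered after VM-exit. */
static int check_cet_host(uint64_t s_cet, uint64_t ssp, uint64_t ssp_tbl,
                          unsigned int n, bool host_64bit)
{
    if (check_cet_common(s_cet, ssp, ssp_tbl, n))
        return -EINVAL;
    if (host_64bit) {
        if (!is_canonical(s_cet, n) || !is_canonical(ssp, n))
            return -EINVAL;
    } else if ((s_cet >> 32) || (ssp >> 32)) {
        return -EINVAL;
    }
    return 0;
}

/* Group C: guest SSP need not be canonical, but bits 63:N must match. */
static int check_cet_guest(uint64_t s_cet, uint64_t ssp, uint64_t ssp_tbl,
                           unsigned int n)
{
    if (check_cet_common(s_cet, ssp, ssp_tbl, n))
        return -EINVAL;
    if (!upper_bits_identical(ssp, n))
        return -EINVAL;
    return 0;
}
```

With a 48-bit linear-address width, an SSP of 0x0000800000000000 passes the guest check in this sketch (bits 63:48 are identical) but fails the 64-bit host check (bit 47 differs from bits 63:48), which is the guest/host asymmetry the changelog calls out.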

238443 Commits