Commit Graph

238443 Commits

Author SHA1 Message Date
Arnd Bergmann
8de446fd49 Merge tag 'apple-soc-dt-6.18-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into soc/dt
Apple SoC DTS updates for 6.18, part 2

- New device trees for all M2 Pro, Max and Ultra models are added.
  This is responsible for most of the changed lines since we already
  need 2000+ lines just to describe all the power domains inside
  t602x-pmgr.dtsi for these SoCs.
- Missing WiFi properties for t600x are added.
- Bluetooth nodes are added for all t600x machines.
- The PCIe ethernet iommu-map was fixed for the Apple M1 iMac
  to account for a disabled PCIe port.
- SPMI, NVMe, SART and mailbox nodes are added for Apple's T2 and A11.

* tag 'apple-soc-dt-6.18-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux:
  arm64: dts: apple: t8015: Add SPMI node
  arm64: dts: apple: t8012: Add SPMI node
  arm64: dts: apple: Add J180d (Mac Pro, M2 Ultra, 2023) device tree
  arm64: dts: apple: Add J474s, J475c and J475d device trees
  arm64: dts: apple: Add J414 and J416 Macbook Pro device trees
  arm64: dts: apple: Add initial t6020/t6021/t6022 DTs
  arm64: dts: apple: Add ethernet0 alias for J375 template
  dt-bindings: arm: apple: Add t6020x compatibles
  arm64: dts: apple: t8015: Add NVMe nodes
  arm64: dts: apple: t8015: Fix PCIE power domains dependencies
  arm64: dts: apple: Add devicetreee for t8112-j415
  dt-bindings: arm: apple: Add t8112 j415 compatible
  arm64: dts: apple: t600x: Add bluetooth device nodes
  arm64: dts: apple: t600x: Add missing WiFi properties
  arm64: dts: apple: t8103-j457: Fix PCIe ethernet iommu-map

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:44:22 +02:00
Arnd Bergmann
26116b98d6 Merge tag 'omap-for-v6.18/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/dt
ARM: dts: ti: omap updates for v6.18

These are all minor corrections to the dts files.

* tag 'omap-for-v6.18/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
  ARM: dts: omap: am335x-cm-t335: Remove unused mcasp num-serializer property
  ARM: dts: ti: omap: omap3-devkit8000-lcd: Fix ti,keep-vref-on property to use correct boolean syntax in DTS
  ARM: dts: ti: omap: am335x-baltos: Fix ti,en-ck32k-xtal property in DTS to use correct boolean syntax
  ARM: dts: omap: Minor whitespace cleanup
  ARM: dts: omap: dm816x: Split 'reg' per entry
  ARM: dts: omap: dm814x: Split 'reg' per entry
  ARM: dts: am33xx-l4: fix UART compatible
  ARM: dts: ti: omap4: Use generic "ethernet" as node name

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:41:53 +02:00
Rob Herring (Arm)
345518c00b arm64: dts: apm-shadowcat: Drop "apm,xgene2-pcie" compatible
The "apm,xgene2-pcie" compatible is unused, undocumented, and in the
wrong position in the compatible list. Given this is a mature and little
used platform, just remove the compatible rather than fix the order and
document it.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250919161529.1293151-1-robh@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:39:18 +02:00
Rob Herring (Arm)
676af08386 arm64: dts: apm-shadowcat: Move slimpro nodes out of "simple-bus" node
The slimpro nodes are not MMIO devices, so they don't belong under a
"simple-bus" node. Move them to the top level.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250919161509.1292227-1-robh@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:39:02 +02:00
Arnd Bergmann
6866b78566 Merge tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes
Another missing supply and a wrong headphone gpio level.

* tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
  arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
2025-09-23 22:32:48 +02:00
Arnd Bergmann
5eba504bb2 Merge tag 'sunxi-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
Allwinner fixes for 6.17

Two device tree style cleanups from the device tree maintainers.

* tag 'sunxi-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  riscv: dts: allwinner: rename devterm i2c-gpio node to comply with binding
  ARM: dts: allwinner: Minor whitespace cleanup

Link: https://lore.kernel.org/r/aMrsUfkTWx8g3bJ7@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:30:57 +02:00
Arnd Bergmann
abfbfb98ac Merge tag 'amlogic-arm64-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/dt
Amlogic ARM64 DT for v6.18:
- Add cache information to the Amlogic SoCs
- Add RTC node for Amlogic C3 SoC
- Fix PWM node for Amlogic C3 SoC
- Remove UHS capability for Odroid-C2 SDCard

* tag 'amlogic-arm64-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
  arm64: dts: amlogic: gxbb-odroidc2: remove UHS capability for SD card
  dts: arm: amlogic: fix pwm node for c3
  arm64: dts: amlogic: sm1-bananapi: lower SD card speed for stability
  arm64: dts: amlogic: Add cache information to the Amlogic T7 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic S922X SoC
  arm64: dts: amlogic: Add cache information to the Amlogic S7 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic C3 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic A4 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic A1 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic GXM SoCS
  arm64: dts: amlogic: Add cache information to the Amlogic AXG SoCS
  arm64: dts: amlogic: Add cache information to the Amlogic G12A SoCS
  arm64: dts: amlogic: Add cache information to the Amlogic SM1 SoC
  arm64: dts: amlogic: Add cache information to the Amlogic GXBB and GXL SoC
  arm64: dts: amlogic: C3: Add RTC controller node

Link: https://lore.kernel.org/r/d40e7e96-4a7c-4e4f-b36f-750c6525b95c@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:28:44 +02:00
Arnd Bergmann
9f1bbcc46e Merge tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
Another missing supply and a wrong headphone gpio level.

* tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
  arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
  arm64: dts: rockchip: fix second M.2 slot on ROCK 5T
  arm64: dts: rockchip: fix USB on RADXA ROCK 5T
  arm64: dts: rockchip: Add vcc-supply to SPI flash on Pinephone Pro
  arm64: dts: rockchip: fix es8388 address on rk3588s-roc-pc
  arm64: dts: rockchip: Fix Bluetooth interrupts flag on Neardi LBA3368
  arm64: dts: rockchip: correct network description on Sige5
  arm64: dts: rockchip: Minor whitespace cleanup
  ARM: dts: rockchip: Minor whitespace cleanup
  arm64: dts: rockchip: Add supplies for eMMC on rk3588-orangepi-5
  arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 plus
  arm64: dts: rockchip: Add vcc-supply to SPI flash on rk3399-pinebook-pro
  arm64: dts: rockchip: mark eeprom as read-only for Radxa E52C
2025-09-23 22:26:59 +02:00
Arnd Bergmann
ec1ede181e Merge tag 'spacemit-dt-for-6.18-1' of https://github.com/spacemit-com/linux into soc/dt
RISC-V SpacemiT DT changes for 6.18

- Add OrangePi RV2 board support
- Add reset support to UART driver
- Add PDMA driver support
- Remove sec_uart1 node

* tag 'spacemit-dt-for-6.18-1' of https://github.com/spacemit-com/linux:
  riscv: dts: spacemit: uart: remove sec_uart1 device node
  riscv: dts: spacemit: Enable PDMA on Banana Pi F3 and Milkv Jupiter
  riscv: dts: spacemit: Add PDMA node for K1 SoC
  riscv: dts: spacemit: add UART resets for Soc K1
  riscv: dts: spacemit: Add OrangePi RV2 board device tree
  dt-bindings: riscv: spacemit: Add OrangePi RV2 board

Link: https://lore.kernel.org/r/20250919055525-GYC5766558@gentoo.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:22:24 +02:00
Arnd Bergmann
17752efeca Merge tag 'sunxi-dt-for-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt
Allwinner Device Tree changes for 6.18

This tag contains two DT binding header changes that are shared with
the clk tree.

In this cycle we gained support for the MCU PRCM clock and reset
controller on the A523/A527/T527 family of SoCs, the NPU which is a
Vivante GC9000 IP block, and the NPU clock that was missing. The other
PRCM clock controller gained default bus clock rate settings. These
were not configured in the upstream U-boot bootloader, leading to them
running at slower rates. The assigned rates are from the user manual.

There is also a new board, the NetCube Systems Nagami SoM and two of
its carrier boards.

The A523 family development boards now have their internal RTC clocks
configured correctly, so that the RTC does not drift wildly. The missing
functions for the AXP717 on these boards are added. Missing reset GPIOs
and delays for Ethernet PHYs are added. Last, the Cubie A5E now has its
LEDs described and usable.

An overlay for the Orange Pi Zero interface (addon) board was added.
This can be used with the Orange Pi Zero and Zero Plus 2. Default audio
routing for these two boards (to be used with the addon) was added to
complement the overlay.

* tag 'sunxi-dt-for-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: sun55i: Complete AXP717A sub-functions
  arm64: dts: allwinner: t527: orangepi-4a: hook up external 32k crystal
  arm64: dts: allwinner: t527: avaota-a1: hook up external 32k crystal
  arm64: dts: allwinner: a527: cubie-a5e: Drop external 32.768 KHz crystal
  arm64: dts: sun55i: a523: Assign standard clock rates to PRCM bus clocks
  ARM: dts: sunxi: add support for NetCube Systems Nagami Keypad Carrier
  ARM: dts: sunxi: add support for NetCube Systems Nagami Basic Carrier
  ARM: dts: sunxi: add support for NetCube Systems Nagami SoM
  riscv: dts: allwinner: d1s-t113: Add pinctrl's required by NetCube Systems Nagami SoM
  dt-bindings: arm: sunxi: Add NetCube Systems Nagami SoM and carrier board bindings
  ARM: dts: allwinner: Add Orange Pi Zero Interface Board overlay
  ARM: dts: allwinner: orangepi-zero-plus2: Add default audio routing
  ARM: dts: allwinner: orangepi-zero: Add default audio routing
  arm64: dts: allwinner: a523: Add NPU device node
  arm64: dts: allwinner: a523: Add MCU PRCM CCU node
  dt-bindings: clock: sun55i-a523-ccu: Add A523 MCU CCU clock controller
  dt-bindings: clock: sun55i-a523-ccu: Add missing NPU module clock
  arm64: dts: allwinner: t527: avaota-a1: Add ethernet PHY reset setting
  arm64: dts: allwinner: a527: cubie-a5e: Add ethernet PHY reset setting
  arm64: dts: allwinner: a527: cubie-a5e: Add LEDs

Link: https://lore.kernel.org/r/aMrtuZg8HlR--TAt@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:17:56 +02:00
Arnd Bergmann
83ae575d6f Merge tag 'v6.17-next-dts64.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt
mt8188:
- change efuse compatible fallback to make GPU DVFS work
- enable SCP core for video decoding and encoding

mt8186:
- add correct touchscreen compatible for tentacruel and krabby

Fixes of DT warnings for many different SoCs and boards.

* tag 'v6.17-next-dts64.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: (24 commits)
  arm64: dts: mediatek: mt8516-pumpkin: Fix machine compatible
  arm64: dts: mediatek: mt8395-kontron-i1200: Fix MT6360 regulator nodes
  arm64: dts: mediatek: mt8195-cherry: Add missing regulators to rt5682
  arm64: dts: mediatek: mt8195-cherry: Move VBAT-supply to Tomato R1/R2
  arm64: dts: mediatek: mt8195: Fix ranges for jpeg enc/decoder nodes
  arm64: dts: mediatek: mt8183-kukui: Move DSI panel node to machine dtsis
  arm64: dts: mediatek: mt8183: Migrate to display controller OF graph
  arm64: dts: mediatek: mt8183-pumpkin: Add power supply for CCI
  arm64: dts: mediatek: pumpkin-common: Fix pinctrl node names
  arm64: dts: mediatek: mt8183: Fix pinctrl node names
  arm64: dts: mediatek: acelink-ew-7886cax: Remove unnecessary cells in spi-nand
  arm64: dts: mediatek: mt7986a-bpi-r3: Set interrupt-parent to mdio switch
  arm64: dts: mediatek: mt7986a-bpi-r3: Fix SFP I2C node names
  arm64: dts: mediatek: mt7986a: Fix PCI-Express T-PHY node address
  arm64: dts: mediatek: Fix node name for SYSIRQ controller on all SoCs
  arm64: dts: mediatek: mt6795-sony-xperia-m5: Add pinctrl for mmc1/mmc2
  arm64: dts: mediatek: mt6795-xperia-m5: Fix mmc0 latch-ck value
  arm64: dts: mediatek: mt6795: Add mediatek,infracfg to iommu node
  arm64: dts: mediatek: mt6797: Remove bogus id property in i2c nodes
  arm64: dts: mediatek: mt6797: Fix pinctrl node names
  ...

Link: https://lore.kernel.org/r/c0e2e902-2a10-44a7-9592-491ba7382df0@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:15:57 +02:00
Arnd Bergmann
0651061855 Merge tag 'riscv-sophgo-dt-for-v6.18' of https://github.com/sophgo/linux into soc/dt
RISC-V Devicetrees for v6.18

Sophgo:

Minor changes here only for SG2042. Enable numa
and we can see significant performance improvements,
for example in the STREAM test.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

* tag 'riscv-sophgo-dt-for-v6.18' of https://github.com/sophgo/linux:
  dts: sophgo: sg2042: added numa id description

Link: https://lore.kernel.org/r/MAUPR01MB11072ABA02A18CC7AA9B88874FE17A@MAUPR01MB11072.INDPRD01.PROD.OUTLOOK.COM
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:15:16 +02:00
Arnd Bergmann
57cff2159b Merge tag 'ti-k3-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/dt
TI K3 device tree updates for v6.18

Generic fixes and cleanups:
* k3-pinctrl: Fix incorrect macro usage, add missing DeepSleep/drive strength
  macros
* k3: Rename rproc reserved-mem nodes to 'memory@addr' and add labels for
  reserved-memory
* Long-pending major remoteproc firmware refactoring to allow flexibility
  for downstream variants:
  - am62x/am62ax: Move Mailbox/Remoteproc nodes to board-level DTS files
  - am64/am65/j721e/j721s2/j784s4/j742s2/j7200: Move Remoteproc enablement to
    board-level DTS
  - am62a/am62/am62p/j722s: Similarly restructure Mailbox/Remoteproc configs
  - am65/am64: Refactor IPC firmware carveouts/mailboxes into new SoC
    family-specific dtsi files
  - j721e/j721s2/j784s4/j742s2/am62/am62p/am62a/am64/am65/j7200/j722s: Refactor
    IPC firmware configs into new board-independent dtsi files
  - Various boards: Add missing or corrected carveouts/timers/mailbox configs
    for IPC firmware alignment
* Multiple-boards: Bootph-all property added for USB PHYs to support DFU boot.

New Boards/SoM/SiP:
* Variscite VAR-SOM-AM62P SoM and carrier boards
* AM6254atl SiP package and SK

SoC specific changes:
AM62P:
* Update eMMC HS400 STRB tuning value
* Split HS400 support away from J722S due to errata
* Add Variscite VAR-SOM-AM62P SoM and Symphony carrier board support

AM62:
* Remove unused DeepSleep USB1 pin config on SK
* Add CSI2 interrupts property on main CSI2RX
* Enable Mailbox & Remoteproc at board level
* PocketBeagle2 + Verdin variants: Add missing IPC firmware carveouts, enable
  R5F/M4F

AM62A:
* Fix padcfg length in pad configuration registers
* Remove unused DeepSleep USB1 pin config on SK
* Add CSI2 interrupts property
* Add 1.4GHz OPP entry for phyCORE-AM62Ax
* Enable Mailbox & Remoteproc at board level
* Add missing IPC firmware carveouts for PocketBeagle2 and other boards

AM62D2:
* Add Octal SPI NOR flash (OSPI) support for EVM
* Enable USB0/USB1 interface on EVM

AM625:
* Introduce AM6254atl SiP base SoC support
* Add SK-AM6254atl board

AM64:
* Refactor IPC firmware configs into new dtsi
* Enable Remoteproc at board level
* Add PA stats property for PEB-C-010 expansion Ethernet card
* phyCORE SoM + SR SoM/Electra board: Add missing IPC firmware configs

AM65:
* Refactor IPC firmware configs into new dtsi
* Enable Remoteproc at board level

AM69:
* Switch SERDES0 config to PCIe Multilink + USB mode, enabling independent
  PCIe1 & PCIe3 link speeds

J7200:
* Refactor IPC firmware configs into new dtsi
* Enable R5F Remoteproc at board level

J721E:
* Add DSI + DPHY-TX nodes
* Add CSI2 interrupts property
* BeagleBone AI64: Switch R5 clusters to split mode, add timer reserves for
  IPC FW, Correct carveouts (revert mistaken reordering of C6x carveouts)
* Refactor IPC firmware configs into new dtsi
* Enable Remoteproc at board level

J721S2:
* Add DSI + DSI PHY nodes
* Add USB0 Type-A overlay for EVM
* Add CSI2 interrupts property
* Ensure PCIe node has proper interrupt-controller #address-cells
  (fixes a dtbs_check warning).
* Refactor IPC firmware configs into new dtsi
* Enable Remoteproc at board level
* Common processor board: Add DisplayPort-1 enable, I2C4 instance for
  display connector

J722S:
* Add bootph-all to usb0_phy_ctrl node (DFU)
* Add JPEG Encoder node (E5010)
* Add CSI2 interrupts properties on main/J722S/AM62P common main
* Refactor IPC firmware configs into new dtsi
* Enable Remoteproc at board level

J784S4/J742S2:
* Add CSI2 interrupts properties on main-common
* Add DSI & PHY support
* Enable DisplayPort-1 on EVM
* Refactor IPC firmware configs into new dtsi (common & SoC-specific)
* Enable Remoteproc at board level
* J742S2: Override MCU R5 firmware names in dedicated dtsi

Board specific changes:
AM62P Variscite Symphony Board:
* Add support with USB, Eth, Camera, CAN, GPIO expander

AM642-phyBOARD-Electra
* Add PEB-C-010 Ethernet expansion board overlay
* Add PA stats handle

AM642-sr/phyCORE
* Add missing IPC carveouts for R5F/M4F

AM62-Verdin/AM62P-Verdin
* Add missing IPC carveouts for R5F/M4F, mailboxes

* tag 'ti-k3-dt-for-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: (78 commits)
  arm64: dts: ti: k3-j721s2-evm: Add overlay to enable USB0 Type-A
  arm64: dts: ti: k3-am642-phyboard-electra: Add PEB-C-010 Overlay
  arm64: dts: ti: var-som-am62p: Add support for Variscite Symphony Board
  arm64: dts: ti: Add support for Variscite VAR-SOM-AM62P
  dt-bindings: arm: ti: Add bindings for Variscite VAR-SOM-AM62P
  arm64: dts: ti: k3-j722s-evm: Add bootph-all tag to usb0_phy_ctrl node
  arm64: dts: ti: k3-am62x-sk-common: Add bootph-all tag to usb0_phy_ctrl node
  arm64: dts: ti: k3-am62p5-sk: Add bootph-all tag to usb0_phy_ctrl node
  arm64: dts: ti: k3-am62a7-sk: Add bootph-all tag to usb0_phy_ctrl node
  arm64: dts: ti: k3-j721e-main: Add DSI and DPHY-TX
  arm64: dts: ti: k3-pinctrl: Fix the bug in existing macros
  arm64: dts: ti: k3-pinctrl: Add the remaining macros
  arm64: dts: ti: k3-am62x-sk-common: Remove the unused cfg in USB1_DRVVBUS
  arm64: dts: ti: k3-am62p5-sk: Remove the unused cfg in USB1_DRVVBUS
  arm64: dts: ti: k3-am62d2-evm: Add support for OSPI flash
  arm64: dts: ti: k3-am62d2-evm: Enable USB support
  arm64: dts: ti: k3-am62a-main: Fix main padcfg length
  arm64: dts: ti: k3-am62p: Update eMMC HS400 STRB value
  arm64: dts: ti: k3-am62p/j722s: Remove HS400 support from common
  arm64: dts: ti: Add support for AM6254atl SiP SK
  ...

Link: https://lore.kernel.org/r/20250916175349.pxg6gxd4vg5vfmhx@overvalue
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23 22:11:51 +02:00
Chenghao Duan
d0bf7cd5df riscv: bpf: Fix uninitialized symbol 'retval_off'
In the __arch_prepare_bpf_trampoline() function, retval_off is only
meaningful when save_ret is true, so the current logic is correct.
However, in the original logic, retval_off is only initialized under
certain conditions; for example, in the fmod_ret logic, the compiler is
not aware that the flags of the fmod_ret program (prog) have set
BPF_TRAMP_F_CALL_ORIG, which results in an uninitialized symbol
compilation warning.

So initialize retval_off unconditionally to fix it.

Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20250922062244.822937-2-duanchenghao@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:29:03 -07:00
Puranjay Mohan
eab2a71f3a bpf, arm64: Add support for signed arena loads
Add support for signed loads from arena which are internally converted
to loads with mode set BPF_PROBE_MEM32SX by the verifier. The
implementation is similar to BPF_PROBE_MEMSX and BPF_MEMSX but for
BPF_PROBE_MEM32SX, arena_vm_base is added to the src register to form
the address.
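
The address formation and sign extension described above can be modelled in a few lines. This is only an illustrative sketch, not JIT code: the function name, the use of a `bytes` buffer as a stand-in for the arena, and the bounds check are all assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # width of a BPF register


def signed_arena_load(arena: bytes, arena_vm_base: int, src: int, size: int) -> int:
    """Model a sign-extending load of `size` bytes at arena_vm_base + src.

    `arena` is a hypothetical stand-in buffer whose first byte lives at
    address `arena_vm_base`; the result is returned as a 64-bit register value.
    """
    addr = (arena_vm_base + src) & MASK64   # base is added to the src register
    offset = addr - arena_vm_base           # index into the stand-in buffer
    if not 0 <= offset <= len(arena) - size:
        raise ValueError("access outside the modelled arena")
    raw = arena[offset:offset + size]
    value = int.from_bytes(raw, "little", signed=True)  # sign-extend
    return value & MASK64                   # wrap into an unsigned 64-bit view


if __name__ == "__main__":
    demo = bytes([0xFF, 0x7F, 0x80, 0x00])
    print(hex(signed_arena_load(demo, 0x1000, 0, 1)))
```

A one-byte load of `0xff` yields an all-ones 64-bit register in this model, while a plain (unsigned) load would yield `0xff`.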

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250923110157.18326-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:00:22 -07:00
Kumar Kartikeya Dwivedi
a91ae3c893 bpf, x86: Add support for signed arena loads
Currently, signed load instructions into arena memory are unsupported.
The compiler is free to generate these, and on GCC-14 we see a
corresponding error when it happens. The hurdle in supporting them is
deciding which unused opcode to use to mark them for the JIT's own
consumption. After much thinking, it appears 0xc0 / BPF_NOSPEC can be
combined with load instructions to identify signed arena loads. Use
this to recognize and JIT them appropriately, and remove the verifier
side limitation on the program if the JIT supports them.
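
As a rough illustration of the encoding idea, the sketch below classifies an opcode byte as a signed arena load when the load class is combined with the 0xc0 mode. The class, size and mode masks are the standard BPF opcode layout; the helper name is hypothetical and the real verifier and JIT logic is not reproduced here.

```python
# Standard BPF opcode layout: class in bits 2:0, size in bits 4:3, mode in bits 7:5.
BPF_CLASS_MASK = 0x07
BPF_MODE_MASK = 0xE0

BPF_LDX = 0x01     # load-into-register instruction class
BPF_ST = 0x02      # store-immediate class (used by the speculation barrier)
BPF_MEMSX = 0x80   # ordinary sign-extending load mode
BPF_NOSPEC = 0xC0  # mode value reused to mark signed arena loads

BPF_W, BPF_H, BPF_B, BPF_DW = 0x00, 0x08, 0x10, 0x18  # access sizes


def is_signed_arena_load(code: int) -> bool:
    """Hypothetical helper: LDX class combined with the 0xc0 mode."""
    return (code & BPF_CLASS_MASK) == BPF_LDX and (code & BPF_MODE_MASK) == BPF_NOSPEC


if __name__ == "__main__":
    print(is_signed_arena_load(BPF_LDX | BPF_NOSPEC | BPF_H))
```

Because the speculation barrier uses the store class rather than the load class, pairing 0xc0 with a load does not collide with it in this model.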

Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250923110157.18326-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:00:22 -07:00
Mathias Krause
d292035fb5 KVM: VMX: Make CR4.CET a guest owned bit
Make CR4.CET a guest-owned bit under VMX by extending
KVM_POSSIBLE_CR4_GUEST_BITS accordingly.

There's no need to intercept changes to CR4.CET, as it's neither
included in KVM's MMU role bits, nor does KVM specifically care about
the actual value of a (nested) guest's CR4.CET, aside from
enforcing architectural constraints, i.e. make sure that CR0.WP=1 if
CR4.CET=1.

Intercepting writes to CR4.CET is particularly bad for grsecurity
kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
heavily make use of read-only kernel objects and use a cpu-local CR0.WP
toggle to override it, when needed. Under a CET-enabled kernel, this
also requires toggling CR4.CET, hence the motivation to make it
guest-owned.

Using the old test from [1] gives the following runtime numbers (perf
stat -r 5 ssdd 10 50000):

* grsec guest on linux-6.16-rc5 + cet patches:
  2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )

* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
  1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )

Not only does not intercepting CR4.CET make the test run ~35% faster,
it's also more stable with less fluctuation due to fewer VMEXITs.
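
The improvement can be recomputed from the two mean runtimes quoted above. This is a small arithmetic sketch; only the two measured means come from the commit message, and the variable names are illustrative.

```python
# Mean elapsed times reported by `perf stat -r 5 ssdd 10 50000` (seconds).
intercepted = 2.4647   # CR4.CET writes intercepted
guest_owned = 1.5648   # CR4.CET guest-owned

# Fraction of wall-clock time saved relative to the intercepted case.
time_saved = (intercepted - guest_owned) / intercepted

# Equivalent speedup factor (how many times faster the test completes).
speedup = intercepted / guest_owned

if __name__ == "__main__":
    print(f"time saved: {time_saved:.1%}, speedup: {speedup:.2f}x")
```

By this calculation the run takes a little over a third less time, which is in the same range as the figure quoted above.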

Therefore, make CR4.CET a guest-owned bit where possible.

This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.

If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.

Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-52-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 10:03:09 -07:00
Sean Christopherson
fddd07626b KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
Add {HV,VC,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception,
VMM Communication Exception, and SVM Security Exception vectors, along with
human friendly formatting for trace_kvm_inj_exception().

Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is
also likely to go unused.  Add the architectural collateral mostly for
completeness, and on the off chance that hardware goes off the rails.

Link: https://lore.kernel.org/r/20250919223258.1604852-44-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:29:03 -07:00
Sean Christopherson
f2f5519aa4 KVM: x86: Define Control Protection Exception (#CP) vector
Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
along with human friendly formatting for trace_kvm_inj_exception().

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-43-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:56 -07:00
Sean Christopherson
d37cc4819a KVM: x86: Add human friendly formatting for #XM, and #VE
Add XM_VECTOR and VE_VECTOR pretty-printing for
trace_kvm_inj_exception().

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-42-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:45 -07:00
John Allen
8db428fd52 KVM: SVM: Enable shadow stack virtualization for SVM
Remove the explicit clearing of shadow stack CPU capabilities.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-41-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:37 -07:00
Sean Christopherson
b5fa221f7b KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
Synchronize XSS from the GHCB to KVM's internal tracking if the guest
marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
of XSS in order to compute the required XSTATE size when emulating
CPUID.0xD.0x1 for the guest.

Treat the incoming XSS change as an emulated write, i.e. validate the
guest-provided value, to avoid letting the guest load garbage into KVM's
tracking.  Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Explicitly allow access to XSS at all times, as KVM needs to ensure its
copy of XSS stays up-to-date.  E.g. KVM supports migration of SEV-ES guests
and so needs to allow the host to save/restore XSS, otherwise a guest
that *knows* its XSS hasn't changed could get stale/bad CPUID emulation if
the guest doesn't provide XSS in the GHCB on every exit.  This creates a
hypothetical problem where a guest could request emulation of RDMSR or
WRMSR on XSS, but arguably that's not even a problem, e.g. it would be
entirely reasonable for a guest to request "emulation" as a way to inform
the hypervisor that its XSS value has been modified.

Note, emulating the change as an MSR write also takes care of side effects,
e.g. marking dynamic CPUID bits as dirty.
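
The policy described above (accept a valid, supported value; silently ignore anything else) might be modelled as follows. This is a sketch under assumptions: the function and parameter names are hypothetical, and the real code goes through KVM's MSR write path rather than a simple mask test.

```python
def sync_xss_from_ghcb(tracked_xss: int, ghcb_xss: int,
                       ghcb_xss_valid: bool, supported_xss: int) -> int:
    """Return the XSS value the hypervisor should track after an exit.

    tracked_xss    -- the hypervisor's current copy
    ghcb_xss       -- value the guest placed in the GHCB
    ghcb_xss_valid -- whether the guest marked the field as valid
    supported_xss  -- hypothetical mask of bits the guest may set
    """
    if not ghcb_xss_valid:
        return tracked_xss          # nothing provided, keep the old copy
    if ghcb_xss & ~supported_xss:
        return tracked_xss          # bad value: ignore rather than fault
    return ghcb_xss                 # treated like an emulated MSR write


if __name__ == "__main__":
    print(hex(sync_xss_from_ghcb(0x0, 0x800, True, 0x1800)))
```

Ignoring rejected values instead of injecting a fault mirrors the reasoning above: the hypervisor cannot repair a guest that supplies garbage.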

Suggested-by: John Allen <john.allen@amd.com>
base-commit: 14298d819d5a6b7180a4089e7d2121ca3551dc6c
Link: https://lore.kernel.org/r/20250919223258.1604852-40-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:31 -07:00
John Allen
38c46bdbf9 KVM: SVM: Pass through shadow stack MSRs as appropriate
Pass through XSAVE managed CET MSRs on SVM when KVM supports shadow
stack. These cannot be intercepted without also intercepting XSAVE which
would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-39-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:27 -07:00
John Allen
c7586aa3be KVM: SVM: Update dump_vmcb with shadow stack save area additions
Add shadow stack VMCB fields to dump_vmcb. PL0_SSP, PL1_SSP, PL2_SSP,
PL3_SSP, and U_CET are part of the SEV-ES save area and are encrypted,
but can be decrypted and dumped if the guest policy allows debugging.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-38-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:23 -07:00
Sean Christopherson
c5ba494585 KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
simply copies the entire save area).  SVM doesn't provide a way to
disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
nested support before advertising SHSTK to userspace.

Link: https://lore.kernel.org/r/20250919223258.1604852-37-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:27:06 -07:00
John Allen
48b2ec0d54 KVM: SVM: Emulate reads and writes to shadow stack MSRs
Emulate shadow stack MSR access by reading and writing to the
corresponding fields in the VMCB.

Signed-off-by: John Allen <john.allen@amd.com>
[sean: mark VMCB_CET dirty/clean as appropriate]
Link: https://lore.kernel.org/r/20250919223258.1604852-36-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:26:51 -07:00
Chao Gao
42ae644853 KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
Advertise the LOAD_CET_STATE VM-Entry/Exit control bits in the nested VMX
MSRs, as all nested support for CET virtualization, including consistency
checks, is in place.

Advertise support if and only if KVM supports at least one of IBT or SHSTK.
While it's userspace's responsibility to provide a consistent CPU model to
the guest, that doesn't mean KVM should set userspace up to fail.

Note, the existing {CLEAR,LOAD}_BNDCFGS behavior predates
KVM_X86_QUIRK_STUFF_FEATURE_MSRS, i.e. KVM "solved" the inconsistent CPU
model problem by overwriting the VMX MSRs provided by userspace.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-35-seanjc@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:26:30 -07:00
Chao Gao
62f7533a6b KVM: nVMX: Add consistency checks for CET states
Introduce consistency checks for CET states during nested VM-entry.

A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET states during VM-entry as documented in SDM
Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
VM-entry to emulate the architectural behavior.

In summary, there are three kinds of checks on guest/host CET states
during VM-entry:

A. Checks applied to both guest states and host states:

 * The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
   and 11 (TRACKER) cannot both be set.
 * SSP should not have bits 1:0 set.
 * The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.

B. Checks applied to host states only:

 * IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
   after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32
   bits cleared.

C. Checks applied to guest states only:

 * IA32_S_CET MSR and SSP are not required to be canonical (i.e., 63:N-1
   are identical, where N is the CPU's maximum linear-address width). But,
   bits 63:N of SSP must be identical.
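
The three groups of checks can be sketched as small predicates. This is an illustrative model only, not the kernel helper: the function names are hypothetical, `n` stands for the linear-address width N (48 is assumed as a default), and `reserved_mask` is a placeholder parameter because the reserved bits of IA32_S_CET are not listed here.

```python
MASK64 = (1 << 64) - 1
S_CET_SUPPRESS = 1 << 10  # bit 10, per the list above
S_CET_TRACKER = 1 << 11   # bit 11, per the list above


def upper_bits_identical(value: int, low_bit: int) -> bool:
    """True if bits 63:low_bit of a 64-bit value are all 0 or all 1."""
    top = (value & MASK64) >> low_bit
    return top == 0 or top == (1 << (64 - low_bit)) - 1


def is_canonical(addr: int, n: int = 48) -> bool:
    """Canonical: bits 63:N-1 are identical."""
    return upper_bits_identical(addr, n - 1)


def common_checks_ok(s_cet: int, ssp: int, ssp_table: int,
                     n: int = 48, reserved_mask: int = 0) -> bool:
    """Group A: applied to both guest and host state."""
    if s_cet & reserved_mask:
        return False                      # reserved bit set
    if (s_cet & S_CET_SUPPRESS) and (s_cet & S_CET_TRACKER):
        return False                      # bits 10 and 11 both set
    if ssp & 0x3:
        return False                      # SSP bits 1:0 must be clear
    return is_canonical(ssp_table, n)     # table address must be canonical


def host_checks_ok(s_cet: int, ssp: int, host_64bit: bool, n: int = 48) -> bool:
    """Group B: host state only."""
    if host_64bit:
        return is_canonical(s_cet, n) and is_canonical(ssp, n)
    return (s_cet >> 32) == 0 and (ssp >> 32) == 0


def guest_checks_ok(ssp: int, n: int = 48) -> bool:
    """Group C: guest SSP only needs bits 63:N identical."""
    return upper_bits_identical(ssp, n)
```

With N = 48, the value `0x0000800000000000` is not canonical (bit 47 differs from bits 63:48) yet still passes the guest-only check, which is the distinction group C draws.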

Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-34-seanjc@google.com
[sean: have common helper return 0/-EINVAL, not true/false]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:25:02 -07:00
Chao Gao
8060b2bd2d KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
Add consistency checks for CR4.CET and CR0.WP in guest-state or host-state
area in the VMCS12. This ensures that configurations with CR4.CET set and
CR0.WP not set result in VM-entry failure, aligning with architectural
behavior.
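
The rule being enforced can be sketched as a single predicate. This is an
illustration only; the function name is hypothetical, and the same predicate
would be applied once to the guest-state values and once to the host-state
values.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_WP  (1ULL << 16)
#define X86_CR4_CET (1ULL << 23)

/* CR4.CET may only be set when CR0.WP is also set. */
bool cr0_cr4_cet_consistent(uint64_t cr0, uint64_t cr4)
{
	return !(cr4 & X86_CR4_CET) || (cr0 & X86_CR0_WP);
}
```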

Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-33-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:24:35 -07:00
Yang Weijiang
625884996b KVM: nVMX: Prepare for enabling CET support for nested guest
Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting
to enable CET for nested VM.

vmcs12 and vmcs02 need to be synced when L2 exits to L1 or when L1 wants
to resume L2, so that correct CET states can be observed by one another.

Please note that consistency checks regarding CET state during VM-Entry
will be added later to prevent this patch from becoming too large.
Advertising the new CET VM_ENTRY/EXIT control bits is also deferred
until after the consistency checks are added.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-32-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:24:30 -07:00
Yang Weijiang
033cc166f0 KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
Per the SDM description (Vol. 3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector"

Modify the has_error_code check performed before injecting events into a
nested guest. Only enforce the check when the guest is in real mode, the
exception is not a hardware exception, or the platform doesn't enumerate
bit 56 in VMX_BASIC; in all other cases, ignore the check to make the logic
consistent with the SDM.
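
One way to read the resulting logic is sketched below. This is a simplified
model, not the code from the patch: the function and parameter names are
hypothetical, and whether a given vector architecturally has an error code is
passed in by the caller rather than looked up.

```c
#include <stdbool.h>

/* Returns true if the requested has_error_code setting passes the check. */
bool error_code_setting_valid(bool prot_mode, bool hw_exception,
			      bool no_hw_error_code_cc,
			      bool has_error_code,
			      bool vector_has_error_code)
{
	/*
	 * Assumption in this sketch: an error code cannot be delivered in
	 * real mode or for events that are not hardware exceptions.
	 */
	if (!prot_mode || !hw_exception)
		return !has_error_code;

	/* VMX_BASIC bit 56 set: with or without error code is fine. */
	if (no_hw_error_code_cc)
		return true;

	/* Otherwise the setting must match what the vector dictates. */
	return has_error_code == vector_has_error_code;
}
```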

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-31-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:24:11 -07:00
Sean Christopherson
f7336d47be KVM: VMX: Configure nested capabilities after CPU capabilities
Swap the order between configuring nested VMX capabilities and base CPU
capabilities, so that nested VMX support can be conditioned on core KVM
support, e.g. to allow conditioning support for LOAD_CET_STATE on the
presence of IBT or SHSTK.  Because the sanity checks on nested VMX config
performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(),
any use of kvm_cpu_cap_has() when configuring nested VMX support will lead
to failures in vmx_check_processor_compat().

While swapping the order of two (or more) configuration flows can lead to
a game of whack-a-mole, in this case nested support inarguably should be
done after base support.  KVM should never condition base support on nested
support, because nested support is fully optional, while obviously it's
desirable to condition nested support on base support.  And there's zero
evidence the current ordering was intentional, e.g. commit 66a6950f99
("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
likely placed the call to kvm_set_cpu_caps() after nested setup because it
looked pretty.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-30-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:23:10 -07:00
Yang Weijiang
e140467bbd KVM: x86: Enable CET virtualization for VMX and advertise to userspace
Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to
userspace.  Explicitly clear IBT and SHSTK on SVM, as additional work is
needed to enable CET on SVM, e.g. to context switch S_CET and other state.

Disable KVM's CET support if unrestricted_guest is unsupported/disabled, as
KVM does not support emulating CET and running without Unrestricted Guest
can result in KVM emulating large swaths of guest code.  While it's highly
unlikely any guest will trigger emulation while also utilizing IBT or
SHSTK, there's zero reason to allow CET without Unrestricted Guest as that
combination should only be possible when explicitly disabling
unrestricted_guest for testing purposes.

Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
the presence of an Error Code based on exception vector, as attempting to
inject a #CP with an Error Code (#CP architecturally has an Error Code)
will fail due to the #CP vector historically not having an Error Code.

Clear S_CET and SSP-related VMCS fields on "reset" to emulate the
architectural behavior of CET MSRs and SSP being reset to 0 after RESET,
power-up and INIT.  Note,
KVM already clears guest CET state that is managed via XSTATE in
kvm_xstate_reset().
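
The two VMX-specific gating conditions described above can be summarized in
a small sketch. The helper name and its parameters are hypothetical and only
model the decision; the actual patch clears the IBT and SHSTK capabilities
inside KVM's existing setup code.

```c
#include <stdbool.h>
#include <stdint.h>

/* VMX_BASIC bit 56: error code presence is not tied to the vector. */
#define VMX_BASIC_NO_HW_ERROR_CODE_CC (1ULL << 56)

/* Hypothetical helper: may IBT/SHSTK be exposed on this VMX host? */
bool vmx_can_expose_cet(bool unrestricted_guest, uint64_t vmx_basic)
{
	/* Without Unrestricted Guest, KVM may emulate lots of guest code. */
	if (!unrestricted_guest)
		return false;

	/* Injecting #CP with an error code requires VMX_BASIC[56] == 1. */
	if (!(vmx_basic & VMX_BASIC_NO_HW_ERROR_CODE_CC))
		return false;

	return true;
}
```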

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: move some bits to separate patches, massage changelog]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:22:32 -07:00
Sean Christopherson
343acdd158 KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
Make IBT and SHSTK virtualization mutually exclusive with "officially"
supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
allow_smaller_maxphyaddr module param is set.  Running a guest with a
smaller MAXPHYADDR requires intercepting #PF, and can also trigger
emulation of arbitrary instructions.  Intercepting and reacting to #PFs
doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
Shadow Stack accesses, and emulating arbitrary instructions doesn't play
nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
updates.

Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
actually configured to have a smaller MAXPHYADDR.  However, KVM's ABI
doesn't provide a way to express that IBT and SHSTK may break if enabled
in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR.  I.e. the
alternative is to do nothing in KVM and instead update documentation and
hope KVM users are thorough readers.  Go with the conservative-but-correct
approach; worst case scenario, this restriction can be dropped if there's
a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-28-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:21:34 -07:00
Sean Christopherson
f705de12a2 KVM: x86: Initialize allow_smaller_maxphyaddr earlier in setup
Initialize allow_smaller_maxphyaddr during hardware setup as soon as KVM
knows whether or not TDP will be utilized.  To avoid having to teach KVM's
emulator all about CET, KVM's upcoming CET virtualization support will be
mutually exclusive with allow_smaller_maxphyaddr, i.e. will disable SHSTK
and IBT if allow_smaller_maxphyaddr is enabled.

In general, allow_smaller_maxphyaddr should be initialized as soon as
possible since it's globally visible while its only input is whether or
not EPT/NPT is enabled.  I.e. there's effectively zero risk of setting
allow_smaller_maxphyaddr too early, and substantial risk of setting it
too late.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250922184743.1745778-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:21:29 -07:00
Sean Christopherson
1f6f68fcfe KVM: x86: Disable support for Shadow Stacks if TDP is disabled
Make TDP a hard requirement for Shadow Stacks, as there are no plans to
add Shadow Stack support to the Shadow MMU.  E.g. KVM hasn't been taught
to understand the magic Writable=0,Dirty=1 combination that is required
for Shadow Stack accesses, and so enabling Shadow Stacks when using
shadow paging will put the guest into an infinite #PF loop (KVM thinks the
shadow page tables have a valid mapping, hardware says otherwise).

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-27-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:19:29 -07:00
Yang Weijiang
69cc3e8865 KVM: x86: Add XSS support for CET_KERNEL and CET_USER
Add CET_KERNEL and CET_USER to KVM's set of supported XSS bits when IBT
*or* SHSTK is supported.  Like CR4.CET, XFEATURE support for IBT and SHSTK
is bundled together under the CET umbrella, and thus prone to
virtualization holes if KVM or the guest supports only one of IBT or SHSTK,
but hardware supports both.  However, again like CR4.CET, such
virtualization holes are benign from the host's perspective so long as KVM
takes care to always honor the "or" logic.

Require CET_KERNEL and CET_USER to come as a pair, and refuse to support
IBT or SHSTK if one (or both) features is missing, as the (host) kernel
expects them to come as a pair, i.e. may get confused and corrupt state if
only one of CET_KERNEL or CET_USER is supported.
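
The "or" and "pair" rules can be sketched as follows. This is an
illustrative model, not KVM's code: struct cet_caps and supported_cet_xss()
are hypothetical names, and the bit positions are the IA32_XSS bits 11
(user) and 12 (supervisor) used for CET state.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_CET_USER   (1ULL << 11)
#define XFEATURE_MASK_CET_KERNEL (1ULL << 12)
#define XFEATURE_MASK_CET_ALL    (XFEATURE_MASK_CET_USER | \
				  XFEATURE_MASK_CET_KERNEL)

/* Hypothetical container for the two CET feature flags. */
struct cet_caps {
	bool ibt;
	bool shstk;
};

/* Returns the CET bits to include in the supported XSS mask. */
uint64_t supported_cet_xss(uint64_t host_xss, struct cet_caps *caps)
{
	/* CET_USER and CET_KERNEL must come as a pair, else drop CET. */
	if ((host_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
		caps->ibt = false;
		caps->shstk = false;
	}

	/* Both XSS bits are supported if IBT *or* SHSTK is supported. */
	if (!caps->ibt && !caps->shstk)
		return 0;

	return XFEATURE_MASK_CET_ALL;
}
```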

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: split to separate patch, write changelog, add XFEATURE_MASK_CET_ALL]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-26-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:18:54 -07:00
Sean Christopherson
19e6e083f3 KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
Unconditionally forward XSAVES/XRSTORS VM-Exits from L2 to L1, as KVM
doesn't utilize the XSS-bitmap (KVM relies on controlling the XSS value
in hardware to prevent unauthorized access to XSAVES state).  KVM always
loads vmcs02 with vmcs12's bitmap, and so any exit _must_ be due to
vmcs12's XSS-bitmap.

Drop the comment about XSS never being non-zero in anticipation of
enabling CET_KERNEL and CET_USER support.

Opportunistically WARN if XSAVES is not enabled for L2, as the CPU is
supposed to generate #UD before checking the XSS-bitmap.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-25-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:18:28 -07:00
Yang Weijiang
b3744c59eb KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved
if and only if IBT *and* SHSTK are unsupported, i.e. allow CR4.CET to be
set if IBT or SHSTK is supported.  This creates a virtualization hole if
the CPU supports both IBT and SHSTK, but the kernel or vCPU model only
supports one of the features.  However, it's entirely legal for a CPU to
have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture,
not in KVM.

More importantly, so long as KVM is careful to initialize and context
switch both IBT and SHSTK state (when supported in hardware) if either
feature is exposed to the guest, a misbehaving guest can only harm itself.
E.g. VMX initializes host CET VMCS fields based solely on hardware
capabilities.
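
In other words, CR4.CET is reserved only when neither feature is available.
A minimal sketch of that rule, with hypothetical helper names and looking at
the CET bit alone:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_CET (1ULL << 23)

/* CR4.CET is reserved if and only if IBT *and* SHSTK are unsupported. */
bool cr4_cet_is_reserved(bool has_ibt, bool has_shstk)
{
	return !has_ibt && !has_shstk;
}

/* Validates only the CET bit of a proposed guest CR4 value. */
bool guest_cr4_cet_valid(uint64_t cr4, bool has_ibt, bool has_shstk)
{
	if ((cr4 & X86_CR4_CET) && cr4_cet_is_reserved(has_ibt, has_shstk))
		return false;

	return true;
}
```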

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: split to separate patch, write changelog]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:17:48 -07:00
Sean Christopherson
843af0f2e4 KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard
Extensions) to the set of #PF error flags handled via
kvm_mmu_trace_pferr_flags.  While KVM doesn't expect PK or SS #PFs in
particular, pretty printing their names instead of the raw hex value saves
the user from having to go spelunking in the SDM to figure out what's
going on.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-23-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:17:32 -07:00
Sean Christopherson
296599346c KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to
check permissions for a Shadow Stack access as KVM hasn't been taught to
understand the magic Writable=0,Dirty=1 combination that is required for
Shadow Stack accesses, and likely will never learn.  There are no plans to
support Shadow Stacks with the Shadow MMU, and the emulator rejects all
instructions that affect Shadow Stacks, i.e. it should be impossible for
KVM to observe a #PF due to a shadow stack access.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:16:53 -07:00
Sean Christopherson
d4c03f6395 KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
Emulate the Shadow Stack restriction that the current SSP must be a 32-bit
value on a FAR JMP from 64-bit mode to compatibility mode.  From the SDM's
pseudocode for FAR JMP:

  IF ShadowStackEnabled(CPL)
    IF (IA32_EFER.LMA and DEST(segment selector).L) = 0
      (* If target is legacy or compatibility mode then the SSP must be in low 4GB *)
      IF (SSP & 0xFFFFFFFF00000000 != 0); THEN
        #GP(0);
      FI;
    FI;
  FI;

Note, only the current CPL needs to be considered, as FAR JMP can't be
used for inter-privilege level transfers, and KVM rejects emulation of all
other far branch instructions when Shadow Stacks are enabled.
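
The SDM pseudocode above translates fairly directly into C. The sketch below
is illustrative only; the function name and its boolean inputs are
hypothetical stand-ins for state the emulator would read from the vCPU. Note
the explicit parentheses around the mask test, which C's operator precedence
requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSP_HIGH_MASK 0xFFFFFFFF00000000ULL

/* Returns true if the FAR JMP must raise #GP(0) due to SSP[63:32] != 0. */
bool far_jmp_needs_gp(bool shstk_enabled_at_cpl, bool efer_lma,
		      bool dest_cs_l, uint64_t ssp)
{
	if (!shstk_enabled_at_cpl)
		return false;

	/* Target is 64-bit mode: no restriction on SSP. */
	if (efer_lma && dest_cs_l)
		return false;

	/* Legacy or compatibility mode target: SSP must be in the low 4GB. */
	return (ssp & SSP_HIGH_MASK) != 0;
}
```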

To give the emulator access to GUEST_SSP, special case handling
MSR_KVM_INTERNAL_GUEST_SSP in emulator_get_msr() to treat the access as a
host access (KVM doesn't allow guest accesses to internal "MSRs").  The
->get_msr() API is only used for implicit accesses from the emulator, i.e.
is only used with hardcoded MSR indices, and so any access to
MSR_KVM_INTERNAL_GUEST_SSP is guaranteed to be from KVM, i.e. not from the
guest via RDMSR.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:16:25 -07:00
Sean Christopherson
82c0ec0282 KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers
task switch emulation with Indirect Branch Tracking or Shadow Stacks
enabled, as attempting to do the right thing would require non-trivial
effort and complexity, KVM doesn't support emulating CET generally, and
it's extremely unlikely that any guest will do task switches while also
utilizing CET.  Defer taking on the complexity until someone cares enough
to put in the time and effort to add support.

Per the SDM:

  If shadow stack is enabled, then the SSP of the task is located at the
  4 bytes at offset 104 in the 32-bit TSS and is used by the processor to
  establish the SSP when a task switch occurs from a task associated with
  this TSS. Note that the processor does not write the SSP of the task
  initiating the task switch to the TSS of that task, and instead the SSP
  of the previous task is pushed onto the shadow stack of the new task.

Note, per the SDM's pseudocode on TASK SWITCHING, IBT state for the new
privilege level is updated.  To keep things simple, check both S_CET and
U_CET (again, anyone that wants more precise checking can have the honor
of implementing support).

Reported-by: Binbin Wu <binbin.wu@linux.intel.com>
Closes: https://lore.kernel.org/all/819bd98b-2a60-4107-8e13-41f1e4c706b1@linux.intel.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:15:49 -07:00
Sean Christopherson
57c3db7e2e KVM: x86: Don't emulate instructions affected by CET features
Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
affected by Shadow Stacks and/or Indirect Branch Tracking when said
features are enabled in the guest, as fully emulating CET would require
significant complexity for no practical benefit (KVM shouldn't need to
emulate branch instructions on modern hosts).  Simply doing nothing isn't
an option as that would allow a malicious entity to subvert CET
protections via the emulator.

To detect instructions that are subject to IBT or affect IBT state, use
the existing IsBranch flag along with the source operand type to detect
indirect branches, and the existing NearBranch flag to detect far JMPs
and CALLs, all of which are effectively indirect.  Explicitly check for
emulation of IRET, FAR RET (IMM), and SYSEXIT (the ret-like far branches)
instead of adding another flag, e.g. IsRet, as it's unlikely the emulator
will ever need to check for return-like instructions outside of this one
specific flow.  Use an allow-list instead of a deny-list because (a) it's
a shorter list and (b) so that a missed entry gets a false positive, not a
false negative (i.e. reject emulation instead of clobbering CET state).

For Shadow Stacks, explicitly track instructions that directly affect the
current SSP, as KVM's emulator doesn't have existing flags that can be
used to precisely detect such instructions.  Alternatively, the em_xxx()
helpers could directly check for ShadowStack interactions, but using a
dedicated flag is arguably easier to audit, and allows for handling both
IBT and SHSTK in one fell swoop.

Note!  On far transfers, do NOT consult the current privilege level and
instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
can be in play for the target privilege level, i.e. checking the current
privilege could get a false negative, and KVM doesn't know the target
privilege level until emulation gets under way.

Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
the current SSP, but only to ensure SSP[63:32] == 0.  Don't tag FAR JMP
as SHSTK, which would be rather confusing and would result in FAR JMP
being rejected unnecessarily the vast majority of the time (ignoring that
it's unlikely to ever be emulated).  A future commit will add the #GP(0)
check for the specific FAR JMP scenario.

Note #3, task switches also modify SSP and so need to be rejected.  That
too will be addressed in a future commit.

Suggested-by: Chao Gao <chao.gao@intel.com>
Originally-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Mathias Krause <minipli@grsecurity.net>
Cc: John Allen <john.allen@amd.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:14:33 -07:00
Yang Weijiang
584ba3ffb9 KVM: VMX: Set host constant supervisor states to VMCS fields
Save constant values to HOST_{S_CET,SSP,INTR_SSP_TABLE} field explicitly.
Kernel IBT is supported and the setting in MSR_IA32_S_CET is static after
boot (the exception is the BIOS call case, but a vCPU thread never crosses
it), so KVM doesn't need to refresh the HOST_S_CET field before every
VM-Enter/VM-Exit sequence.

Host supervisor shadow stack is not enabled now and SSP is not accessible
to kernel mode, thus it's safe to set the host IA32_INT_SSP_TAB/SSP VMCS
fields to 0s. When shadow stack is enabled for CPL3, SSP is reloaded from
PL3_SSP before the CPU exits to userspace. See SDM Vol 2A/B Chapters 3/4
for SYSCALL/SYSRET/SYSENTER/SYSEXIT/RDSSP/CALL etc.

Prevent KVM module loading if host supervisor shadow stack SHSTK_EN is set
in MSR_IA32_S_CET as KVM cannot co-exist with it correctly.

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: snapshot host S_CET if SHSTK *or* IBT is supported]
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:11:49 -07:00
Yang Weijiang
25f3840483 KVM: VMX: Set up interception for CET MSRs
Disable interception for CET MSRs that can be accessed via XSAVES/XRSTORS,
and exist according to CPUID, as accesses through XSTATE aren't subject
to MSR interception checks, i.e. can't be intercepted without intercepting
and emulating XSAVES/XRSTORS, and KVM doesn't support emulating
XSAVE/XRSTOR instructions.

Don't condition interception on the guest actually having XSAVES as there
is no benefit to intercepting the accesses (when the MSRs exist).  The
MSRs in question are either context switched by the CPU on VM-Enter/VM-Exit
or by KVM via XSAVES/XRSTORS (KVM requires XSAVES to virtualize SHSTK),
i.e. KVM is going to load guest values into hardware irrespective of guest
XSAVES support.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:11:26 -07:00
Yang Weijiang
1a61bd0d12 KVM: x86: Save and reload SSP to/from SMRAM
Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates the
hardware's architectural behavior when the guest enters/leaves SMM, i.e.
saves registers to SMRAM on entry to SMM and reloads them on exit from SMM.
Per the SDM, SSP is one such register on 64-bit architectures, so add
support for SSP.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:11:22 -07:00
Yang Weijiang
8b59d0275c KVM: VMX: Emulate read and write to CET MSRs
Add an emulation interface for CET MSR access. The emulation code is split
into a common part and a vendor-specific part. The former does common checks
for MSRs, e.g., accessibility, data validity etc., then passes the operation
to either the XSAVE-managed MSRs via the helpers or the CET VMCS fields.

SSP can only be read via RDSSP. Writing even requires destructive and
potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
for the GUEST_SSP field of the VMCS.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:10:47 -07:00
Yang Weijiang
9d6812d415 KVM: x86: Enable guest SSP read/write interface with new uAPIs
Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace
save and restore the guest's Shadow Stack Pointer (SSP).  On both Intel
and AMD, SSP is a hardware register that can only be accessed by software
via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware
to context switch SSP at entry/exit).  As a result, SSP doesn't fit in
any of KVM's existing interfaces for saving/restoring state.

Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes
to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP
MSRs.  Use a translation layer to hide the KVM-internal MSR index so that
the arbitrary index doesn't become ABI, e.g. so that KVM can rework its
implementation as needed, so long as the ONE_REG ABI is maintained.

Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack
support to avoid running afoul of ignore_msrs, which unfortunately applies
to host-initiated accesses (which is a discussion for another day).  I.e.
ensure consistent behavior for KVM-defined registers irrespective of
ignore_msrs.

Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-14-seanjc@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:10:33 -07:00
Yang Weijiang
d6c387fc39 KVM: VMX: Introduce CET VMCS fields and control bits
Control-flow Enforcement Technology (CET) is a kind of CPU feature used
to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks.
It provides two sub-features (SHSTK, IBT) to defend against ROP/COP/JOP
style control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces an instruction (ENDBRANCH) to mark valid target addresses
  of indirect branches (CALL, JMP etc...). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates
  a #CP. This instruction behaves as a NOP on platforms that have no CET.
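
To make the Shadow Stack description concrete, here is a toy software model
of the CALL/RET behavior. It is purely illustrative and unrelated to how
hardware or KVM stores state: struct toy_cpu, toy_call() and toy_ret() are
invented for this sketch, and a false return from toy_ret() stands in for
the processor raising #CP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 16

/* Toy CPU with a data stack and a separate shadow stack. */
struct toy_cpu {
	uint64_t data_stack[STACK_DEPTH];
	uint64_t shadow_stack[STACK_DEPTH];
	size_t data_top;
	size_t shadow_top;
};

/* CALL pushes the return address on both stacks. */
bool toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
	if (cpu->data_top == STACK_DEPTH || cpu->shadow_top == STACK_DEPTH)
		return false;

	cpu->data_stack[cpu->data_top++] = ret_addr;
	cpu->shadow_stack[cpu->shadow_top++] = ret_addr;
	return true;
}

/* RET pops both stacks and compares; a mismatch models a #CP. */
bool toy_ret(struct toy_cpu *cpu, uint64_t *target)
{
	uint64_t from_data, from_shadow;

	if (cpu->data_top == 0 || cpu->shadow_top == 0)
		return false;

	from_data = cpu->data_stack[--cpu->data_top];
	from_shadow = cpu->shadow_stack[--cpu->shadow_top];
	if (from_data != from_shadow)
		return false;

	*target = from_data;
	return true;
}
```

Overwriting an entry in data_stack between toy_call() and toy_ret(), as a
ROP-style attack would, makes toy_ret() fail instead of returning to the
forged address.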

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

  MSR_IA32_INT_SSP_TAB: Linear address of SHSTK pointer table, whose entry
			is indexed by IST of interrupt gate desc.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores current active SSP.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from the following
VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from the
following VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:00:49 -07:00