Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED support from Thomas Gleixner:
 "Support for x86 Fast Return and Event Delivery (FRED).

  FRED is a replacement for IDT event delivery on x86 and addresses most
  of the technical nightmares which IDT exposes:

   1) Exception cause registers like CR2 need to be manually preserved
      in nested exception scenarios.

   2) Hardware interrupt stack switching is suboptimal for nested
      exceptions as the interrupt stack mechanism rewinds the stack on
      each entry which requires a massive effort in the low level entry
      of #NMI code to handle this.

   3) No hardware distinction between entry from kernel or from user
      which makes establishing kernel context more complex than it needs
      to be especially for unconditionally nestable exceptions like NMI.

   4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
      is a problem when the perf NMI takes a fault when collecting a
      stack trace.

   5) Partial restore of ESP when returning to a 16-bit segment

   6) Limitation of the vector space which can cause vector exhaustion
      on large systems.

   7) Inability to differentiate NMI sources

  FRED addresses these shortcomings by:

   1) An extended exception stack frame which the CPU uses to save
      exception cause registers. This ensures that the meta information
      for each exception is preserved on stack and avoids the extra
      complexity of preserving it in software.

   2) Hardware interrupt stack switching is non-rewinding if a nested
      exception uses the currently interrupt stack.

   3) The entry points for kernel and user context are separate and GS
      BASE handling which is required to establish kernel context for
      per CPU variable access is done in hardware.

   4) NMIs are now nesting protected. They are only reenabled on the
      return from NMI.

   5) FRED guarantees full restore of ESP

   6) FRED does not put a limitation on the vector space by design
      because it uses a central entry points for kernel and user space
      and the CPUstores the entry type (exception, trap, interrupt,
      syscall) on the entry stack along with the vector number. The
      entry code has to demultiplex this information, but this removes
      the vector space restriction.

      The first hardware implementations will still have the current
      restricted vector space because lifting this limitation requires
      further changes to the local APIC.

   7) FRED stores the vector number and meta information on stack which
      allows having more than one NMI vector in future hardware when the
      required local APIC changes are in place.

  The series implements the initial FRED support by:

   - Reworking the existing entry and IDT handling infrastructure to
     accomodate for the alternative entry mechanism.

   - Expanding the stack frame to accomodate for the extra 16 bytes FRED
     requires to store context and meta information

   - Providing FRED specific C entry points for events which have
     information pushed to the extended stack frame, e.g. #PF and #DB.

   - Providing FRED specific C entry points for #NMI and #MCE

   - Implementing the FRED specific ASM entry points and the C code to
     demultiplex the events

   - Providing detection and initialization mechanisms and the necessary
     tweaks in context switching, GS BASE handling etc.

  The FRED integration aims for maximum code reuse vs the existing IDT
  implementation to the extent possible and the deviation in hot paths
  like context switching are handled with alternatives to minimalize the
  impact. The low level entry and exit paths are seperate due to the
  extended stack frame and the hardware based GS BASE swichting and
  therefore have no impact on IDT based systems.

  It has been extensively tested on existing systems and on the FRED
  simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
This commit is contained in:
Linus Torvalds
2024-03-11 16:00:17 -07:00
56 changed files with 1372 additions and 121 deletions

View File

@@ -1525,6 +1525,12 @@
Warning: use of this parameter will taint the kernel
and may cause unknown problems.
fred= [X86-64]
Enable/disable Flexible Return and Event Delivery.
Format: { on | off }
on: enable FRED when it's present.
off: disable FRED, the default setting.
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early

View File

@@ -0,0 +1,96 @@
.. SPDX-License-Identifier: GPL-2.0
=========================================
Flexible Return and Event Delivery (FRED)
=========================================
Overview
========
The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:
1) Improve overall performance and response time by replacing event
delivery through the interrupt descriptor table (IDT event
delivery) and event return by the IRET instruction with lower
latency transitions.
2) Improve software robustness by ensuring that event delivery
establishes the full supervisor context and that event return
establishes the full user context.
The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0. One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.
In addition to these transitions, the FRED architecture defines a new
instruction (LKGS) for managing the state of the GS segment register.
The LKGS instruction can be used by 64-bit operating systems that do
not use the new FRED transitions.
Furthermore, the FRED architecture is easy to extend for future CPU
architectures.
Software based event dispatching
================================
FRED operates differently from IDT in terms of event handling. Instead
of directly dispatching an event to its handler based on the event
vector, FRED requires the software to dispatch an event to its handler
based on both the event's type and vector. Therefore, an event dispatch
framework must be implemented to facilitate the event-to-handler
dispatch process. The FRED event dispatch framework takes control
once an event is delivered, and employs a two-level dispatch.
The first level dispatching is event type based, and the second level
dispatching is event vector based.
Full supervisor/user context
============================
FRED event delivery atomically save and restore full supervisor/user
context upon event delivery and return. Thus it avoids the problem of
transient states due to %cr2 and/or %dr6, and it is no longer needed
to handle all the ugly corner cases caused by half baked entry states.
FRED allows explicit unblock of NMI with new event return instructions
ERETS/ERETU, avoiding the mess caused by IRET which unconditionally
unblocks NMI, e.g., when an exception happens during NMI handling.
FRED always restores the full value of %rsp, thus ESPFIX is no longer
needed when FRED is enabled.
LKGS
====
LKGS behaves like the MOV to GS instruction except that it loads the
base address into the IA32_KERNEL_GS_BASE MSR instead of the GS
segments descriptor cache. With LKGS, it ends up with avoiding
mucking with kernel GS, i.e., an operating system can always operate
with its own GS base address.
Because FRED event delivery from ring 3 and ERETU both swap the value
of the GS base address and that of the IA32_KERNEL_GS_BASE MSR, plus
the introduction of LKGS instruction, the SWAPGS instruction is no
longer needed when FRED is enabled, thus is disallowed (#UD).
Stack levels
============
4 stack levels 0~3 are introduced to replace the nonreentrant IST for
event handling, and each stack level should be configured to use a
dedicated stack.
The current stack level could be unchanged or go higher upon FRED
event delivery. If unchanged, the CPU keeps using the current event
stack. If higher, the CPU switches to a new event stack specified by
the MSR of the new stack level, i.e., MSR_IA32_FRED_RSP[123].
Only execution of a FRED return instruction ERET[US], could lower the
current stack level, causing the CPU to switch back to the stack it was
on before a previous event delivery that promoted the stack level.

View File

@@ -15,3 +15,4 @@ x86_64 Support
cpu-hotplug-spec
machinecheck
fsgs
fred