Documentation: add documentation for KHO

With KHO in place, let's add documentation that describes what it is and
how to use it.

Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Graf
2025-05-09 00:46:34 -07:00
committed by Andrew Morton
parent f992307802
commit 3498209ff6
10 changed files with 381 additions and 0 deletions


@@ -2725,6 +2725,31 @@
	kgdbwait	[KGDB,EARLY] Stop kernel execution and enter the
			kernel debugger at the earliest opportunity.

	kho=		[KEXEC,EARLY]
			Format: { "0" | "1" | "off" | "on" | "y" | "n" }
			Enables or disables Kexec HandOver.
			"0" | "off" | "n" - kexec handover is disabled
			"1" | "on" | "y" - kexec handover is enabled

	kho_scratch=	[KEXEC,EARLY]
			Format: ll[KMG],mm[KMG],nn[KMG] | nn%
			Defines the size of the KHO scratch region. The KHO
			scratch regions are physically contiguous memory
			ranges that can only be used for non-kernel
			allocations. That way, even when memory is heavily
			fragmented with handed over memory, the kexeced
			kernel will always have enough contiguous ranges to
			bootstrap itself.

			It is possible to specify the exact amount of
			memory in the form of "ll[KMG],mm[KMG],nn[KMG]"
			where the first parameter defines the size of a low
			memory scratch area, the second parameter defines
			the size of a global scratch area and the third
			parameter defines the size of additional per-node
			scratch areas. The form "nn%" defines the scale
			factor (in percent) of the memory that was used
			during boot.

	kmac=		[MIPS] Korina ethernet MAC address.
			Configure the RouterBoard 532 series on-chip
			Ethernet adapter MAC address.


@@ -42,3 +42,4 @@ the Linux memory management.
   transhuge
   userfaultfd
   zswap
   kho


@@ -0,0 +1,115 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

====================
Kexec Handover Usage
====================

Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.

This document expects that you are familiar with the base KHO
:ref:`concepts <kho-concepts>`. If you have not read
them yet, please do so now.

Prerequisites
=============

KHO is available when the kernel is compiled with ``CONFIG_KEXEC_HANDOVER``
set to y. Every KHO producer may have its own config option that you
need to enable if you would like to preserve its respective state across
kexec.

To use KHO, please boot the kernel with the ``kho=on`` command line
parameter. You may use the ``kho_scratch`` parameter to define the size of
the scratch regions. For example, ``kho_scratch=16M,512M,256M`` will reserve
a 16 MiB low memory scratch area, a 512 MiB global scratch region, and a
256 MiB scratch region per NUMA node on boot.
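
With the explicit form, the total amount of scratch memory grows with the
number of NUMA nodes. The shell sketch below is illustrative only:
``kho_size_to_bytes`` and ``kho_scratch_total_mib`` are hypothetical helpers,
not part of the kernel or kexec-tools. It adds up the three sizes of a
``kho_scratch=ll,mm,nn`` value, assuming one per-node area on every NUMA node
as described above. It does not handle the ``nn%`` form.

```sh
# Hypothetical helpers for illustration only; they are not part of the
# kernel or of kexec-tools.

# kho_size_to_bytes SIZE
# Convert a size with an optional K, M or G suffix into bytes.
kho_size_to_bytes() {
	num=${1%[KkMmGg]}	# strip a single trailing suffix, if present
	case $1 in
	*[Kk]) echo $(($num * 1024)) ;;
	*[Mm]) echo $(($num * 1024 * 1024)) ;;
	*[Gg]) echo $(($num * 1024 * 1024 * 1024)) ;;
	*) echo $(($num)) ;;
	esac
}

# kho_scratch_total_mib LL,MM,NN NR_NODES
# Print, in MiB, the low memory area (LL) plus the global area (MM) plus
# one per-node area (NN) for each of NR_NODES NUMA nodes.
kho_scratch_total_mib() {
	low=${1%%,*}
	rest=${1#*,}
	global=${rest%%,*}
	per_node=${rest#*,}

	low_b=$(kho_size_to_bytes "$low")
	global_b=$(kho_size_to_bytes "$global")
	node_b=$(kho_size_to_bytes "$per_node")
	echo $(( ($low_b + $global_b + $node_b * $2) / 1024 / 1024 ))
}
```

For example, ``kho_scratch_total_mib 16M,512M,256M 2`` prints ``1040``
(16 + 512 + 2 * 256 MiB) for a machine with two NUMA nodes.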

Perform a KHO kexec
===================

First, before you perform a KHO kexec, you need to move the system into
the :ref:`KHO finalization phase <kho-finalization-phase>` ::

  $ echo 1 > /sys/kernel/debug/kho/out/finalize

After this command, the KHO FDT is available in
``/sys/kernel/debug/kho/out/fdt``. Other subsystems may also register
their own preserved sub FDTs under
``/sys/kernel/debug/kho/out/sub_fdts/``.
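
Before you hand the FDT to kexec, it can be useful to inspect it. The
following is a minimal sketch. It assumes that the Device Tree Compiler
(``dtc``) is installed and that debugfs is mounted at ``/sys/kernel/debug``.
The ``KHO_OUT`` variable and both function names are hypothetical.

```sh
# Hypothetical variable and helpers; KHO_OUT defaults to the debugfs
# directory documented here.
KHO_OUT=${KHO_OUT:-/sys/kernel/debug/kho/out}

# Succeed if "finalize" reads "1", i.e. the system is in the KHO
# finalization phase and the outgoing FDT is available.
kho_is_finalized() {
	[ "$(cat "$KHO_OUT/finalize" 2>/dev/null)" = 1 ]
}

# Decompile the outgoing KHO FDT to DTS source on stdout with dtc.
kho_dump_fdt() {
	if ! kho_is_finalized; then
		echo "KHO is not finalized; run: echo 1 > $KHO_OUT/finalize" >&2
		return 1
	fi
	dtc -I dtb -O dts "$KHO_OUT/fdt"
}
```

Run as root after the ``echo 1`` above, ``kho_dump_fdt`` prints the outgoing
KHO FDT in DTS form. You can use this, for example, to check which KHO users
added their nodes to the root tree.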

Next, load the target payload and kexec into it. It is important that you
use the ``-s`` parameter to use the in-kernel kexec file loader, as user
space kexec tooling currently has no support for KHO with the user space
based file loader ::

  # kexec -l /path/to/bzImage --initrd /path/to/initrd -s
  # kexec -e

The new kernel will boot up and contain some of the previous kernel's state.

For example, if you used the ``reserve_mem`` command line parameter to create
an early memory reservation, the new kernel will have that memory at the
same physical address as the old kernel.

Abort a KHO kexec
=================

You can move the system out of the KHO finalization phase again by calling ::

  $ echo 0 > /sys/kernel/debug/kho/out/finalize

After this command, the KHO FDT is no longer available in
``/sys/kernel/debug/kho/out/fdt``.
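
You can combine the finalize, load and abort steps so that a failed load does
not leave the system in the finalization phase. This is only a sketch.
``kho_load`` and ``KHO_OUT`` are hypothetical names, and the helper must run
as root.

```sh
# Hypothetical variable; defaults to the debugfs directory documented here.
KHO_OUT=${KHO_OUT:-/sys/kernel/debug/kho/out}

# kho_load KERNEL INITRD
# Hypothetical wrapper: enter the KHO finalization phase, then load the
# payload with the in-kernel file loader (-s). If loading fails, leave the
# finalization phase again so the running kernel can keep changing state.
kho_load() {
	echo 1 > "$KHO_OUT/finalize" || return 1
	if ! kexec -l "$1" --initrd "$2" -s; then
		echo "kexec load failed; aborting KHO finalization" >&2
		echo 0 > "$KHO_OUT/finalize"
		return 1
	fi
}

# Usage (as root):
#   kho_load /path/to/bzImage /path/to/initrd && kexec -e
```

Only run ``kexec -e`` after ``kho_load`` succeeds. Writing ``0`` to
``finalize``, as the helper does on failure, leaves the finalization phase
again.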

debugfs Interfaces
==================

Currently KHO creates the following debugfs interfaces. Notice that these
interfaces may change in the future. They will be moved to sysfs once KHO is
stabilized.

``/sys/kernel/debug/kho/out/finalize``
    Kexec HandOver (KHO) allows Linux to transition the state of
    compatible drivers into the next kexec'ed kernel. To do so,
    device drivers will instruct KHO to preserve memory regions,
    which could contain serialized kernel state.
    While the state is serialized, they are unable to perform
    any modifications to state that was serialized, such as
    handed over memory allocations.

    When this file contains "1", the system is in the transition
    state. When it contains "0", it is not. To switch between the
    two states, echo the respective number into this file.

``/sys/kernel/debug/kho/out/fdt``
    When the KHO state tree is finalized, the kernel exposes the
    flattened device tree blob that carries its current KHO
    state in this file. Kexec user space tooling can use this
    as an input file for the KHO payload image.

``/sys/kernel/debug/kho/out/scratch_len``
    Lengths of KHO scratch regions, which are physically contiguous
    memory regions that will always stay available for future kexec
    allocations. Kexec user space tools can use this file to determine
    where they should place their payload images.

``/sys/kernel/debug/kho/out/scratch_phys``
    Physical locations of KHO scratch regions. Kexec user space tools
    can use this file in conjunction with ``scratch_len`` to determine
    where they should place their payload images.
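
To see the two files side by side, you can pair them line by line. The sketch
below makes an assumption that this document does not state: each file holds
one number per line (0x-prefixed hex or decimal), and line N of both files
describes the same region. ``kho_scratch_regions`` is a hypothetical name.

```sh
# Hypothetical variable; defaults to the debugfs directory documented here.
KHO_OUT=${KHO_OUT:-/sys/kernel/debug/kho/out}

# kho_scratch_regions
# Print one "first-last length" line (hex) per scratch region. Assumes that
# scratch_phys and scratch_len hold one number per line and that line N of
# both files describes the same region.
kho_scratch_regions() {
	paste "$KHO_OUT/scratch_phys" "$KHO_OUT/scratch_len" |
	while read -r phys len; do
		printf '0x%x-0x%x 0x%x\n' "$phys" "$(($phys + $len - 1))" "$len"
	done
}
```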

``/sys/kernel/debug/kho/out/sub_fdts/``
    In the KHO finalization phase, KHO producers register their own
    FDT blobs under this directory.

``/sys/kernel/debug/kho/in/fdt``
    When the kernel was booted with Kexec HandOver (KHO),
    the state tree that carries metadata about the previous
    kernel's state is in this file in the format of a flattened
    device tree. This file may disappear when all consumers of
    it have finished interpreting their metadata.

``/sys/kernel/debug/kho/in/sub_fdts/``
    Similar to ``kho/out/sub_fdts/``, but contains sub FDT blobs
    of KHO producers passed from the old kernel.
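
After a KHO kexec, you can list and decompile the incoming sub FDTs. This is a
sketch that assumes ``dtc`` is installed and that each blob appears as a file
named after its producer's node (such as ``memblock``). ``KHO_IN`` and both
function names are hypothetical.

```sh
# Hypothetical variable; defaults to the incoming debugfs directory.
KHO_IN=${KHO_IN:-/sys/kernel/debug/kho/in}

# List the sub FDT blobs handed over by the previous kernel, one per line.
kho_in_sub_fdts() {
	for f in "$KHO_IN"/sub_fdts/*; do
		[ -e "$f" ] || continue	# unmatched glob: directory is empty
		basename "$f"
	done
}

# Decompile one incoming sub FDT, e.g.: kho_in_dump_sub_fdt memblock
kho_in_dump_sub_fdt() {
	dtc -I dtb -O dts "$KHO_IN/sub_fdts/$1"
}
```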


@@ -115,6 +115,7 @@ more memory-management documentation in Documentation/mm/index.rst.
   pin_user_pages
   boot-time-mm
   gfp_mask-from-fs-io
   kho/index

Interfaces for kernel debugging
===============================


@@ -0,0 +1,43 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: Kexec HandOver (KHO) root tree

maintainers:
  - Mike Rapoport <rppt@kernel.org>
  - Changyuan Lyu <changyuanl@google.com>

description: |
  System memory preserved by KHO across kexec.

properties:
  compatible:
    enum:
      - kho-v1

  preserved-memory-map:
    description: |
      physical address (u64) of an in-memory structure describing all preserved
      folios and memory ranges.

patternProperties:
  "$[0-9a-f_]+^":
    $ref: sub-fdt.yaml#
    description: physical address of a KHO user's own FDT.

required:
  - compatible
  - preserved-memory-map

additionalProperties: false

examples:
  - |
    kho {
        compatible = "kho-v1";
        preserved-memory-map = <0xf0be16 0x1000000>;

        memblock {
                fdt = <0x80cc16 0x1000000>;
        };
    };


@@ -0,0 +1,27 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: KHO users' FDT address

maintainers:
  - Mike Rapoport <rppt@kernel.org>
  - Changyuan Lyu <changyuanl@google.com>

description: |
  Physical address of an FDT blob registered by a KHO user.

properties:
  fdt:
    description: |
      physical address (u64) of an FDT blob.

required:
  - fdt

additionalProperties: false

examples:
  - |
    memblock {
            fdt = <0x80cc16 0x1000000>;
    };


@@ -0,0 +1,74 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

.. _kho-concepts:

=======================
Kexec Handover Concepts
=======================

Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.

It introduces multiple concepts:

KHO FDT
=======

Every KHO kexec carries a KHO specific flattened device tree (FDT) blob
that describes preserved memory regions. These regions contain either
serialized subsystem states, or in-memory data that shall not be touched
across kexec. After KHO, subsystems can retrieve and restore preserved
memory regions from the KHO FDT.

KHO only uses the FDT container format and libfdt library, but does not
adhere to the same property semantics that normal device trees do: Properties
are passed in native endianness and standardized properties like ``reg`` and
``ranges`` do not exist, hence there are no ``#...-cells`` properties.

KHO is still under development. The FDT schema is unstable and may change
in the future.

Scratch Regions
===============

To boot into kexec, we need to have a physically contiguous memory range that
contains no handed over memory. Kexec then places the target kernel and initrd
into that region. The new kernel exclusively uses this region for memory
allocations during boot, up until the page allocator is initialized.

We guarantee that we always have such regions through the scratch regions: On
first boot KHO allocates several physically contiguous memory regions. Since
after kexec these regions will be used by early memory allocations, there is a
scratch region per NUMA node plus a scratch region to satisfy allocation
requests that do not require a particular NUMA node assignment.

By default, the size of the scratch regions is calculated based on the amount
of memory allocated during boot. The ``kho_scratch`` kernel command line
option may be used to explicitly define the size of the scratch regions.

The scratch regions are declared as CMA when the page allocator is initialized
so that their memory can be used during the system's lifetime. CMA gives us the
guarantee that no handover pages land in that region, because handover pages
must be at a static physical memory location and CMA enforces that only
movable pages can be located inside.

After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
instead reuse the exact same region that was originally allocated. This allows
us to recursively execute any number of KHO kexecs. Because we used this region
for boot memory allocations and as target memory for kexec blobs, some parts
of that memory region may be reserved. These reservations are irrelevant for
the next KHO, because kexec can overwrite even the original kernel.

.. _kho-finalization-phase:

KHO finalization phase
======================

To enable the user space based kexec file loader, the kernel needs to be able
to provide the FDT that describes the current kernel's state before
performing the actual kexec. The process of generating that FDT is
called serialization. When the FDT is generated, some properties
of the system may become immutable because they are already written down
in the FDT. That state is called the KHO finalization phase.

Public API
==========

.. kernel-doc:: kernel/kexec_handover.c
   :export:


@@ -0,0 +1,80 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

=======
KHO FDT
=======

KHO uses the flattened device tree (FDT) container format and libfdt
library to create and parse the data that is passed between the
kernels. The properties in the KHO FDT are stored in native format.
The KHO FDT includes the physical address of an in-memory structure
describing all preserved memory regions, as well as physical addresses of
KHO users' own FDTs. Interpreting those sub FDTs is the responsibility of
KHO users.

KHO nodes and properties
========================

Property ``preserved-memory-map``
---------------------------------

KHO saves a special property named ``preserved-memory-map`` under the root node.
This property contains the physical address of an in-memory structure for KHO
to preserve memory regions across kexec.

Property ``compatible``
-----------------------

The ``compatible`` property determines compatibility between the kernel
that created the KHO FDT and the kernel that attempts to load it.
If the kernel that loads the KHO FDT is not compatible with it, the entire
KHO process will be bypassed.

Property ``fdt``
----------------

Generally, a KHO user serializes its state into its own FDT and instructs
KHO to preserve the underlying memory, such that after kexec, the new kernel
can recover its state from the preserved FDT.

A KHO user can thus create a node in the KHO root tree and save the physical
address of its own FDT in that node's ``fdt`` property.

Examples
========

The following example demonstrates a KHO FDT that preserves two memory
regions created with the ``reserve_mem`` kernel command line parameter::

  /dts-v1/;

  / {
      compatible = "kho-v1";

      preserved-memory-map = <0x40be16 0x1000000>;

      memblock {
          fdt = <0x1517 0x1000000>;
      };
  };

where the ``fdt`` property of the ``memblock`` node points to an FDT that the
memblock subsystem requested for preservation. That FDT contains the
following serialized data::

  /dts-v1/;

  / {
      compatible = "memblock-v1";

      n1 {
          compatible = "reserve-mem-v1";
          start = <0xc06b 0x4000000>;
          size = <0x04 0x00>;
      };

      n2 {
          compatible = "reserve-mem-v1";
          start = <0xc067 0x4000000>;
          size = <0x04 0x00>;
      };
  };
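
KHO stores properties in native endianness, while ``dtc`` prints an 8-byte
value as two big-endian 32-bit cells. As a result, the numbers in these
examples look odd. The sketch below recovers the u64 values, assuming the
blobs were produced on a little-endian machine such as x86_64 or arm64.
``bswap32`` and ``kho_le_u64`` are hypothetical helper names.

```sh
# bswap32 CELL
# Reverse the byte order of one 32-bit cell; prints a decimal number.
bswap32() {
	_c=$(($1))
	_b0=$(( $_c & 255 ))
	_b1=$(( ($_c >> 8) & 255 ))
	_b2=$(( ($_c >> 16) & 255 ))
	_b3=$(( ($_c >> 24) & 255 ))
	echo $(( ($_b0 << 24) | ($_b1 << 16) | ($_b2 << 8) | $_b3 ))
}

# kho_le_u64 CELL0 CELL1
# dtc shows an 8-byte property as two big-endian cells: CELL0 holds raw
# bytes 0-3 and CELL1 raw bytes 4-7. On a little-endian machine, bytes 0-3
# are the low 32 bits of the u64.
kho_le_u64() {
	_low=$(bswap32 "$1")
	_high=$(bswap32 "$2")
	printf '0x%x\n' $(( ($_high << 32) | $_low ))
}
```

Under that assumption, ``kho_le_u64 0xc06b 0x4000000`` prints ``0x46bc00000``
and ``kho_le_u64 0x04 0x00`` prints ``0x4000000``. So ``n1`` describes a
64 MiB region starting at physical address 0x46bc00000, and ``n2`` describes
a 64 MiB region at 0x467c00000.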


@@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

========================
Kexec Handover Subsystem
========================

.. toctree::
   :maxdepth: 1

   concepts
   fdt

.. only:: subproject and html


@@ -13145,6 +13145,8 @@ M: Mike Rapoport <rppt@kernel.org>
M: Changyuan Lyu <changyuanl@google.com>
L: kexec@lists.infradead.org
S: Maintained
F: Documentation/admin-guide/mm/kho.rst
F: Documentation/core-api/kho/*
F: include/linux/kexec_handover.h
F: kernel/kexec_handover.c