Documentation: add documentation for KHO
With KHO in place, let's add documentation that describes what it is and
how to use it.

Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Committed by: Andrew Morton
Parent: f992307802
Commit: 3498209ff6
@@ -2725,6 +2725,31 @@
        kgdbwait        [KGDB,EARLY] Stop kernel execution and enter the
                        kernel debugger at the earliest opportunity.

        kho=            [KEXEC,EARLY]
                        Format: { "0" | "1" | "off" | "on" | "y" | "n" }
                        Enables or disables Kexec HandOver.
                        "0" | "off" | "n" - kexec handover is disabled
                        "1" | "on" | "y" - kexec handover is enabled

        kho_scratch=    [KEXEC,EARLY]
                        Format: ll[KMG],mm[KMG],nn[KMG] | nn%
                        Defines the size of the KHO scratch region. The KHO
                        scratch regions are physically contiguous memory
                        ranges that can only be used for non-kernel
                        allocations. That way, even when memory is heavily
                        fragmented with handed over memory, the kexeced
                        kernel will always have enough contiguous ranges to
                        bootstrap itself.

                        It is possible to specify the exact amount of
                        memory in the form of "ll[KMG],mm[KMG],nn[KMG]"
                        where the first parameter defines the size of a low
                        memory scratch area, the second parameter defines
                        the size of a global scratch area and the third
                        parameter defines the size of additional per-node
                        scratch areas. The form "nn%" defines a scale factor
                        (in percent) of the memory that was used during boot.

        kmac=           [MIPS] Korina ethernet MAC address.
                        Configure the RouterBoard 532 series on-chip
                        Ethernet adapter MAC address.
@@ -42,3 +42,4 @@ the Linux memory management.
   transhuge
   userfaultfd
   zswap
   kho
115
Documentation/admin-guide/mm/kho.rst
Normal file
@@ -0,0 +1,115 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

====================
Kexec Handover Usage
====================

Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.

This document expects that you are familiar with the base KHO
:ref:`concepts <kho-concepts>`. If you have not read
them yet, please do so now.

Prerequisites
=============

KHO is available when the kernel is compiled with ``CONFIG_KEXEC_HANDOVER``
set to y. Every KHO producer may have its own config option that you
need to enable if you would like to preserve its state across
kexec.

To use KHO, please boot the kernel with the ``kho=on`` command line
parameter. You may use the ``kho_scratch`` parameter to define the size of the
scratch regions. For example, ``kho_scratch=16M,512M,256M`` will reserve a
16 MiB low memory scratch area, a 512 MiB global scratch region, and 256 MiB
per NUMA node scratch regions on boot.
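
To make the arithmetic behind the explicit ``kho_scratch`` form concrete, the
following Python sketch parses a value such as ``16M,512M,256M`` and estimates
the total reservation. It is an illustration only and not the kernel's parser:
the helper names are hypothetical, only the ``K``, ``M`` and ``G`` suffixes
named in the format string are handled, the ``nn%`` form is not covered, and
one per-node area per NUMA node is assumed.

```python
import re

# Binary multipliers for the K, M and G suffixes named in the format string.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '16M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def parse_kho_scratch(value):
    """Parse the explicit 'll[KMG],mm[KMG],nn[KMG]' form (hypothetical helper)."""
    parts = value.split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated sizes")
    lowmem, global_area, per_node = (parse_size(p) for p in parts)
    return {"lowmem": lowmem, "global": global_area, "per_node": per_node}


def total_scratch_bytes(sizes, numa_nodes):
    """Total reservation, assuming one per-node area for each NUMA node."""
    return sizes["lowmem"] + sizes["global"] + sizes["per_node"] * numa_nodes


sizes = parse_kho_scratch("16M,512M,256M")
print(sizes)
print(total_scratch_bytes(sizes, numa_nodes=2) // (1 << 20), "MiB")
```

On a machine with two NUMA nodes, the example value would therefore add up to
16 + 512 + 2 * 256 MiB under these assumptions.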

Perform a KHO kexec
===================

First, before you perform a KHO kexec, you need to move the system into
the :ref:`KHO finalization phase <kho-finalization-phase>` ::

  $ echo 1 > /sys/kernel/debug/kho/out/finalize

After this command, the KHO FDT is available in
``/sys/kernel/debug/kho/out/fdt``. Other subsystems may also register
their own preserved sub FDTs under
``/sys/kernel/debug/kho/out/sub_fdts/``.

Next, load the target payload and kexec into it. It is important that you
use the ``-s`` parameter to use the in-kernel kexec file loader, as user
space kexec tooling currently has no support for KHO with the user space
based file loader ::

  # kexec -l /path/to/bzImage --initrd /path/to/initrd -s
  # kexec -e

The new kernel will boot up and contain some of the previous kernel's state.

For example, if you used the ``reserve_mem`` command line parameter to create
an early memory reservation, the new kernel will have that memory at the
same physical address as the old kernel.
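
The order of the steps above matters: finalize first, then load, then execute.
As a rough sketch, the sequence could be captured in a small Python helper that
only prints the commands (a dry run) and never executes them. The function name
``kho_kexec_plan`` is hypothetical; the paths and flags are the ones shown
above.

```python
import shlex

FINALIZE = "/sys/kernel/debug/kho/out/finalize"


def kho_kexec_plan(kernel, initrd):
    """Return the ordered steps of a KHO kexec as (description, command) pairs."""
    return [
        # Step 1: enter the KHO finalization phase.
        ("finalize KHO state", f"echo 1 > {FINALIZE}"),
        # Step 2: load the payload; -s selects the in-kernel file loader.
        ("load payload",
         shlex.join(["kexec", "-l", kernel, "--initrd", initrd, "-s"])),
        # Step 3: jump into the loaded kernel.
        ("execute", shlex.join(["kexec", "-e"])),
    ]


for description, command in kho_kexec_plan("/path/to/bzImage", "/path/to/initrd"):
    print(f"{description}: {command}")
```

Keeping the plan as data makes it easy to review the commands before running
them as root.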

Abort a KHO kexec
=================

You can move the system out of the KHO finalization phase again by calling ::

  $ echo 0 > /sys/kernel/debug/kho/out/finalize

After this command, the KHO FDT is no longer available in
``/sys/kernel/debug/kho/out/fdt``.

debugfs Interfaces
==================

Currently KHO creates the following debugfs interfaces. Notice that these
interfaces may change in the future. They will be moved to sysfs once KHO is
stabilized.

``/sys/kernel/debug/kho/out/finalize``
    Kexec HandOver (KHO) allows Linux to transition the state of
    compatible drivers into the next kexec'ed kernel. To do so,
    device drivers will instruct KHO to preserve memory regions,
    which could contain serialized kernel state.
    While the state is serialized, they are unable to perform
    any modifications to state that was serialized, such as
    handed over memory allocations.

    When this file contains "1", the system is in the transition
    state. When it contains "0", it is not. To switch between the
    two states, echo the respective number into this file.

``/sys/kernel/debug/kho/out/fdt``
    When the KHO state tree is finalized, the kernel exposes the
    flattened device tree blob that carries its current KHO
    state in this file. Kexec user space tooling can use this
    as the input file for the KHO payload image.

``/sys/kernel/debug/kho/out/scratch_len``
    Lengths of KHO scratch regions, which are physically contiguous
    memory regions that will always stay available for future kexec
    allocations. Kexec user space tools can use this file to determine
    where they should place their payload images.

``/sys/kernel/debug/kho/out/scratch_phys``
    Physical locations of KHO scratch regions. Kexec user space tools
    can use this file in conjunction with ``scratch_len`` to determine
    where they should place their payload images.

``/sys/kernel/debug/kho/out/sub_fdts/``
    In the KHO finalization phase, KHO producers register their own
    FDT blob under this directory.

``/sys/kernel/debug/kho/in/fdt``
    When the kernel was booted with Kexec HandOver (KHO),
    the state tree that carries metadata about the previous
    kernel's state is in this file in flattened device tree
    format. This file may disappear when all consumers of
    it have finished interpreting their metadata.

``/sys/kernel/debug/kho/in/sub_fdts/``
    Similar to ``kho/out/sub_fdts/``, but contains sub FDT blobs
    of KHO producers passed from the old kernel.
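
As an illustration of how a user space tool might combine ``scratch_phys`` and
``scratch_len``, the Python sketch below pairs the two lists into address
ranges and checks whether a payload fits. The file format is an assumption
(one integer per line, in the same order in both files), the sample values are
made up, and both helper names are hypothetical.

```python
def parse_scratch_regions(phys_text, len_text):
    """Pair each scratch base address with its length.

    Assumes both inputs list one integer per line, in the same order.
    """
    # int(x, 0) accepts both "0x..." hexadecimal and plain decimal strings.
    bases = [int(tok, 0) for tok in phys_text.split()]
    lengths = [int(tok, 0) for tok in len_text.split()]
    if len(bases) != len(lengths):
        raise ValueError("scratch_phys and scratch_len disagree on region count")
    return [(base, base + length) for base, length in zip(bases, lengths)]


def fits_in_scratch(regions, start, size):
    """Return True if [start, start + size) lies inside one scratch region."""
    return any(lo <= start and start + size <= hi for lo, hi in regions)


# Made-up sample contents; on a real system read the two debugfs files instead.
regions = parse_scratch_regions("0x1000000\n0x100000000\n",
                                "0x1000000\n0x20000000\n")
for lo, hi in regions:
    print(f"{lo:#x}-{hi:#x}")
```

Since these debugfs interfaces may change, a real tool should validate the
contents rather than rely on this layout.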
@@ -115,6 +115,7 @@ more memory-management documentation in Documentation/mm/index.rst.
   pin_user_pages
   boot-time-mm
   gfp_mask-from-fs-io
   kho/index

Interfaces for kernel debugging
===============================

43
Documentation/core-api/kho/bindings/kho.yaml
Normal file
@@ -0,0 +1,43 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: Kexec HandOver (KHO) root tree

maintainers:
  - Mike Rapoport <rppt@kernel.org>
  - Changyuan Lyu <changyuanl@google.com>

description: |
  System memory preserved by KHO across kexec.

properties:
  compatible:
    enum:
      - kho-v1

  preserved-memory-map:
    description: |
      physical address (u64) of an in-memory structure describing all preserved
      folios and memory ranges.

patternProperties:
  "^[0-9a-f_]+$":
    $ref: sub-fdt.yaml#
    description: physical address of a KHO user's own FDT.

required:
  - compatible
  - preserved-memory-map

additionalProperties: false

examples:
  - |
    kho {
        compatible = "kho-v1";
        preserved-memory-map = <0xf0be16 0x1000000>;

        memblock {
            fdt = <0x80cc16 0x1000000>;
        };
    };
27
Documentation/core-api/kho/bindings/sub-fdt.yaml
Normal file
@@ -0,0 +1,27 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: KHO users' FDT address

maintainers:
  - Mike Rapoport <rppt@kernel.org>
  - Changyuan Lyu <changyuanl@google.com>

description: |
  Physical address of an FDT blob registered by a KHO user.

properties:
  fdt:
    description: |
      physical address (u64) of an FDT blob.

required:
  - fdt

additionalProperties: false

examples:
  - |
    memblock {
        fdt = <0x80cc16 0x1000000>;
    };
74
Documentation/core-api/kho/concepts.rst
Normal file
@@ -0,0 +1,74 @@
.. SPDX-License-Identifier: GPL-2.0-or-later
.. _kho-concepts:

=======================
Kexec Handover Concepts
=======================

Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.

It introduces multiple concepts:

KHO FDT
=======

Every KHO kexec carries a KHO specific flattened device tree (FDT) blob
that describes preserved memory regions. These regions contain either
serialized subsystem states, or in-memory data that shall not be touched
across kexec. After KHO, subsystems can retrieve and restore preserved
memory regions from the KHO FDT.

KHO only uses the FDT container format and libfdt library, but does not
adhere to the same property semantics that normal device trees do: Properties
are passed in native endianness and standardized properties like ``reg`` and
``ranges`` do not exist, hence there are no ``#...-cells`` properties.

KHO is still under development. The FDT schema is unstable and may change
in the future.

Scratch Regions
===============

To boot into kexec, we need to have a physically contiguous memory range that
contains no handed over memory. Kexec then places the target kernel and initrd
into that region. The new kernel exclusively uses this region for memory
allocations during boot, up to the initialization of the page allocator.

We guarantee that we always have such regions through the scratch regions: On
first boot KHO allocates several physically contiguous memory regions. Since
after kexec these regions will be used by early memory allocations, there is a
scratch region per NUMA node plus a scratch region to satisfy allocation
requests that do not require a particular NUMA node assignment.
By default, the size of the scratch regions is calculated based on the amount
of memory allocated during boot. The ``kho_scratch`` kernel command line option
may be used to explicitly define the size of the scratch regions.
The scratch regions are declared as CMA when the page allocator is initialized
so that their memory can be used during system lifetime. CMA gives us the
guarantee that no handover pages land in that region, because handover pages
must be at a static physical memory location and CMA enforces that only
movable pages can be located inside.

After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
instead reuse the exact same region that was originally allocated. This allows
us to recursively execute any number of KHO kexecs. Because we used this region
for boot memory allocations and as target memory for kexec blobs, some parts
of that memory region may be reserved. These reservations are irrelevant for
the next KHO, because kexec can overwrite even the original kernel.

.. _kho-finalization-phase:

KHO finalization phase
======================

To enable a user space based kexec file loader, the kernel needs to be able to
provide the FDT that describes the current kernel's state before
performing the actual kexec. The process of generating that FDT is
called serialization. When the FDT is generated, some properties
of the system may become immutable because they are already written down
in the FDT. That state is called the KHO finalization phase.

Public API
==========
.. kernel-doc:: kernel/kexec_handover.c
   :export:
80
Documentation/core-api/kho/fdt.rst
Normal file
@@ -0,0 +1,80 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

=======
KHO FDT
=======

KHO uses the flattened device tree (FDT) container format and libfdt
library to create and parse the data that is passed between the
kernels. The properties in the KHO FDT are stored in native format.
The KHO FDT includes the physical address of an in-memory structure describing
all preserved memory regions, as well as physical addresses of KHO users'
own FDTs. Interpreting those sub FDTs is the responsibility of KHO users.

KHO nodes and properties
========================

Property ``preserved-memory-map``
---------------------------------

KHO saves a special property named ``preserved-memory-map`` under the root node.
This property contains the physical address of an in-memory structure for KHO to
preserve memory regions across kexec.

Property ``compatible``
-----------------------

The ``compatible`` property determines compatibility between the kernel
that created the KHO FDT and the kernel that attempts to load it.
If the kernel that loads the KHO FDT is not compatible with it, the entire
KHO process will be bypassed.

Property ``fdt``
----------------

Generally, a KHO user serializes its state into its own FDT and instructs
KHO to preserve the underlying memory, such that after kexec, the new kernel
can recover its state from the preserved FDT.

A KHO user thus can create a node in the KHO root tree and save the physical
address of its own FDT in that node's property ``fdt``.

Examples
========

The following example demonstrates a KHO FDT that preserves two memory
regions created with the ``reserve_mem`` kernel command line parameter::

  /dts-v1/;

  / {
      compatible = "kho-v1";

      preserved-memory-map = <0x40be16 0x1000000>;

      memblock {
          fdt = <0x1517 0x1000000>;
      };
  };

where the ``memblock`` node contains an FDT that is requested by the
subsystem memblock for preservation. The FDT contains the following
serialized data::

  /dts-v1/;

  / {
      compatible = "memblock-v1";

      n1 {
          compatible = "reserve-mem-v1";
          start = <0xc06b 0x4000000>;
          size = <0x04 0x00>;
      };

      n2 {
          compatible = "reserve-mem-v1";
          start = <0xc067 0x4000000>;
          size = <0x04 0x00>;
      };
  };
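
Because KHO stores properties in native endianness, the 32-bit cells shown in
a ``dtc``-style dump do not read directly as a 64-bit value. The Python sketch
below shows one way to reassemble them. It assumes that the dump prints the
raw property bytes as big-endian cells and that the kernel that produced them
was little-endian; neither is stated in the text above, and the helper name
``cells_to_u64`` is hypothetical.

```python
import struct


def cells_to_u64(cell_hi, cell_lo, byteorder="<"):
    """Reassemble a native-endian u64 from two 32-bit cells of a DTS dump.

    The two cells are first turned back into the raw 8 property bytes
    (big-endian cells, as a dump would print them) and then reinterpreted
    in the byte order of the producing kernel ("<" little, ">" big endian).
    """
    raw = struct.pack(">II", cell_hi, cell_lo)
    return struct.unpack(byteorder + "Q", raw)[0]


# Cell values copied from the examples above; little-endian is an assumption.
print(hex(cells_to_u64(0x40be16, 0x1000000)))  # preserved-memory-map
print(hex(cells_to_u64(0xc06b, 0x4000000)))    # n1 start
print(hex(cells_to_u64(0x04, 0x00)))           # n1 size
```

Under these assumptions the ``size`` cells ``<0x04 0x00>`` decode to
``0x4000000``, that is 64 MiB, and the two ``start`` values are exactly that
far apart.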
13
Documentation/core-api/kho/index.rst
Normal file
@@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0-or-later

========================
Kexec Handover Subsystem
========================

.. toctree::
   :maxdepth: 1

   concepts
   fdt

.. only:: subproject and html
@@ -13145,6 +13145,8 @@ M: Mike Rapoport <rppt@kernel.org>
M: Changyuan Lyu <changyuanl@google.com>
L: kexec@lists.infradead.org
S: Maintained
F: Documentation/admin-guide/mm/kho.rst
F: Documentation/core-api/kho/*
F: include/linux/kexec_handover.h
F: kernel/kexec_handover.c
