mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-05-16 07:51:31 -04:00
doc: watchdog: document buddy detector

The current documentation generalizes the hardlockup detector as primarily
NMI-perf-based and lacks details on the SMP "Buddy" detector.

Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to explicitly
separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
"Hardlockup Detector (Buddy)".

Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-5-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Committed by: Andrew Morton
Parent: 077ba03600
Commit: cb8615f3cb

@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
 to cause the system to reboot automatically after a specified amount
 of time.
 
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
 Implementation
 ==============
 
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
 
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
 
 The watchdog job runs in a stop scheduling thread that updates a
 timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
 will call panic if it was instructed to do so or resume execution of
 other kernel code.
 
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
 
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
 
-Detection Overhead
-------------------
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
 
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+Hardlockup Detector (NMI/Perf)
+------------------------------
 
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
 
-::
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
 
-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (NMI):**
 
-Time to detection: ~6 seconds
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
 
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.
 
-::
+  ::
 
-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
 
-Time to detection: ~20 seconds
+  Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+  Time to detection: ~20 seconds
+
+Hardlockup Detector (Buddy)
+---------------------------
+
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
+
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
+
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
+
+**Detection Overhead (Buddy):**
+
+With a default check interval of 4 seconds (watchdog_thresh = 10):
+
+* **Best case:** Lockup occurs just before a check.
+  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
+
+**Limitations of the Buddy Detector:**
+
+1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+   detector cannot detect the condition because the monitoring CPUs
+   are also frozen.
+2. **Stack Traces:** Unlike the NMI detector, the buddy detector
+   cannot directly interrupt the locked CPU to grab a stack trace.
+   It relies on architecture-specific mechanisms (like NMI backtrace
+   support) to try and retrieve the status of the locked CPU. If
+   such support is missing, the log may only show that a lockup
+   occurred without providing the locked CPU's stack.
+
+Watchdog Core Exclusion
+=======================
 
 By default, the watchdog runs on all online cores. However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only
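
Note on the timing figures: the detection latencies quoted in the added documentation (about 6 s and 20 s for the NMI detector, 8 s and 12 s for the buddy detector) can be cross-checked with a small model. The Python sketch below is not kernel code and is not part of this commit. All function names are hypothetical, times are integer milliseconds, and it assumes an idealized timeline in which heartbeats and checks fire exactly on schedule. The buddy function follows the documentation's simplified accounting, where the first check after the lockup already counts as a miss.

```python
def hrtimer_period_ms(watchdog_thresh_s):
    """Heartbeat period from the documentation: 2*watchdog_thresh/5."""
    return 2 * watchdog_thresh_s * 1000 // 5


def heartbeats_seen(t_ms, first_hb_ms, lockup_ms, period_ms):
    """Count heartbeats delivered at or before t_ms.

    Model assumption: no heartbeat fires at or after lockup_ms.
    """
    last = min(t_ms, lockup_ms - 1)
    if last < first_hb_ms:
        return 0
    return (last - first_hb_ms) // period_ms + 1


def nmi_detection_delay_ms(first_hb_ms, first_check_ms, lockup_ms,
                           watchdog_thresh_s=10):
    """Delay between the lockup and the NMI check that declares it.

    Each check compares the heartbeat count with the value saved by the
    previous check; an unchanged count is treated as a hard lockup.
    """
    period = hrtimer_period_ms(watchdog_thresh_s)
    check = first_check_ms
    saved = heartbeats_seen(check, first_hb_ms, lockup_ms, period)
    while True:
        check += watchdog_thresh_s * 1000
        now = heartbeats_seen(check, first_hb_ms, lockup_ms, period)
        if now == saved:
            return check - lockup_ms
        saved = now  # count changed: CPU looked healthy, reset the baseline


def buddy_detection_delay_ms(ms_until_next_check, watchdog_thresh_s=10,
                             missed_threshold=3):
    """Delay until the buddy CPU has seen missed_threshold stale checks."""
    period = hrtimer_period_ms(watchdog_thresh_s)
    return ms_until_next_check + (missed_threshold - 1) * period


# Timelines taken from the documentation's examples (watchdog_thresh = 10).
best_nmi = nmi_detection_delay_ms(100_000, 100_100, 103_900)
# Worst case: assumes an earlier heartbeat at 96.1 s preceding the one at 100.1 s.
worst_nmi = nmi_detection_delay_ms(96_100, 100_000, 100_200)
best_buddy = buddy_detection_delay_ms(0)
worst_buddy = buddy_detection_delay_ms(4_000)
```

Under these assumptions the model reproduces the documented figures: 6.2 s and 19.8 s for the NMI timelines, 8 s and 12 s for the buddy detector.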