mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-05-16 04:21:09 -04:00
The threat-model document says that only users with CAP_SYS_ADMIN can carry out a number of admin-level tasks, but there are numerous capabilities that can confer that sort of power. Generalize the text slightly to make it clear that CAP_SYS_ADMIN is not the only all-powerful capability. Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
236 lines
11 KiB
ReStructuredText
236 lines
11 KiB
ReStructuredText
The Linux Kernel threat model
|
|
=============================
|
|
|
|
There are a lot of assumptions regarding what the kernel does and does not
|
|
protect against. These assumptions tend to cause confusion for bug reports
|
|
(:doc:`security-related ones <security-bugs>` vs :doc:`non-security ones
|
|
<../admin-guide/reporting-issues>`), and can complicate security enforcement
|
|
when the responsibilities for some boundaries is not clear between the kernel,
|
|
distros, administrators and users.
|
|
|
|
This document tries to clarify the responsibilities of the kernel in this
|
|
domain.
|
|
|
|
The kernel's responsibilities
|
|
-----------------------------
|
|
|
|
The kernel abstracts access to local hardware resources and to remote systems
|
|
in a way that allows multiple local users to get a fair share of the available
|
|
resources granted to them, and, when the underlying hardware permits, to assign
|
|
a level of confidentiality to their communications and to the data they are
|
|
processing or storing.
|
|
|
|
The kernel assumes that the underlying hardware behaves according to its
|
|
specifications. This includes the integrity of the CPU's instruction set, the
|
|
transparency of the branch prediction unit and the cache units, the consistency
|
|
of the Memory Management Unit (MMU), the isolation of DMA-capable peripherals
|
|
(e.g., via IOMMU), state transitions in controllers, ranges of values read from
|
|
registers, the respect of documented hardware limitations, etc.
|
|
|
|
When hardware fails to maintain its specified isolation (e.g., CPU bugs,
|
|
side-channels, hardware response to unexpected inputs), the kernel will usually
|
|
attempt to implement reasonable mitigations. These are best-effort measures
|
|
intended to reduce the attack surface or elevate the cost of an attack within
|
|
the limits of the hardware's facilities; they do not constitute a
|
|
kernel-provided safety guarantee.
|
|
|
|
Users always perform their activities under the authority of an administrator
|
|
who is able to grant or deny various types of permissions that may affect how
|
|
users benefit from available resources, or the level of confidentiality of
|
|
their activities. Administrators may also delegate all or part of their own
|
|
permissions to some users, particularly via capabilities but not only. All this
|
|
is performed via configuration (sysctl, file-system permissions etc).
|
|
|
|
The Linux Kernel applies a certain collection of default settings that match
|
|
its threat model. Distros have their own threat model and will come with their
|
|
own configuration presets, that the administrator may have to adjust to better
|
|
suit their expectations (relax or restrict).
|
|
|
|
By default, the Linux Kernel guarantees the following protections when running
|
|
on common processors featuring privilege levels and memory management units:
|
|
|
|
* **User-based isolation**: an unprivileged user may restrict access to their
|
|
own data from other unprivileged users running on the same system. This
|
|
includes:
|
|
|
|
* stored data, via file system permissions
|
|
* in-memory data (pages are not accessible by default to other users)
|
|
* process activity (ptrace is not permitted to other users)
|
|
* inter-process communication (other users may not observe data exchanged via
|
|
UNIX domain sockets or other IPC mechanisms).
|
|
* network communications within the same or with other systems
|
|
|
|
* **Capability-based protection**:
|
|
|
|
* users not having elevated capabilities (including but not limited to
|
|
CAP_SYS_ADMIN) may not alter the
|
|
kernel's configuration, memory nor state, change other users' view of the
|
|
file system layout, grant any user capabilities they do not have, nor
|
|
affect the system's availability (shutdown, reboot, panic, hang, or making
|
|
the system unresponsive via unbounded resource exhaustion).
|
|
* users not having the ``CAP_NET_ADMIN`` capability may not alter the network
|
|
configuration, intercept nor spoof network communications from other users
|
|
nor systems.
|
|
* users not having ``CAP_SYS_PTRACE`` may not observe other users' processes
|
|
activities.
|
|
|
|
When ``CONFIG_USER_NS`` is set, the kernel also permits unprivileged users to
|
|
create their own user namespace in which they have all capabilities, but with a
|
|
number of restrictions (they may not perform actions that have impacts on the
|
|
initial user namespace, such as changing time, loading modules or mounting
|
|
block devices). Please refer to ``user_namespaces(7)`` for more details, the
|
|
possibilities of user namespaces are not covered in this document.
|
|
|
|
The kernel also offers a lot of troubleshooting and debugging facilities, which
|
|
can constitute attack vectors when placed in wrong hands. While some of them
|
|
are designed to be accessible to regular local users with a low risk (e.g.
|
|
kernel logs via ``/proc/kmsg``), some would expose enough information to
|
|
represent a risk in most places and the decision to expose them is under the
|
|
administrator's responsibility (perf events, traces), and others are not
|
|
designed to be accessed by non-privileged users (e.g. debugfs). Access to these
|
|
facilities by a user who has been explicitly granted permission by an
|
|
administrator does not constitute a security breach.
|
|
|
|
Bugs that permit to violate the principles above constitute security breaches.
|
|
However, bugs that permit one violation only once another one was already
|
|
achieved are only weaknesses. The kernel applies a number of self-protection
|
|
measures whose purpose is to avoid crossing a security boundary when certain
|
|
classes of bugs are found, but a failure of these extra protections do not
|
|
constitute a vulnerability alone.
|
|
|
|
What does not constitute a security bug
|
|
---------------------------------------
|
|
|
|
In the Linux kernel's threat model, the following classes of problems are
|
|
**NOT** considered as Linux Kernel security bugs. However, when it is believed
|
|
that the kernel could do better, they should be reported, so that they can be
|
|
reviewed and fixed where reasonably possible, but they will be handled as any
|
|
regular bug:
|
|
|
|
* **Configuration**:
|
|
|
|
* outdated kernels and particularly end-of-life branches are out of the scope
|
|
of the kernel's threat model: administrators are responsible for keeping
|
|
their system up to date. For a bug to qualify as a security bug, it must be
|
|
demonstrated that it affects actively maintained versions.
|
|
|
|
* build-level: changes to the kernel configuration that are explicitly
|
|
documented as lowering the security level (e.g. ``CONFIG_NOMMU``), or
|
|
targeted at developers only.
|
|
|
|
* OS-level: changes to command line parameters, sysctls, filesystem
|
|
permissions, user capabilities, exposure of privileged interfaces, that
|
|
explicitly increase exposure by either offering non-default access to
|
|
unprivileged users, or reduce the kernel's ability to enforce some
|
|
protections or mitigations. Example: write access to procfs or debugfs.
|
|
|
|
* issues triggered only when using features intended for development or
|
|
debugging (e.g., LOCKDEP, KASAN, FAULT_INJECTION): these features are known
|
|
to introduce overhead and potential instability and are not intended for
|
|
production use.
|
|
|
|
* issues affecting drivers exposed under CONFIG_STAGING, as well as features
|
|
marked EXPERIMENTAL in the configuration.
|
|
|
|
* loading of explicitly insecure/broken/staging modules, and generally any
|
|
using any subsystem marked as experimental or not intended for production
|
|
use.
|
|
|
|
* running out-of-tree modules or unofficial kernel forks; these should be
|
|
reported to the relevant vendor.
|
|
|
|
* **Excess of initial privileges**:
|
|
|
|
* actions performed by a user already possessing the privileges required to
|
|
perform that action or modify that state (e.g. ``CAP_SYS_ADMIN``,
|
|
``CAP_NET_ADMIN``, ``CAP_SYS_RAWIO``, ``CAP_SYS_MODULE`` with no further
|
|
boundary being crossed).
|
|
|
|
* actions performed in user namespace that do not bypass the restrictions
|
|
imposed to the initial user (e.g. ptrace usage, signal delivery, resource
|
|
usage, access to FS/device/sysctl/memory, network binding, system/network
|
|
configuration etc).
|
|
|
|
* anything performed by the root user in the initial namespace (e.g. kernel
|
|
oops when writing to a privileged device).
|
|
|
|
* **Out of production use**:
|
|
|
|
This covers theoretical/probabilistic attacks that rely on laboratory
|
|
conditions with zero system noise, or those requiring an unrealistic number
|
|
of attempts (e.g., billions of trials) that would be detected by standard
|
|
system monitoring long before success, such as:
|
|
|
|
* prediction of random numbers that only works in a totally silent
|
|
environment (such as IP ID, TCP ports or sequence numbers that can only be
|
|
guessed in a lab).
|
|
|
|
* activity observation and information leaks based on probabilistic
|
|
approaches that are prone to measurement noise and not realistically
|
|
reproducible on a production system.
|
|
|
|
* issues that can only be triggered by heavy attacks (e.g. brute force) whose
|
|
impact on the system makes it unlikely or impossible to remain undetected
|
|
before they succeed (e.g. consuming all memory before succeeding).
|
|
|
|
* problems seen only under development simulators, emulators, or combinations
|
|
that do not exist on real systems at the time of reporting (issues
|
|
involving tens of millions of threads, tens of thousands of CPUs,
|
|
unrealistic CPU frequencies, RAM sizes or disk capacities, network speeds.
|
|
|
|
* issues whose reproduction requires hardware modification or emulation,
|
|
including fake USB devices that pretend to be another one.
|
|
|
|
* as well as issues that can be triggered at a cost that is orders of
|
|
magnitude higher than the expected benefits (e.g. fully functional keyboard
|
|
emulator only to retrieve 7 uninitialized bytes in a structure, or
|
|
brute-force method involving millions of connection attempts to guess a
|
|
port number).
|
|
|
|
* **Hardening failures**:
|
|
|
|
* ability to bypass some of the kernel's hardening measures with no
|
|
demonstrable exploit path (e.g. ASLR bypass, events timing or probing with
|
|
no demonstrable consequence). These are just weaknesses, not
|
|
vulnerabilities.
|
|
|
|
* missing argument checks and failure to report certain errors with no
|
|
immediate consequence.
|
|
|
|
* **Random information leaks**:
|
|
|
|
This concerns information leaks of small data parts that happen to be there
|
|
and that cannot be chosen by the attacker, or face access restrictions:
|
|
|
|
* structure padding reported by syscalls or other interfaces.
|
|
|
|
* identifiers, partial data, non-terminated strings reported in error
|
|
messages.
|
|
|
|
* Leaks of kernel memory addresses/pointers do not constitute an immediately
|
|
exploitable vector and are not security bugs, though they must be reported
|
|
and fixed.
|
|
|
|
* **Crafted file system images**:
|
|
|
|
* bugs triggered by mounting a corrupted or maliciously crafted file system
|
|
image are generally not security bugs, as the kernel assumes the underlying
|
|
storage media is under the administrator's control, unless the filesystem
|
|
driver is specifically documented as being hardened against untrusted media.
|
|
|
|
* issues that are resolved, mitigated, or detected by running a filesystem
|
|
consistency check (fsck) on the image prior to mounting.
|
|
|
|
* **Physical access**:
|
|
|
|
Issues that require physical access to the machine, hardware modification, or
|
|
the use of specialized hardware (e.g., logic analyzers, DMA-attack tools over
|
|
PCI-E/Thunderbolt) are out of scope unless the system is explicitly
|
|
configured with technologies meant to defend against such attacks
|
|
(e.g. IOMMU).
|
|
|
|
* **Functional and performance regressions**:
|
|
|
|
Any issue that can be mitigated by setting proper permissions and limits
|
|
doesn't qualify as a security bug.
|