Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair scheduler
(Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout the
entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's data structures
and logic on UP kernel too, even though they are not used, to
simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
blocks from the scheduler (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling for better
interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
Mason)
- Remove sched_domain_topology_level::flags to simplify the code
(Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for
mutex-owning tasks to inherit the scheduling context of higher
priority waiters.
Currently limited to a single runqueue and conditional on
CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri
Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive
dl_server handling (Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock()
usage (Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid
unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter
Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri
Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info
(Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)"
* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
sched/idle: Remove play_idle()
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
sched: Start blocked_on chain processing in find_proxy_task()
sched: Fix proxy/current (push,pull)ability
sched: Add an initial sketch of the find_proxy_task() function
sched: Fix runtime accounting w/ split exec & sched contexts
sched: Move update_curr_task logic into update_curr_se
locking/mutex: Add p->blocked_on wrappers for correctness checks
locking/mutex: Rework task_struct::blocked_on
sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
sched/topology: Remove sched_domain_topology_level::flags
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
x86/smpboot: moves x86_topology to static initialize and truncate
x86/smpboot: remove redundant CONFIG_SCHED_SMT
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
tools/sched: Add root_domains_dump.py which dumps root domains info
sched/deadline: Fix accounting after global limits change
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Initialize dl_servers after SMP
...
Pull x86 CPU mitigation updates from Borislav Petkov:
- Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS
to enable stuffing independently from Retbleed, do some cleanups to
simplify and streamline the code
- Simplify SRSO and make mitigation types selection more versatile
depending on the Retbleed mitigation selection. Simplify code some
- Add the second part of the attack vector controls which provide a lot
friendlier user interface to the speculation mitigations than
selecting each one by one as it is now.
Instead, the selection of whole attack vectors which are relevant to
the system in use can be done and protection against only those
vectors is enabled, thus giving back some performance to the users
* tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Print enabled attack vectors
x86/bugs: Add attack vector controls for TSA
x86/pti: Add attack vector controls for PTI
x86/bugs: Add attack vector controls for ITS
x86/bugs: Add attack vector controls for SRSO
x86/bugs: Add attack vector controls for L1TF
x86/bugs: Add attack vector controls for spectre_v2
x86/bugs: Add attack vector controls for BHI
x86/bugs: Add attack vector controls for spectre_v2_user
x86/bugs: Add attack vector controls for retbleed
x86/bugs: Add attack vector controls for spectre_v1
x86/bugs: Add attack vector controls for GDS
x86/bugs: Add attack vector controls for SRBDS
x86/bugs: Add attack vector controls for RFDS
x86/bugs: Add attack vector controls for MMIO
x86/bugs: Add attack vector controls for TAA
x86/bugs: Add attack vector controls for MDS
x86/bugs: Define attack vectors relevant for each bug
x86/Kconfig: Add arch attack vector support
cpu: Define attack vectors
...
Pull generic entry code updates from Thomas Gleixner:
- Split the code into syscall and exception/interrupt parts to ease the
conversion of ARM[64] to the generic entry infrastructure
- Extend syscall user dispatching to support a single intercepted range
instead of the default single non-intercepted range. That allows
monitoring/analysis of a specific executable range, e.g. a library,
and also provides flexibility for sandboxing scenarios
- Cleanup and extend the user dispatch selftest
* tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Split generic entry into generic exception and syscall entry
selftests: Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON
syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON
selftests: Fix errno checking in syscall_user_dispatch test
Pull interrupt chip driver updates from Thomas Gleixner:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so
it supports arbitrary hart indices
- Add a command line parameter and related handling to disable the
generic RISCV IMSIC mechanism on platforms which use a trap-emulated
IMSIC. Unfortunatly this is required because there is no mechanism
available to discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend
variants.
- The usual cleanups and improvements all over the place
* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/riscv-imsic: Add kernel parameter to disable IPIs
irqchip/gic-v3: Fix GICD_CTLR register naming
irqchip/ls-scfg-msi: Fix NULL dereference in error handling
irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Convert to __free
irqchip/alpine-msi: Convert to lock guards
irqchip/alpine-msi: Clean up whitespace style
irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
irqdomain: Add device pointer to irq_domain_info and msi_domain_info
irqchip/renesas-rzv2h: Remove unneeded includes
irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/aslint-sswi: Resolve hart index
...
Pull gpio updates from Bartosz Golaszewski:
"There's one new driver (Apple SMC) and extensions to existing drivers
for supporting new HW models. A lot of different impovements across
drivers and in core GPIO code. Details on that are in the signed tag
as usual.
We managed to remove some of the legacy APIs. Arnd Bergmann started to
work on making the legacy bits optional so that we may compile them
only for older platforms that still really need them.
Rob Herring has done a lot of work to convert legacy .txt dt-bindings
for GPIO controllers to YAML. There are only a few left now in the
GPIO tree.
A big part of the commits in this PR concern the conversion of GPIO
drivers to using the new line value setter callbacks. This conversion
is now complete treewide (unless I've missed something) and once all
the changes from different trees land in mainline, I'll send you
another PR containing a commit dropping the legacy callbacks from the
tree.
As the quest to pay back technical dept never really ends, we're
starting another set of interface conversions, this time it's about
moving fields specific to only a handful of drivers using the
gpio-mmio helper out of the core gpio_chip structure that every
controller implements and uses. This cycle we introduce a new set of
APIs and convert a few drivers under drivers/gpio/, next cycle we'll
convert remaining modules treewide (in gpio, pinctrl and mfd trees)
and finally remove the old interfaces and move the gpio-mmio fields
into their own structure wrapping gpio_chip.
One last change I should mention here is the rework of the sysfs
interface. In 2016, we introduced the GPIO character device as the
preferred alternative to the sysfs class under /sys/class/gpio. While
it has seen a wide adoption with the help of its user-space
counterpart - libgpiod - there are still users who prefer the
simplicity of sysfs.
As far as the GPIO subsystem is concerned, the problem is not the
existince of the GPIO class as such but rather the fact that it
exposes the global GPIO numbers to the user-space, stopping us from
ever being able to remove the numberspace from the kernel. To that
end, this release we introduced a parallel, limited sysfs interface
that doesn't expose these numbers and only implements a subset of
features that are relevant to the existing users. This is a result of
several discussions over the course of last year and should allow us
to remove the legacy part some time in the future.
Summary:
GPIOLIB core:
- introduce a parallel, limited sysfs user ABI that doesn't expose
the global GPIO numbers to user-space while maintaining backward
compatibility with the end goal of it completely replacing the
existing interface, allowing us to remove it
- remove the legacy devm_gpio_request() routine which has no more
users
- start the process of allowing to compile-out the legacy parts of
the GPIO core for users who don't need it by introducing a new
Kconfig option: GPIOLIB_LEGACY
- don't use global GPIO numbers in debugfs output from the core code
(drivers still do it, the work is ongoing)
- start the process of moving the fields specific to the gpio-mmio
helper out of the core struct gpio_chip into their own structure
that wraps it: create a new header with modern interfaces and
convert several drivers to using it
- remove the platform data structure associated with the gpio-mmio
helper from the kernel after having converted all remaining users
to generic device properties
- remove legacy struct gpio definition as it has no more users
New drivers:
- add the GPIO driver for the Apple System Management Controller
Driver improvements:
- add support for new models to gpio-adp5585, gpio-tps65219 and
gpio-pca953x
- extend the interrupt support in gpio-loongson-64bit
- allow to mark the simulated GPIO lines as invalid in gpio-sim
- convert all remaining GPIO drivers to using the new GPIO value
setter callbacks
- convert gpio-rcar to using simple device power management ops
callbacks
- don't check if current direction of a line is output before setting
the value in gpio-pisosr and ti-fpc202: the GPIO core already
handles that
- also drop unneeded GPIO range checks in drivers, the core already
makes sure we're within bounds when calling driver callbacks
- use dev_fwnode() where applicable across GPIO drivers
- set line value in gpio-zynqmp-modepin and gpio-twl6040 when the
user wants to change direction of the pin to output even though
these drivers don't need to do anything else to actually set the
direction, otherwise a call like gpiod_direction_output(d, 1) will
not result in the line driver high
- remove the reduntant call to pm_runtime_mark_last_busy() from
gpio-arizona
- use lock guards in gpio-cadence and gpio-mxc
- check the return values of regmap functions in gpio-wcd934x and
gpio-tps65912
- use better regmap interfaces in gpio-wcove and gpio-pca953x
- remove dummy GPIO chip callbacks from several drivers in cases
where the GPIO core can already handle their absence
- allow building gpio-palmas as a module
Fixes:
- use correct bit widths (according to the documentation) in
gpio-virtio
Device-tree bindings:
- convert several of the legacy .txt documents for many different
devices to YAML, improving automatic validation
- create a "trivial" GPIO DT schema that covers a wide range of
simple hardware that share a set of basic GPIO properties
- document new HW: Apple MAC SMC GPIO block and adp5589 I/O expander
- document a new model for pca95xx
- add and/or remove properties in YAML documents for gpio-rockchip,
fsl,qoriq-gpio, arm,pl061 and gpio-xilinx
Misc:
- some minor refactoring in several places, adding/removing forward
declarations, moving defines to better places, constify the
arguments in some functions, remove duplicate includes, etc.
- documentation updates"
* tag 'gpio-updates-for-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (202 commits)
MIPS: alchemy: gpio: use new GPIO line value setter callbacks for the remaining chips
gpiolib: enable CONFIG_GPIOLIB_LEGACY even for !GPIOLIB
gpio: virtio: Fix config space reading.
gpiolib: make legacy interfaces optional
dt-bindings: gpio: rockchip: Allow use of a power-domain
gpiolib: of: add forward declaration for struct device_node
power: reset: macsmc-reboot: Add driver for rebooting via Apple SMC
gpio: Add new gpio-macsmc driver for Apple Macs
mfd: Add Apple Silicon System Management Controller
soc: apple: rtkit: Make shmem_destroy optional
dt-bindings: mfd: Add Apple Mac System Management Controller
dt-bindings: power: reboot: Add Apple Mac SMC Reboot Controller
dt-bindings: gpio: Add Apple Mac SMC GPIO block
gpio: cadence: Remove duplicated include in gpio-cadence.c
gpio: tps65219: Add support for TI TPS65214 PMIC
gpio: tps65219: Update _IDX & _OFFSET macro prefix
gpio: sysfs: Fix an end of loop test in gpiod_unexport()
dt-bindings: gpio: Convert qca,ar7100-gpio to DT schema
dt-bindings: gpio: Convert maxim,max3191x to DT schema
dt-bindings: gpio: fsl,qoriq-gpio: Add missing mpc8xxx compatibles
...
Pull power management updates from Rafael Wysocki:
"As is tradition, cpufreq is the part with the largest number of
updates that include core fixes and cleanups as well as updates of
several assorted drivers, but there are also quite a few updates
related to system sleep, mostly focused on asynchronous suspend and
resume of devices and on making the integration of system suspend
and resume with runtime PM easier.
Runtime PM is also updated to allow some code duplication in drivers
to be eliminated going forward and to work more consistently overall
in some cases.
Apart from that, there are some driver core updates related to PM
domains that should help to address ordering issues with devm_ cleanup
routines relying on PM domains, some assorted devfreq updates
including core fixes and cleanups, tooling updates, and documentation
and MAINTAINERS updates.
Specifics:
- Fix two initialization ordering issues in the cpufreq core and a
governor initialization error path in it, and clean it up (Lifeng
Zheng)
- Add Granite Rapids support in no-HWP mode to the intel_pstate
cpufreq driver (Li RongQing)
- Make intel_pstate always use HWP_DESIRED_PERF when operating in the
passive mode (Rafael Wysocki)
- Allow building the tegra124 cpufreq driver as a module (Aaron
Kling)
- Do minor cleanups for Rust cpufreq and cpumask APIs and fix
MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
Bulwahn)
- Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
- Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
(Prashant Malani)
- Fix minimum performance state label error in the amd-pstate driver
documentation (Shouye Liu)
- Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
governor and explain HW coordination influence on it in the
documentation (Shashank Balaji)
- Fix opencoded for_each_cpu() in idle_state_valid() in the DT
cpuidle driver (Yury Norov)
- Remove info about non-existing QoS interfaces from the PM QoS
documentation (Ulf Hansson)
- Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
- Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
- Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)
- Simplify the sun8i-a33-mbus devfreq driver by using more devm
functions (Uwe Kleine-König)
- Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
- Check devfreq governor before using governor->name (Lifeng Zheng)
- Remove a redundant devfreq_get_freq_range() call from
devfreq_add_device() (Lifeng Zheng)
- Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
- Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the
device suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
- Document return values of suspend-related API functions in the
runtime PM framework (Sakari Ailus)
- Mark last busy stamp in multiple autosuspend-related functions in
the runtime PM framework and update its documentation (Sakari
Ailus)
- Take active children into account in pm_runtime_get_if_in_use() for
consistency (Rafael Wysocki)
- Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
power capping driver (Sivan Zohar-Kotzer)
- Add support for the Bartlett Lake platform to the Intel RAPL power
capping driver (Qiao Wei)
- Add PL4 support for Panther Lake to the intel_rapl_msr power
capping driver (Zhang Rui)
- Update contact information in the PM ABI docs and maintainer
information in the power domains DT binding (Rafael Wysocki)
- Update PM header inclusions to follow the IWYU (Include What You
Use) principle (Andy Shevchenko)
- Add flags to specify power on attach/detach for PM domains, make
the driver core detach PM domains in device_unbind_cleanup(), and
drop the dev_pm_domain_detach() call from the platform bus type
(Claudiu Beznea)
- Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
- Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
Shenoy)"
* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
PM: docs: Use my kernel.org address in ABI docs and DT bindings
PM: hibernate: Fix up white space that does not follow coding style
PM: sleep: Rearrange suspend/resume error handling in the core
Documentation: amd-pstate:fix minimum performance state label error
PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
kexec_core: Drop redundant pm_restore_gfp_mask() call
kexec_core: Fix error code path in the KEXEC_JUMP flow
PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
drivers: cpufreq: add Tegra114 support
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
cpufreq: Exit governor when failed to start old governor
cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
cpufreq: Init policy->rwsem before it may be possibly used
cpufreq: Initialize cpufreq-based frequency-invariance later
cpufreq: Remove duplicate check in __cpufreq_offline()
cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
PM / devfreq: Add HiSilicon uncore frequency scaling driver
...
Pull selinux updates from Paul Moore:
- Introduce the concept of a SELinux "neveraudit" type which prevents
all auditing of the given type/domain.
Taken by itself, the benefit of marking a SELinux domain with the
"neveraudit" tag is likely not very interesting, especially given the
significant overlap with the "dontaudit" tag.
However, given that the "neveraudit" tag applies to *all* auditing of
the tagged domain, we can do some fairly interesting optimizations
when a SELinux domain is marked as both "permissive" and "dontaudit"
(think of the unconfined_t domain).
While this pull request includes optimized inode permission and
getattr hooks, these optimizations require SELinux policy changes,
therefore the improvements may not be visible on standard downstream
Linux distos for a period of time.
- Continue the deprecation process of /sys/fs/selinux/user.
After removing the associated userspace code in 2020, we marked the
/sys/fs/selinux/user interface as deprecated in Linux v6.13 with
pr_warn() and the usual documention update.
This adds a five second sleep after the pr_warn(), following a
previous deprecation process pattern that has worked well for us in
the past in helping identify any existing users that we haven't yet
reached.
- Add a __GFP_NOWARN flag to our initial hash table allocation.
Fuzzers such a syzbot often attempt abnormally large SELinux policy
loads, which the SELinux code gracefully handles by checking for
allocation failures, but not before the allocator emits a warning
which causes the automated fuzzing to flag this as an error and
report it to the list. While we want to continue to support the work
done by the fuzzing teams, we want to focus on proper issues and not
an error case that is already handled safely. Add a NOWARN flag to
quiet the allocator and prevent syzbot from tripping on this again.
- Remove some unnecessary selinuxfs cleanup code, courtesy of Al.
- Update the SELinux in-kernel documentation with pointers to
additional information.
* tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: don't bother with selinuxfs_info_free() on failures
selinux: add __GFP_NOWARN to hashtab_init() allocations
selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
selinux: introduce neveraudit types
documentation: add links to SELinux resources
selinux: add a 5 second sleep to /sys/fs/selinux/user
Pull tpm updates from Jarkko Sakkinen:
"Quite a few commits but nothing really that would be worth of spending
too much time for, or would want to emphasize in particular"
* tag 'tpmdd-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_crb_ffa: handle tpm busy return code
tpm_crb_ffa: Remove memset usage
tpm_crb_ffa: Fix typos in function name
tpm: Check for completion after timeout
tpm: Use of_reserved_mem_region_to_resource() for "memory-region"
tpm: Replace scnprintf() with sysfs_emit() and sysfs_emit_at() in sysfs show functions
tpm_crb_ffa: Remove unused export
tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in
firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall
tpm/tpm_svsm: support TPM_CHIP_FLAG_SYNC
tpm/tpm_ftpm_tee: support TPM_CHIP_FLAG_SYNC
tpm: support devices with synchronous send()
tpm: add bufsiz parameter in the .send callback
Pull hardening updates from Kees Cook:
- Introduce and start using TRAILING_OVERLAP() helper for fixing
embedded flex array instances (Gustavo A. R. Silva)
- mux: Convert mux_control_ops to a flex array member in mux_chip
(Thorsten Blum)
- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
Kees Cook)
- Refactor and rename stackleak feature to support Clang
- Add KUnit test for seq_buf API
- Fix KUnit fortify test under LTO
* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
sched/task_stack: Add missing const qualifier to end_of_stack()
kstack_erase: Support Clang stack depth tracking
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
init.h: Disable sanitizer coverage for __init and __head
kstack_erase: Disable kstack_erase for all of arm compressed boot code
x86: Handle KCOV __init vs inline mismatches
arm64: Handle KCOV __init vs inline mismatches
s390: Handle KCOV __init vs inline mismatches
arm: Handle KCOV __init vs inline mismatches
mips: Handle KCOV __init vs inline mismatch
powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
configs/hardening: Enable CONFIG_KSTACK_ERASE
stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
stackleak: Rename STACKLEAK to KSTACK_ERASE
seq_buf: Introduce KUnit tests
string: Group str_has_prefix() and strstarts()
kunit/fortify: Add back "volatile" for sizeof() constants
acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
...
Remove the title string ("====") from under arm64 & riscv and move them
to a commment under the perf_user_access sysctl. They are explanations,
*not* sysctls themselves
This effectively removes these two strings from appearing as not
implemented when the check-sysctl-docs script is run
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Removing them solves an issue where they were incorrectly considered as
not implemented by the check-sysctl-docs script
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Platforms supporting direct message request v2 [1] can support secure
partitions that support multiple services. For CRB over FF-A interface,
if the firmware TPM or TPM service [1] shares its Secure Partition (SP)
with another service, message requests may fail with a -EBUSY error.
To handle this, replace the single check and call with a retry loop
that attempts the TPM message send operation until it succeeds or a
configurable timeout is reached. Implement a _try_send_receive function
to do a single send/receive and modify the existing send_receive to
add this retry loop.
The retry mechanism introduces a module parameter (`busy_timeout_ms`,
default: 2000ms) to control how long to keep retrying on -EBUSY
responses. Between retries, the code waits briefly (50-100 microseconds)
to avoid busy-waiting and handling TPM BUSY conditions more gracefully.
The parameter can be modified at run-time as such:
echo 3000 | tee /sys/module/tpm_crb_ffa/parameters/busy_timeout_ms
This changes the timeout from the default 2000ms to 3000ms.
[1] TPM Service Command Response Buffer Interface Over FF-A
https://developer.arm.com/documentation/den0138/latest/
Signed-off-by: Prachotan Bathi <prachotan.bathi@arm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Merge updates related to system sleep for 6.17-rc1:
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the device
suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
* pm-sleep:
PM: hibernate: Fix up white space that does not follow coding style
PM: sleep: Rearrange suspend/resume error handling in the core
kexec_core: Drop redundant pm_restore_gfp_mask() call
kexec_core: Fix error code path in the KEXEC_JUMP flow
PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
PM: sleep: add kernel parameter to disable asynchronous suspend/resume
PCI/PM: Set power.strict_midlayer in pci_pm_init()
ACPI: PM: Set/clear power.strict_midlayer in prepare/complete
PM: sleep: Add strict_midlayer flag to struct dev_pm_info
PM: runtime: Introduce __rpm_get_driver_callback()
PM: Check power.needs_force_resume in pm_runtime_force_suspend()
PM: runtime: Clear power.needs_force_resume in pm_runtime_reinit()
PM: Make pm_runtime_force_resume() work with DPM_FLAG_SMART_SUSPEND
PM: Move two sleep-related functions under CONFIG_PM_SLEEP
PM: Use true/false as power.needs_force_resume values
PM: sleep: Make async suspend handle suppliers like parents
PM: sleep: Make async resume handle consumers like children
PM: sleep: Drop superfluous might_sleep() calls
PM: sleep: console: Fix the black screen issue
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:
- Add the new top-level CONFIG_KSTACK_ERASE option which will be
implemented either with the stackleak GCC plugin, or with the Clang
stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
for what it does rather than what it protects against), but leave as
many of the internals alone as possible to avoid even more churn.
While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Patch series "generalize panic_print's dump function to be used by other
kernel parts", v3.
When working on kernel stability issues, panic, task-hung and
software/hardware lockup are frequently met. And to debug them, user may
need lots of system information at that time, like task call stacks, lock
info, memory info etc.
panic case already has panic_print_sys_info() for this purpose, and has a
'panic_print' bitmask to control what kinds of information is needed,
which is also helpful to debug other task-hung and lockup cases.
So this patchset extracts the function out to a new file 'lib/sys_info.c',
and makes it available for other cases which also need to dump system info
for debugging.
Also as suggested by Petr Mladek, add 'panic_sys_info=' interface to take
human readable string like "tasks,mem,locks,timers,ftrace,....", and
eventually obsolete the current 'panic_print' bitmap interface.
In RFC and V1 version, hung_task and SW/HW watchdog modules are enabled
with the new sys_info dump interface. In v2, they are kept out for better
review of current change, and will be posted later.
Locally these have been used in our bug chasing for stability issues and
was proven helpful.
Many thanks to Petr Mladek for great suggestions on both the code and
architectures!
This patch (of 5):
Currently the panic_print_sys_info() was called twice with different
parameters to handle console replay case, which is kind of confusing.
Add panic_console_replay() explicitly and rename
'PANIC_PRINT_ALL_PRINTK_MSG' to 'PANIC_CONSOLE_REPLAY', to make the code
straightforward. The related kernel document is also updated.
Link: https://lkml.kernel.org/r/20250703021004.42328-1-feng.tang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250703021004.42328-2-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
/proc/cgroups lists only v1 controllers by default, however, this is
only enforced since the commit af000ce852 ("cgroup: Do not report
unavailable v1 controllers in /proc/cgroups") and there is software in
the wild that uses content of /proc/cgroups to decide on availability of
v2 (sic) controllers.
Add a boottime param that can bring back the previous behavior for
setups where the check in the software cannot be changed and it causes
e.g. unintended OOMs.
Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
guard since it's only important to check which hierarchy (v1 vs v2) the
subsys is attached to. This has no effect on the printed message but
the code is cleaner since cgrp_v1_visible is really about mounted
hierarchies, not the content of /proc/cgroups.
Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk
Fixes: af000ce852 ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups")
Fixes: a0ab145322 ("cgroup: Print message when /proc/cgroups is read on v2-only system")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When injecting IPIs to a set of harts, the IMSIC IPI support will do a
separate MMIO write to the SETIPNUM_LE register of each target hart. This
means on a platform where IMSIC is trap-n-emulated, there will be N MMIO
traps when injecting IPI to N target harts hence IMSIC IPIs will be slow on
such platforms compared to the SBI IPI extension.
Unfortunately, there is no DT, ACPI, or any other way of discovering
whether the underlying IMSIC is trap-n-emulated. Using MMIO write to the
SETIPNUM_LE register for injecting IPI is purely a software choice in the
IMSIC driver hence add a kernel parameter to allow users to disable IMSIC
IPIs on platforms with trap-n-emulated IMSIC.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250716123745.557585-1-apatel@ventanamicro.com
In the AMD P-States Performance Scale diagram, the labels for "Max Perf"
and "Lowest Perf" were incorrectly used to define the range for
"Desired Perf".The "Desired performance target" should be bounded by the
"Maximum requested performance" and the "Minimum requested performance",
which corresponds to "Max Perf" and "Min Perf", respectively.
Signed-off-by: Shouye Liu <shouyeliu@tencent.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250522070140.17557-1-shouyeliu@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Pull misc fixes from Andrew Morton:
"19 hotfixes. A whopping 16 are cc:stable and the remainder address
post-6.15 issues or aren't considered necessary for -stable kernels.
14 are for MM. Three gdb-script fixes and a kallsyms build fix"
* tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Revert "sched/numa: add statistics of numa balance task"
mm: fix the inaccurate memory statistics issue for users
mm/damon: fix divide by zero in damon_get_intervals_score()
samples/damon: fix damon sample mtier for start failure
samples/damon: fix damon sample wsse for start failure
samples/damon: fix damon sample prcl for start failure
kasan: remove kasan_find_vm_area() to prevent possible deadlock
scripts: gdb: vfs: support external dentry names
mm/migrate: fix do_pages_stat in compat mode
mm/damon/core: handle damon_call_control as normal under kdmond deactivation
mm/rmap: fix potential out-of-bounds page table access during batched unmap
mm/hugetlb: don't crash when allocating a folio if there are no resv
scripts/gdb: de-reference per-CPU MCE interrupts
scripts/gdb: fix interrupts.py after maple tree conversion
maple_tree: fix mt_destroy_walk() on root leaf node
mm/vmalloc: leave lazy MMU mode on PTE mapping error
scripts/gdb: fix interrupts display after MCP on x86
lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
kallsyms: fix build without execinfo
Document the 5 new attack vector command line options, how they
interact with existing vulnerability controls, and recommendations on when
they can be disabled.
Note that while mitigating against untrusted userspace requires both
user-to-kernel and user-to-user protection, these are kept separate. The
kernel can control what code executes inside of it and that may affect the
risk associated with vulnerabilities especially if new kernel mitigations
are implemented. The same isn't typically true of userspace.
In other words, the risk associated with user-to-user or guest-to-guest
attacks is unlikely to change over time. While the risk associated with
user-to-kernel or guest-to-host attacks may change. Therefore, these
controls are separated.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250709155731.3279419-1-david.kaplan@amd.com
On some platforms, device dependencies are not properly represented by
device links, which can cause issues when asynchronous power management
is enabled. While it is possible to disable this via sysfs, doing so
at runtime can race with the first system suspend event.
This patch introduces a kernel command-line parameter, "pm_async", which
can be set to "off" to globally disable asynchronous suspend and resume
operations from early boot. It effectively provides a way to set the
initial value of the existing pm_async sysfs knob at boot time. This
offers a robust method to fall back to synchronous (sequential)
operation, which can stabilize platforms with problematic dependencies
and also serve as a useful debugging tool.
The default behavior remains unchanged (asynchronous enabled). To
disable it, boot the kernel with the "pm_async=off" parameter.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250709-pm-async-off-v3-1-cb69a6fc8d04@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new line-level, boolean property to the gpio-sim configfs
interface called 'valid'. It's set by default and the user can unset it
to make the line be included in the standard `gpio-reserved-ranges`
property when the chip is registered with GPIO core. This allows users
to specify which lines should not be available for requesting as GPIOs.
Link: https://lore.kernel.org/r/20250630130358.40352-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
In commit b5325b2a27 ("coredump: hand a pidfd to the usermode coredump
helper") a new core_pattern specifier, %F, was added to provide a pidfs
to the usermode helper process referring to the crashed process.
Update the documentation to include the new core_pattern specifier.
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250612060204.1159734-1-carnil@debian.org
In the AMD P-States Performance Scale diagram, the labels for "Max Perf"
and "Lowest Perf" were incorrectly used to define the range for
"Desired Perf".The "Desired performance target" should be bounded by the
"Maximum requested performance" and the "Minimum requested performance",
which corresponds to "Max Perf" and "Min Perf", respectively.
Signed-off-by: Shouye Liu <shouyeliu@tencent.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250620021658.92161-1-shouyeliu@gmail.com
Add links to the SELinux kernel subsystem README.md file, the
SELinux kernel wiki, and the SELinux userspace wiki to the
SELinux guide.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: spacing and style corrections, subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Kdump kernel doesn't need IMA functionality, and enabling IMA will cost
extra memory. It would be very helpful to allow IMA to be disabled for
kdump kernel.
Hence add a knob ima=on|off here to allow turning IMA off in kdump
kernel if needed.
Note that this IMA disabling is limited to kdump kernel, please don't
abuse it in other kernel and thus serious consequences are caused.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
There are two possible scenarios for syscall filtering:
- having a trusted/allowed range of PCs, and intercepting everything else
- or the opposite: a single untrusted/intercepted range and allowing
everything else (this is relevant for any kind of sandboxing scenario,
or monitoring behavior of a single library)
The current API only allows the former use case due to allowed
range wrap-around check. Add PR_SYS_DISPATCH_INCLUSIVE_ON that
enables the second use case.
Add PR_SYS_DISPATCH_EXCLUSIVE_ON alias for PR_SYS_DISPATCH_ON
to make it clear how it's different from the new
PR_SYS_DISPATCH_INCLUSIVE_ON.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/97947cc8e205ff49675826d7b0327ef2e2c66eea.1747839857.git.dvyukov@google.com