Commit Graph

47072 Commits

Author SHA1 Message Date
Boqun Feng
467c890f2d Merge branches 'docs.2025.02.04a', 'lazypreempt.2025.03.04a', 'misc.2025.03.04a', 'srcu.2025.02.05a' and 'torture.2025.02.05a' 2025-03-04 18:47:32 -08:00
Ankur Arora
8437bb84bc rcu: limit PREEMPT_RCU configurations
PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
which allows for dynamic switching of preemption models.

The choice of PREEMPT_RCU or not, however, is fixed at compile time.

Given that PREEMPT_RCU makes some trade-offs to optimize for latency
as opposed to throughput, configurations with limited preemption
might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.

Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
model PREEMPT_DYNAMIC.

This means the throughput oriented models, PREEMPT_NONE,
PREEMPT_VOLUNTARY, and PREEMPT_LAZY will run with PREEMPT_RCU=n.

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:47 -08:00
Boqun Feng
a56ca5619f rcutorture: Update ->extendables check for lazy preemption
The rcutorture_one_extend_check() function's second last check assumes
that "preempt_count() & PREEMPT_MASK" is non-zero only if
RCUTORTURE_RDR_PREEMPT or RCUTORTURE_RDR_SCHED bit is set.

This works for preemptible RCU and for non-preemptible RCU running in
a non-preemptible kernel.  But it fails for non-preemptible RCU running
in a preemptible kernel because then rcu_read_lock() is just
preempt_disable(), which increases preempt count.

This commit therefore adjusts this check to take into account the case
fo non-preemptible RCU running in a preemptible kernel.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:47 -08:00
Paul E. McKenney
0be4b19edd rcutorture: Update rcutorture_one_extend_check() for lazy preemption
The rcutorture_one_extend_check() function's last check assumes that
if cur_ops->readlock_nesting() returns greater than zero, either the
RCUTORTURE_RDR_RCU_1 or the RCUTORTURE_RDR_RCU_2 bit must be set, that
is, there must be at least one rcu_read_lock() in effect.

This works for preemptible RCU and for non-preemptible RCU running in
a non-preemptible kernel.  But it fails for non-preemptible RCU running
in a preemptible kernel because then RCU's cur_ops->readlock_nesting()
function, which is rcu_torture_readlock_nesting(), will return
the PREEMPT_MASK mask bits from preempt_count().  The result will
be greater than zero if preemption is disabled, including by the
RCUTORTURE_RDR_PREEMPT and RCUTORTURE_RDR_SCHED bits.

This commit therefore adjusts this check to take into account the case
fo non-preemptible RCU running in a preemptible kernel.

[boqun: Fix the if condition and add comment]

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171415.8ec87c87-lkp@intel.com
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:46 -08:00
Ankur Arora
9fd858cc5a osnoise: provide quiescent states
To reduce RCU noise for nohz_full configurations, osnoise depends
on cond_resched() providing quiescent states for PREEMPT_RCU=n
configurations. For PREEMPT_RCU=y configurations -- where
cond_resched() is a stub -- we do this by directly calling
rcu_momentary_eqs().

With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), however, we have a
configuration with (PREEMPTION=y, PREEMPT_RCU=n) where neither
of the above can help.

Handle that by providing an explicit quiescent state here for all
configurations.

As mentioned above this is not needed for non-stubbed cond_resched(),
but, providing a quiescent state here just pulls in one that a future
cond_resched() would provide, so doesn't cause any extra work for
this configuration.

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:09 -08:00
Uladzislau Rezki (Sony)
5a562b8b3f rcu: Use _full() API to debug synchronize_rcu()
Switch for using of get_state_synchronize_rcu_full() and
poll_state_synchronize_rcu_full() pair to debug a normal
synchronize_rcu() call.

Just using "not" full APIs to identify if a grace period is
passed or not might lead to a false-positive kernel splat.

It can happen, because get_state_synchronize_rcu() compresses
both normal and expedited states into one single unsigned long
value, so a poll_state_synchronize_rcu() can miss GP-completion
when synchronize_rcu()/synchronize_rcu_expedited() concurrently
run.

To address this, switch to poll_state_synchronize_rcu_full() and
get_state_synchronize_rcu_full() APIs, which use separate variables
for expedited and normal states.

Reported-by: cheung wall <zzqq0103.hey@gmail.com>
Closes: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
Fixes: 988f569ae0 ("rcu: Reduce synchronize_rcu() latency")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20250227131613.52683-3-urezki@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:29 -08:00
Uladzislau Rezki (Sony)
8d67c1558a rcutorture: Allow a negative value for nfakewriters
Currently "nfakewriters" parameter can be set to any value but
there is no possibility to adjust it automatically based on how
many CPUs a system has where a test is run on.

To address this, if the "nfakewriters" is set to negative it will
be adjusted to num_online_cpus() during torture initialization.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lore.kernel.org/r/20250227131613.52683-1-urezki@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:29 -08:00
Paul E. McKenney
6ea9a1781c Flush console log from kernel_power_off()
Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output
and shutdown time, which hides shutdown-time RCU issues from rcutorture.
Therefore, make pr_flush() public and invoke it after then last print
in kernel_power_off().

[ paulmck: Apply John Ogness feedback. ]
[ paulmck: Appy Sebastian Andrzej Siewior feedback. ]
[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://lore.kernel.org/r/5f743488-dc2a-4f19-bdda-cf50b9314832@paulmck-laptop
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:29 -08:00
Paul E. McKenney
59bed79ffd context_tracking: Make RCU watch ct_kernel_exit_state() warning
The WARN_ON_ONCE() in ct_kernel_exit_state() follows the call to
ct_state_inc(), which means that RCU is not watching this WARN_ON_ONCE().
This can (and does) result in extraneous lockdep warnings when this
WARN_ON_ONCE() triggers.  These extraneous warnings are the opposite
of helpful.

Therefore, invert the WARN_ON_ONCE() condition and move it before the
call to ct_state_inc().  This does mean that the ct_state_inc() return
value can no longer be used in the WARN_ON_ONCE() condition, so discard
this return value and instead use a call to rcu_is_watching_curr_cpu().
This call is executed only in CONFIG_RCU_EQS_DEBUG=y kernels, so there
is no added overhead in production use.

[Boqun: Add the subsystem tag in the title]

Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/bd911cd9-1fe9-447c-85e0-ea811a1dc896@paulmck-laptop
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:29 -08:00
Paul E. McKenney
69381f3828 rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
Analysis of an rcutorture callback-based forward-progress test failure was
hampered by the lack of ->cblist segment lengths.  This commit therefore
adds this information, so that what would have been ".W85620.N." (there
are some callbacks waiting for grace period sequence number 85620 and
some number more that have not yet been assigned to a grace period)
now prints as ".W2(85620).N6." (there are 2 callbacks waiting for grace
period 85620 and 6 not yet assigned to a grace period).  Note that
"D" (done), "N" (next and not yet assigned to a grace period, and "B"
(bypass, also not yet assigned to a grace period) have just the number
of callbacks without the parenthesized grace-period sequence number.

In contrast, "W" (waiting for the current grace period) and "R" (ready
to wait for the next grace period to start) both have parenthesized
grace-period sequence numbers.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:28 -08:00
Paul E. McKenney
23c22d9156 rcu-tasks: Move RCU Tasks self-tests to core_initcall()
The timer and hrtimer softirq processing has moved to dedicated threads
for kernels built with CONFIG_IRQ_FORCED_THREADING=y.  This results in
timers not expiring until later in early boot, which in turn causes the
RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y,
which further causes the entire kernel to hang.  One fix would be to
make timers work during this time, but there are no known users of RCU
Tasks grace periods during that time, so no justification for the added
complexity.  Not yet, anyway.

This commit therefore moves the call to rcu_init_tasks_generic() from
kernel_init_freeable() to a core_initcall().  This works because the
timer and hrtimer kthreads are created at early_initcall() time.

Fixes: 49a1763950 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: <linux-trace-kernel@vger.kernel.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:28 -08:00
Paul E. McKenney
85aad7cc41 rcu: Fix get_state_synchronize_rcu_full() GP-start detection
The get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full()
functions use the root rcu_node structure's ->gp_seq field to detect
the beginnings and ends of grace periods, respectively.  This choice is
necessary for the poll_state_synchronize_rcu_full() function because
(give or take counter wrap), the following sequence is guaranteed not
to trigger:

	get_state_synchronize_rcu_full(&rgos);
	synchronize_rcu();
	WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));

The RCU callbacks that awaken synchronize_rcu() instances are
guaranteed not to be invoked before the root rcu_node structure's
->gp_seq field is updated to indicate the end of the grace period.
However, these callbacks might start being invoked immediately
thereafter, in particular, before rcu_state.gp_seq has been updated.
Therefore, poll_state_synchronize_rcu_full() must refer to the
root rcu_node structure's ->gp_seq field.  Because this field is
updated under this structure's ->lock, any code following a call to
poll_state_synchronize_rcu_full() will be fully ordered after the
full grace-period computation, as is required by RCU's memory-ordering
semantics.

By symmetry, the get_state_synchronize_rcu_full() function should also
use this same root rcu_node structure's ->gp_seq field.  But it turns out
that symmetry is profoundly (though extremely infrequently) destructive
in this case.  To see this, consider the following sequence of events:

1.	CPU 0 starts a new grace period, and updates rcu_state.gp_seq
	accordingly.

2.	As its first step of grace-period initialization, CPU 0 examines
	the current CPU hotplug state and decides that it need not wait
	for CPU 1, which is currently offline.

3.	CPU 1 comes online, and updates its state.  But this does not
	affect the current grace period, but rather the one after that.
	After all, CPU 1 was offline when the current grace period
	started, so all pre-existing RCU readers on CPU 1 must have
	completed or been preempted before it last went offline.
	The current grace period therefore has nothing it needs to wait
	for on CPU 1.

4.	CPU 1 switches to an rcutorture kthread which is running
	rcutorture's rcu_torture_reader() function, which starts a new
	RCU reader.

5.	CPU 2 is running rcutorture's rcu_torture_writer() function
	and collects a new polled grace-period "cookie" using
	get_state_synchronize_rcu_full().  Because the newly started
	grace period has not completed initialization, the root rcu_node
	structure's ->gp_seq field has not yet been updated to indicate
	that this new grace period has already started.

	This cookie is therefore set up for the end of the current grace
	period (rather than the end of the following grace period).

6.	CPU 0 finishes grace-period initialization.

7.	If CPU 1’s rcutorture reader is preempted, it will be added to
	the ->blkd_tasks list, but because CPU 1’s ->qsmask bit is not
	set in CPU 1's leaf rcu_node structure, the ->gp_tasks pointer
	will not be updated.  Thus, this grace period will not wait on
	it.  Which is only fair, given that the CPU did not come online
	until after the grace period officially started.

8.	CPUs 0 and 2 then detect the new grace period and then report
	a quiescent state to the RCU core.

9.	Because CPU 1 was offline at the start of the current grace
	period, CPUs 0 and 2 are the only CPUs that this grace period
	needs to wait on.  So the grace period ends and post-grace-period
	cleanup starts.  In particular, the root rcu_node structure's
	->gp_seq field is updated to indicate that this grace period
	has now ended.

10.	CPU 2 continues running rcu_torture_writer() and sees that,
	from the viewpoint of the root rcu_node structure consulted by
	the poll_state_synchronize_rcu_full() function, the grace period
	has ended.  It therefore updates state accordingly.

11.	CPU 1 is still running the same RCU reader, which notices this
	update and thus complains about the too-short grace period.

The fix is for the get_state_synchronize_rcu_full() function to use
rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field.
With this change in place, if step 5's cookie indicates that the grace
period has not yet started, then any prior code executed by CPU 2 must
have happened before CPU 1 came online.  This will in turn prevent CPU
1's code in steps 3 and 11 from spanning CPU 2's grace-period wait,
thus preventing CPU 1 from being subjected to a too-short grace period.

This commit therefore makes this change.  Note that there is no change to
the poll_state_synchronize_rcu_full() function, which as noted above,
must continue to use the root rcu_node structure's ->gp_seq field.
This is of course an asymmetry between these two functions, but is an
asymmetry that is absolutely required for correct operation.  It is a
common human tendency to greatly value symmetry, and sometimes symmetry
is a wonderful thing.  Other times, symmetry results in poor performance.
But in this case, symmetry is just plain wrong.

Nevertheless, the asymmetry does require an additional adjustment.
It is possible for get_state_synchronize_rcu_full() to see a given
grace period as having started, but for an immediately following
poll_state_synchronize_rcu_full() to see it as having not yet started.
Given the current rcu_seq_done_exact() implementation, this will
result in a false-positive indication that the grace period is done
from poll_state_synchronize_rcu_full().  This is dealt with by making
rcu_seq_done_exact() reach back three grace periods rather than just
two of them.

However, simply changing get_state_synchronize_rcu_full() function to
use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq
field results in a theoretical bug in kernels booted with
rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of
events:

o	The rcu_gp_init() function invokes rcu_seq_start() to officially
	start a new grace period.

o	A new RCU reader begins, referencing X from some RCU-protected
	list.  The new grace period is not obligated to wait for this
	reader.

o	An updater removes X, then calls synchronize_rcu(), which queues
	a wait element.

o	The grace period ends, awakening the updater, which frees X
	while the reader is still referencing it.

The reason that this is theoretical is that although the grace period
has officially started, none of the CPUs are officially aware of this,
and thus will have to assume that the RCU reader pre-dated the start of
the grace period. Detailed explanation can be found at [2] and [3].

Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled
grace-period APIs, which can and do complain bitterly when this sequence
of events occurs.  Not only that, there might be some future RCU
grace-period mechanism that pulls this sequence of events from theory
into practice.  This commit therefore also pulls the call to
rcu_sr_normal_gp_init() to precede that to rcu_seq_start().

Although this fixes commit 91a967fd69 ("rcu: Add full-sized polling
for get_completed*() and poll_state*()"), it is not clear that it is
worth backporting this commit.  First, it took me many weeks to convince
rcutorture to reproduce this more frequently than once per year.
Second, this cannot be reproduced at all without frequent CPU-hotplug
operations, as in waiting all of 50 milliseconds from the end of the
previous operation until starting the next one.  Third, the TREE03.boot
settings cause multi-millisecond delays during RCU grace-period
initialization, which greatly increase the probability of the above
sequence of events.  (Don't do this in production workloads!) Fourth,
the TREE03 rcutorture scenario was modified to use four-CPU guest OSes,
to have a single-rcu_node combining tree, no testing of RCU priority
boosting, and no random preemption, and these modifications were
necessary to reproduce this issue in a reasonable timeframe. Fifth,
extremely heavy use of get_state_synchronize_rcu_full() and/or
poll_state_synchronize_rcu_full() is required to reproduce this, and as
of v6.12, only kfree_rcu() uses it, and even then not particularly
heavily.

[boqun: Apply the fix [1], and add the comment before the moved
rcu_sr_normal_gp_init(). Additional links are added for explanation.]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lore.kernel.org/rcu/d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop/ [1]
Link: https://lore.kernel.org/rcu/20250303001507.GA3994772@joelnvbox/ [2]
Link: https://lore.kernel.org/rcu/Z8bcUsZ9IpRi1QoP@pc636/ [3]
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:43:34 -08:00
Paul E. McKenney
536e8b9b80 srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
The srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe() functions
map to __srcu_read_lock() and __srcu_read_unlock() on systems like x86
that have NMI-safe this_cpu_inc() operations.  This makes the underlying
__srcu_read_lock_nmisafe() and __srcu_read_unlock_nmisafe() functions
difficult to test on (for example) x86 systems, allowing bugs to creep in.

This commit therefore creates a FORCE_NEED_SRCU_NMI_SAFE Kconfig that
forces those underlying functions to be used even on systems where they
are not needed, thus providing better testing coverage.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:40 -08:00
Paul E. McKenney
38b43eca66 rcutorture: Complain when invalid SRCU reader_flavor is specified
Currently, rcutorture ignores reader_flavor bits that are not in the
SRCU_READ_FLAVOR_ALL bitmask, which could confuse rcutorture users into
believing buggy patches had been fully tested.  This commit therefore
produces a splat in this case.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:40 -08:00
Paul E. McKenney
5d45bdf292 rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
The RCU_TORTURE_TEST_CHK_RDR_STATE and RCU_TORTURE_TEST_LOG_CPU Kconfig
options are pointlessly defined as tristate.  This commit therefore
converts them to bool.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412241458.150d082b-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:40 -08:00
Paul E. McKenney
7acc2d9015 rcutorture: Make cur_ops->format_gp_seqs take buffer length
The Tree and Tiny implementations of rcutorture_format_gp_seqs() use
hard-coded constants for the length of the buffer that they format into.
This is of course an accident waiting to happen, so this commit therefore
makes them take a length argument.  The rcutorture calling code uses
ARRAY_SIZE() to safely compute this new argument.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Paul E. McKenney
65e6ff0f31 rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
This commit adds an ftrace-compatible microsecond-scale timestamp
to the failure/close-call output, but only in kernels built with
CONFIG_RCU_TORTURE_TEST_LOG_GP=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Paul E. McKenney
2db7ab8c10 rcutorture: Expand failure/close-call grace-period output
With only eight bits per grace-period sequence number, wrap can happen
in 64 grace periods.  This commit therefore increases this to sixteen
bits for normal grace-period sequence numbers and the combined short-form
polling sequence numbers, thus deferring wrap for at least 16,384 grace
periods.  Because expedited grace periods go faster, expand these to 24
bits, deferring wrap for at least 4,194,304 expedited grace periods.
These longer wrap times makes it easier to correlate these numbers to
trace-event output.

Note that the low-order two bits are reserved for intra-grace-period
state, hence the above wrap numbers being a factor of four smaller than
you might expect.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Paul E. McKenney
84ae91018a rcutorture: Include grace-period sequence numbers in failure/close-call
This commit includes the grace-period sequence numbers at the beginning
and end of each segment in the "Failure/close-call rcutorture reader
segments" list.  These are in hexadecimal, and only the bottom byte.
Currently, only RCU is supported, with its three sequence numbers (normal,
expedited, and polled).

Note that if all the grace-period sequence numbers remain the same across
a given reader segment, only one copy of the number will be printed.
Of course, if there is a change, both sets of values will be printed.

Because the overhead of collecting this information can suppress
heisenbugs, this information is collected and printed only in kernels
built with CONFIG_RCU_TORTURE_TEST_LOG_GP=y.

[ paulmck: Apply Nathan Chancellor feedback for IS_ENABLED(). ]
[ paulmck: Apply feedback from kernel test robot. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Paul E. McKenney
b8726c5aa6 rcutorture: Add a test_boost_holdoff module parameter
This commit adds a test_boost_holdoff module parameter that tells the RCU
priority-boosting tests to wait for the specified number of seconds past
the start of the rcutorture test.  This can be useful when rcutorture
is built into the kernel (as opposed to being modprobed), especially on
large systems where early start of RCU priority boosting can delay the
boot sequence, which adds a full CPU's worth of load onto the system.
This can in turn result in pointless stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Paul E. McKenney
623b52802b torture: Add get_torture_init_jiffies() for test-start time
This commit adds a get_torture_init_jiffies() function that returns the
value of the jiffies counter at the start of the test, that is, at the
point where torture_init_begin() was invoked.

This will be used to enable torture-test holdoffs for tests implemented
using per-CPU kthreads, which are created and deleted by CPU-hotplug
operations, and thus (unlike normal kthreads) don't automatically know
when the test started.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:24 -08:00
Paul E. McKenney
4c3fca0f59 refscale: Add srcu_read_lock_fast() support using "srcu-fast"
This commit creates a new srcu-fast option for the refscale.scale_type
module parameter that selects srcu_read_lock_fast() and
srcu_read_unlock_fast().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
176d19eecb rcutorture: Add ability to test srcu_read_{,un}lock_fast()
This commit permits rcutorture to test srcu_read_{,un}lock_fast(), which
is specified by the rcutorture.reader_flavor=0x8 kernel boot parameter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
4937096b57 srcu: Pull integer-to-pointer conversion into __srcu_ctr_to_ptr()
This commit abstracts the srcu_read_unlock*() integer-to-pointer
conversion into a new __srcu_ctr_to_ptr().  This will be used
in rcutorture for testing an srcu_read_unlock_fast() that avoids
array-indexing overhead by taking a pointer rather than an integer.

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
f4bde41dd1 srcu: Pull pointer-to-integer conversion into __srcu_ptr_to_ctr()
This commit abstracts the srcu_read_lock*() pointer-to-integer conversion
into a new __srcu_ptr_to_ctr().  This will be used in rcutorture for
testing an srcu_read_lock_fast() that returns a pointer rather than
an integer.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
4d86b1e7e1 srcu: Add SRCU_READ_FLAVOR_SLOWGP to flag need for synchronize_rcu()
This commit switches from a direct test of SRCU_READ_FLAVOR_LITE to a new
SRCU_READ_FLAVOR_SLOWGP macro to check for substituting synchronize_rcu()
for smp_mb() in SRCU grace periods.  Right now, SRCU_READ_FLAVOR_SLOWGP
is exactly SRCU_READ_FLAVOR_LITE, but the addition of the _fast() flavor
of SRCU will change that.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
d31e31365b srcu: Force synchronization for srcu_get_delay()
Currently, srcu_get_delay() can be called concurrently, for example,
by a CPU that is the first to request a new grace period and the CPU
processing the current grace period.  Although concurrent access is
harmless, it unnecessarily expands the state space.  Additionally,
all calls to srcu_get_delay() are from slow paths.

This commit therefore protects all calls to srcu_get_delay() with
ssp->srcu_sup->lock, which is already held on the invocation from the
srcu_funnel_gp_start() function.  While in the area, this commit also
adds a lockdep_assert_held() to srcu_get_delay() itself.

Reported-by: syzbot+16a19b06125a2963eaee@syzkaller.appspotmail.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
821ca6fa15 srcu: Make Tree SRCU updates independent of ->srcu_idx
This commit makes Tree SRCU updates independent of ->srcu_idx, then
drop ->srcu_idx.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
795e7efec6 srcu: Make SRCU readers use ->srcu_ctrs for counter selection
This commit causes SRCU readers to use ->srcu_ctrs for counter
selection instead of ->srcu_idx.  This takes another step towards
array-indexing-free SRCU readers.

[ paulmck: Apply kernel test robot feedback. ]

Co-developed-by: Z qiang <qiang.zhang1211@gmail.com>
Signed-off-by: Z qiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
56eb8be144 srcu: Pull ->srcu_{un,}lock_count into a new srcu_ctr structure
This commit prepares for array-index-free srcu_read_lock*() by moving the
->srcu_{un,}lock_count fields into a new srcu_ctr structure.  This will
permit ->srcu_index to be replaced by a per-CPU pointer to this structure.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
5f9e1bc50a srcu: Use ->srcu_gp_seq for rcutorture reader batch
This commit stops using ->srcu_idx for rcutorture's reader-batch
consistency checking, using ->srcu_gp_seq instead.  This is a first
step towards a faster srcu_read_{,un}lock_lite() that avoids the array
accesses that use ->srcu_idx.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:12:05 -08:00
Paul E. McKenney
da2ac56237 srcu: Make Tiny SRCU able to operate in preemptible kernels
Given that SRCU allows its read-side critical sections are not just
preemptible, but also allow general blocking, there is not much
reason to restrict Tiny SRCU to non-preemptible kernels.  This commit
therefore removes Tiny SRCU dependencies on non-preemptibility, primarily
surrounding its interaction with rcutorture and early boot.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:11:58 -08:00
Ankur Arora
83b28cfe79 rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
states for read-side critical sections via rcu_all_qs().
One reason why this was needed: lacking preempt-count, the tick
handler has no way of knowing whether it is executing in a
read-side critical section or not.

With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), we get (PREEMPT_COUNT=y,
PREEMPT_RCU=n). In this configuration cond_resched() is a stub and
does not provide quiescent states via rcu_all_qs().
(PREEMPT_RCU=y provides this information via rcu_read_unlock() and
its nesting counter.)

So, use the availability of preempt_count() to report quiescent states
in rcu_flavor_sched_clock_irq().

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:01:55 -08:00
Ankur Arora
fcf0e25ad4 rcu: handle unstable rdp in rcu_read_unlock_strict()
rcu_read_unlock_strict() can be called with preemption enabled
which can make for an unstable rdp and a racy norm value.

Fix this by dropping the preempt-count in __rcu_read_unlock()
after the call to rcu_read_unlock_strict(), adjusting the
preempt-count check appropriately.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:01:55 -08:00
Ankur Arora
2c00e1199c sched: update __cond_resched comment about RCU quiescent states
Update comment in __cond_resched() clarifying how urgently needed
quiescent state are provided.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:01:55 -08:00
Ankur Arora
4dca1af414 rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
Replace mentions of PREEMPT_AUTO with PREEMPT_LAZY.

Also, since PREMPT_LAZY implies PREEMPTION, we can reduce the
TASKS_RCU selection criteria from this:

  NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)

to this:

  NEED_TASKS_RCU && PREEMPTION

CC: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:01:55 -08:00
Zilin Guan
764f6a8110 rcu: Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
There is one access to the per-CPU rdp->gpwrap field in the
__note_gp_changes() function that does not use READ_ONCE(), but all other
accesses do use READ_ONCE().  When using the 8*TREE03 and CONFIG_NR_CPUS=8
configuration, KCSAN found no data races at that point.  This is because
all calls to __note_gp_changes() hold rnp->lock, which excludes writes
to the rdp->gpwrap fields for all CPUs associated with that same leaf
rcu_node structure.

This commit therefore removes READ_ONCE() from rdp->gpwrap accesses
within the __note_gp_changes() function.

Signed-off-by: Zilin Guan <zilinguan811@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:52:38 -08:00
Paul E. McKenney
a3e8162105 rcu: Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
This commit renames the rcu_report_exp_cpu_mult() function from "mask"
to "mask_in" and introduced a "mask" local variable to better support
upcoming event-tracing additions.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:52:38 -08:00
Paul E. McKenney
81a208c56e rcu: Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
This commit wordsmiths the RCU_LAZY and RCU_LAZY_DEFAULT_OFF Kconfig
options' help text.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:50:06 -08:00
Paul E. McKenney
053ca72554 rcu: Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
This commit adds a description of the energy-efficiency delays that
call_rcu() can impose, along with a pointer to call_rcu_hurry() for
latency-sensitive kernel code.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:50:06 -08:00
Paul E. McKenney
366ba3f7f9 srcu: Point call_srcu() to call_rcu() for detailed memory ordering
This commit causes the call_srcu() kernel-doc header to reference that
of call_rcu() for detailed memory-ordering guarantees.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:50:06 -08:00
Paul E. McKenney
21ef249862 rcu: Document self-propagating callbacks
This commit documents the fact that a given RCU callback function can
repost itself.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-04 21:50:06 -08:00
Linus Torvalds
03cc3579bc Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "21 hotfixes. 8 are cc:stable and the remainder address post-6.13
  issues. 13 are for MM and 8 are for non-MM.

  All are singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  MAINTAINERS: include linux-mm for xarray maintenance
  revert "xarray: port tests to kunit"
  MAINTAINERS: add lib/test_xarray.c
  mailmap, MAINTAINERS, docs: update Carlos's email address
  mm/hugetlb: fix hugepage allocation for interleaved memory nodes
  mm: gup: fix infinite loop within __get_longterm_locked
  mm, swap: fix reclaim offset calculation error during allocation
  .mailmap: update email address for Christopher Obbard
  kfence: skip __GFP_THISNODE allocations on NUMA systems
  nilfs2: fix possible int overflows in nilfs_fiemap()
  mm: compaction: use the proper flag to determine watermarks
  kernel: be more careful about dup_mmap() failures and uprobe registering
  mm/fake-numa: handle cases with no SRAT info
  mm: kmemleak: fix upper boundary check for physical address objects
  mailmap: add an entry for Hamza Mahfooz
  MAINTAINERS: mailmap: update Yosry Ahmed's email address
  scripts/gdb: fix aarch64 userspace detection in get_current_task
  mm/vmscan: accumulate nr_demoted for accurate demotion statistics
  ocfs2: fix incorrect CPU endianness conversion causing mount failure
  mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
  ...
2025-02-01 09:49:20 -08:00
Liam R. Howlett
64c37e134b kernel: be more careful about dup_mmap() failures and uprobe registering
If a memory allocation fails during dup_mmap(), the maple tree can be left
in an unsafe state for other iterators besides the exit path.  All the
locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.

Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised.  Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has been
proven unsafe to use the mm_struct that hasn't been initialised, as
referenced in the link below.

Although 8ac662f5da ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.

This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
oom side (even though this is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to avoid other potential
users from using a partially initialised mm_struct.

When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable.  Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.

Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01 03:53:25 -08:00
Linus Torvalds
fd8c09ad0d Merge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Support multiple hook locations for maint scripts of Debian package

 - Remove 'cpio' from the build tool requirement

 - Introduce gendwarfksyms tool, which computes CRCs for export symbols
   based on the DWARF information

 - Support CONFIG_MODVERSIONS for Rust

 - Resolve all conflicts in the genksyms parser

 - Fix several syntax errors in genksyms

* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
  kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
  kbuild: Strip runtime const RELA sections correctly
  kconfig: fix memory leak in sym_warn_unmet_dep()
  kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
  genksyms: fix syntax error for attribute before init-declarator
  genksyms: fix syntax error for builtin (u)int*x*_t types
  genksyms: fix syntax error for attribute after 'union'
  genksyms: fix syntax error for attribute after 'struct'
  genksyms: fix syntax error for attribute after abstact_declarator
  genksyms: fix syntax error for attribute before nested_declarator
  genksyms: fix syntax error for attribute before abstract_declarator
  genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
  genksyms: record attributes consistently for init-declarator
  genksyms: restrict direct-declarator to take one parameter-type-list
  genksyms: restrict direct-abstract-declarator to take one parameter-type-list
  genksyms: remove Makefile hack
  genksyms: fix last 3 shift/reduce conflicts
  genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
  genksyms: reduce type_qualifier directly to decl_specifier
  genksyms: rename cvar_qualifier to type_qualifier
  ...
2025-01-31 12:07:07 -08:00
Linus Torvalds
e20f2bb8b7 Merge tag 'audit-pr-20250130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
 "A minor audit patch to fix an unitialized variable problem"

* tag 'audit-pr-20250130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Initialize lsmctx to avoid memory allocation error
2025-01-30 17:36:14 -08:00
Linus Torvalds
f55b0671e3 Merge tag 'pm-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These are mostly fixes on top of the previously merged power
  management material with the addition of some teo cpuidle governor
  updates, some of which may also be regarded as fixes:

   - Add missing error handling for syscore_suspend() to the hibernation
     core code (Wentao Liang)

   - Revert a commit that added unused macros (Andy Shevchenko)

   - Synchronize the runtime PM status of devices that were runtime-
     suspended before a system-wide suspend and need to be resumed
     during the subsequent system-wide resume transition (Rafael
     Wysocki)

   - Clean up the teo cpuidle governor and make the handling of short
     idle intervals in it consistent regardless of the properties of
     idle states supplied by the cpuidle driver (Rafael Wysocki)

   - Fix some boost-related issues in cpufreq (Lifeng Zheng)

   - Fix build issues in the s3c64xx and airoha cpufreq drivers (Viresh
     Kumar)

   - Remove unconditional binding of schedutil governor kthreads to the
     affected CPUs if the cpufreq driver indicates that updates can
     happen from any CPU (Christian Loehle)"

* tag 'pm-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: core: Synchronize runtime PM status of parents and children
  cpufreq: airoha: Depends on OF
  PM: Revert "Add EXPORT macros for exporting PM functions"
  PM: hibernate: Add error handling for syscore_suspend()
  cpufreq/schedutil: Only bind threads if needed
  cpufreq: ACPI: Remove set_boost in acpi_cpufreq_cpu_init()
  cpufreq: CPPC: Fix wrong max_freq in policy initialization
  cpufreq: Introduce a more generic way to set default per-policy boost flag
  cpufreq: Fix re-boost issue after hotplugging a CPU
  cpufreq: s3c64xx: Fix compilation warning
  cpuidle: teo: Skip sleep length computation for low latency constraints
  cpuidle: teo: Replace time_span_ns with a flag
  cpuidle: teo: Simplify handling of total events count
  cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
  cpuidle: teo: Simplify counting events used for tick management
  cpuidle: teo: Clarify two code comments
  cpuidle: teo: Drop local variable prev_intercept_idx
  cpuidle: teo: Combine candidate state index checks against 0
  cpuidle: teo: Reorder candidate state index checks
  cpuidle: teo: Rearrange idle state lookup code
2025-01-30 15:10:34 -08:00
Rafael J. Wysocki
a01e0f47a7 Merge branch 'pm-sleep'
Merge fixes related to system sleep for 6.14-rc1:

 - Add missing error handling for syscore_suspend() to the hibernation
   core code (Wentao Liang).

 - Revert a commit that added unused macros (Andy Shevchenko).

 - Synchronize the runtime PM status of devices that were runtime-
   suspended before a system-wide suspend and need to be resumed during
   the subsequent syste-wide resume transition (Rafael Wysocki).

* pm-sleep:
  PM: sleep: core: Synchronize runtime PM status of parents and children
  PM: Revert "Add EXPORT macros for exporting PM functions"
  PM: hibernate: Add error handling for syscore_suspend()
2025-01-30 21:28:16 +01:00
Huacai Chen
35fcac7a7c audit: Initialize lsmctx to avoid memory allocation error
When audit is enabled in a kernel build, and there are no LSMs active
that support LSM labeling, it is possible that local variable lsmctx
in the AUDIT_SIGNAL_INFO handler in audit_receive_msg() could be used
before it is properly initialize. Then kmalloc() will try to allocate
a large amount of memory with the uninitialized length.

This patch corrects this problem by initializing the lsmctx to a safe
value when it is declared, which avoid errors like:

 WARNING: CPU: 2 PID: 443 at mm/page_alloc.c:4727 __alloc_pages_noprof
        ...
    ra: 9000000003059644 ___kmalloc_large_node+0x84/0x1e0
   ERA: 900000000304d588 __alloc_pages_noprof+0x4c8/0x1040
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
 CPU: 2 UID: 0 PID: 443 Comm: auditd Not tainted 6.13.0-rc1+ #1899
        ...
 Call Trace:
 [<9000000002def6a8>] show_stack+0x30/0x148
 [<9000000002debf58>] dump_stack_lvl+0x68/0xa0
 [<9000000002e0fe18>] __warn+0x80/0x108
 [<900000000407486c>] report_bug+0x154/0x268
 [<90000000040ad468>] do_bp+0x2a8/0x320
 [<9000000002dedda0>] handle_bp+0x120/0x1c0
 [<900000000304d588>] __alloc_pages_noprof+0x4c8/0x1040
 [<9000000003059640>] ___kmalloc_large_node+0x80/0x1e0
 [<9000000003061504>] __kmalloc_noprof+0x2c4/0x380
 [<9000000002f0f7ac>] audit_receive_msg+0x764/0x1530
 [<9000000002f1065c>] audit_receive+0xe4/0x1c0
 [<9000000003e5abe8>] netlink_unicast+0x340/0x450
 [<9000000003e5ae9c>] netlink_sendmsg+0x1a4/0x4a0
 [<9000000003d9ffd0>] __sock_sendmsg+0x48/0x58
 [<9000000003da32f0>] __sys_sendto+0x100/0x170
 [<9000000003da3374>] sys_sendto+0x14/0x28
 [<90000000040ad574>] do_syscall+0x94/0x138
 [<9000000002ded318>] handle_syscall+0xb8/0x158

Fixes: 6fba89813c ("lsm: ensure the correct LSM context releaser")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
[PM: resolved excessive line length in the backtrace]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-01-29 20:02:04 -05:00
Linus Torvalds
af13ff1c33 Merge tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl table constification from Joel Granados:
 "All ctl_table declared outside of functions and that remain unmodified
  after initialization are const qualified.

  This prevents unintended modifications to proc_handler function
  pointers by placing them in the .rodata section.

  This is a continuation of the tree-wide effort started a few releases
  ago with the constification of the ctl_table struct arguments in the
  sysctl API done in 78eb4ea25c ("sysctl: treewide: constify the
  ctl_table argument of proc_handlers")"

* tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  treewide: const qualify ctl_tables where applicable
2025-01-29 10:35:40 -08:00