Commit Graph

49620 Commits

Author SHA1 Message Date
Juri Lelli
9f239df555 sched/deadline: Initialize dl_servers after SMP
dl-servers are currently initialized too early at boot when CPUs are not
fully up (only boot CPU is). This results in miscalculation of per
runqueue DEADLINE variables like extra_bw (which needs a stable CPU
count).

Move initialization of dl-servers later on after SMP has been
initialized and CPUs are all online, so that CPU count is stable and
DEADLINE variables can be computed correctly.

Fixes: d741f297bc ("sched/fair: Fair server interface")
Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-2-juri.lelli@redhat.com
2025-07-14 10:59:32 +02:00
Aruna Ramakrishna
36569780b0 sched: Change nr_uninterruptible type to unsigned long
The commit e6fe3f422b ("sched: Make multiple runqueue task counters
32-bit") changed nr_uninterruptible to an unsigned int. But the
nr_uninterruptible values for each of the CPU runqueues can grow to
large numbers, sometimes exceeding INT_MAX. This is valid, if, over
time, a large number of tasks are migrated off of one CPU after going
into an uninterruptible state. Only the sum of all nr_interruptible
values across all CPUs yields the correct result, as explained in a
comment in kernel/sched/loadavg.c.

Change the type of nr_uninterruptible back to unsigned long to prevent
overflows, and thus the miscalculation of load average.

Fixes: e6fe3f422b ("sched: Make multiple runqueue task counters 32-bit")

Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com
2025-07-14 10:59:31 +02:00
Zi Yan
1bc3587a88 mm/page_alloc: add support for initializing pageblock as isolated
MIGRATE_ISOLATE is a standalone bit, so a pageblock cannot be initialized
to just MIGRATE_ISOLATE.  Add init_pageblock_migratetype() to enable
initialize a pageblock with a migratetype and isolated.

Link: https://lkml.kernel.org/r/20250617021115.2331563-4-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Richard Chang <richardycc@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13 16:38:17 -07:00
Oscar Salvador
8e1bf051c5 kernel,cpuset: use node-notifier instead of memory-notifier
cpuset is only concerned when a numa node changes its memory state, as it
needs to know the current numa nodes with memory to keep an updated
mems_allowed mask.  So stop using the memory notifier and use the new numa
node notifer instead.

Link: https://lkml.kernel.org/r/20250616135158.450136-9-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13 16:38:16 -07:00
Vlastimil Babka
6b233784b1 mm, madvise: extract mm code from prctl_set_vma() to mm/madvise.c
Setting anon_name is done via madvise_set_anon_name() and behaves a lot of
like other madvise operations.  However, apparently because madvise() has
lacked the 4th argument and prctl() not, the userspace entry point has
been implemented via prctl(PR_SET_VMA, ...) and handled first by
prctl_set_vma().

Currently prctl_set_vma() lives in kernel/sys.c but setting the
vma->anon_name is mm-specific code so extract it to a new
set_anon_vma_name() function under mm.  mm/madvise.c seems to be the most
straightforward place as that's where madvise_set_anon_name() lives.  Stop
declaring the latter in mm.h and instead declare set_anon_vma_name().

Link: https://lkml.kernel.org/r/20250624-anon_name_cleanup-v2-2-600075462a11@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Colin Cross <ccross@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13 16:38:13 -07:00
Linus Torvalds
0a197b7576 Merge tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:

 - Prevent perf_sigtrap() from observing an exiting task and warning
   about it

* tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix WARN in perf_sigtrap()
2025-07-13 10:34:47 -07:00
Andrew Morton
cac3d177c0 Merge branch 'mm-hotfixes-stable' into mm-stable to pick up changes which
are required for a merge of the series "mm: folio_pte_batch()
improvements".
2025-07-12 14:48:26 -07:00
Linus Torvalds
3f31a806a6 Merge tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "19 hotfixes. A whopping 16 are cc:stable and the remainder address
  post-6.15 issues or aren't considered necessary for -stable kernels.

  14 are for MM.  Three gdb-script fixes and a kallsyms build fix"

* tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Revert "sched/numa: add statistics of numa balance task"
  mm: fix the inaccurate memory statistics issue for users
  mm/damon: fix divide by zero in damon_get_intervals_score()
  samples/damon: fix damon sample mtier for start failure
  samples/damon: fix damon sample wsse for start failure
  samples/damon: fix damon sample prcl for start failure
  kasan: remove kasan_find_vm_area() to prevent possible deadlock
  scripts: gdb: vfs: support external dentry names
  mm/migrate: fix do_pages_stat in compat mode
  mm/damon/core: handle damon_call_control as normal under kdmond deactivation
  mm/rmap: fix potential out-of-bounds page table access during batched unmap
  mm/hugetlb: don't crash when allocating a folio if there are no resv
  scripts/gdb: de-reference per-CPU MCE interrupts
  scripts/gdb: fix interrupts.py after maple tree conversion
  maple_tree: fix mt_destroy_walk() on root leaf node
  mm/vmalloc: leave lazy MMU mode on PTE mapping error
  scripts/gdb: fix interrupts display after MCP on x86
  lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
  kallsyms: fix build without execinfo
2025-07-12 10:30:47 -07:00
Jinliang Zheng
f84a15b90d locking/rwsem: Use OWNER_NONSPINNABLE directly instead of OWNER_SPINNABLE
After commit 7d43f1ce9d ("locking/rwsem: Enable time-based spinning on
reader-owned rwsem"), OWNER_SPINNABLE contains all possible values except
OWNER_NONSPINNABLE, namely OWNER_NULL | OWNER_WRITER | OWNER_READER.

Therefore, it is better to use OWNER_NONSPINNABLE directly to determine
whether to exit optimistic spin.

And, remove useless OWNER_SPINNABLE to simplify the code.

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250610130158.4876-1-alexjlzheng@tencent.com
2025-07-11 15:11:54 -07:00
Jakub Kicinski
0cad34fb7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).

No conflicts.

Adjacent changes:

drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
  c701574c54 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
  b3a431fe2e ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")

drivers/net/wireless/mediatek/mt76/mt7996/mac.c
  62da647a2b ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
  dc66a129ad ("wifi: mt76: add a wrapper for wcid access with validation")

drivers/net/wireless/mediatek/mt76/mt7996/main.c
  3dd6f67c66 ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
  8989d8e90f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")

net/mac80211/cfg.c
  58fcb1b428 ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
  037dc18ac3 ("wifi: mac80211: add support for storing station S1G capabilities")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11 11:42:38 -07:00
Nam Cao
9a425da913 panic: Fix up description of vpanic()
The description above vpanic() has the wrong function name. Fix it up.

Cc: John Ogness <john.ogness@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/23a7e8add6546b155371b7e0fbb37bb1def13d6e.1752232374.git.namcao@linutronix.de
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/lkml/20250711183802.2d8c124d@canb.auug.org.au/
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-11 14:25:39 -04:00
Tao Chen
0eeeebdcc5 bpf: Remove attach_type in bpf_tracing_link
Use attach_type in bpf_link, and remove it in bpf_tracing_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-7-chen.dylane@linux.dev
2025-07-11 11:01:08 -07:00
Tao Chen
2a76a80c7f bpf: Remove attach_type in bpf_netns_link
Use attach_type in bpf_link, and remove it in bpf_netns_link.
And move netns_type field to the end to fill the byte hole.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev
2025-07-11 11:01:04 -07:00
Tao Chen
6e816e1c05 bpf: Remove location field in tcx_link
Use attach_type in bpf_link to replace the location filed, and
remove location field in tcx_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250710032038.888700-5-chen.dylane@linux.dev
2025-07-11 11:00:57 -07:00
Tao Chen
9b8d543dc2 bpf: Remove attach_type in bpf_cgroup_link
Use attach_type in bpf_link, and remove it in bpf_cgroup_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-3-chen.dylane@linux.dev
2025-07-11 10:51:55 -07:00
Tao Chen
b725441f02 bpf: Add attach_type field to bpf_link
Attach_type will be set when a link is created by user. It is better to
record attach_type in bpf_link generically and have it available
universally for all link types. So add the attach_type field in bpf_link
and move the sleepable field to avoid unnecessary gap padding.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev
2025-07-11 10:51:55 -07:00
Paul Chaignon
6279846b9b bpf: Forget ranges when refining tnum after JSET
Syzbot reported a kernel warning due to a range invariant violation on
the following BPF program.

  0: call bpf_get_netns_cookie
  1: if r0 == 0 goto <exit>
  2: if r0 & Oxffffffff goto <exit>

The issue is on the path where we fall through both jumps.

That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fallthrough insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. The verifier then
refines the register bounds using the second condition and we end
up with inconsistent bounds on this unreachable path:

  1: if r0 == 0 goto <exit>
    r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
  2: if r0 & 0xffffffff goto <exit>
    r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
    r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)

Improving the range refinement for JSET to cover all cases is tricky. We
also don't expect many users to rely on JSET given LLVM doesn't generate
those instructions. So instead of improving the range refinement for
JSETs, Eduard suggested we forget the ranges whenever we're narrowing
tnums after a JSET. This patch implements that approach.

Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:45:25 -07:00
Emil Tsalapatis
8fc3d2d8b5 bpf/arena: add bpf_arena_reserve_pages kfunc
Add a new BPF arena kfunc for reserving a range of arena virtual
addresses without backing them with pages. This prevents the range from
being populated using bpf_arena_alloc_pages().

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:43:54 -07:00
David Kaplan
19c24f7ee3 cpu: Define attack vectors
Define 4 new attack vectors that are used for controlling CPU speculation
mitigations.  These may be individually disabled as part of the
mitigations= command line.  Attack vector controls are combined with global
options like 'auto' or 'auto,nosmt' like 'mitigations=auto,no_user_kernel'.
The global options come first in the mitigations= string.

Cross-thread mitigations can either remain enabled fully, including
potentially disabling SMT ('auto,nosmt'), remain enabled except for
disabling SMT ('auto'), or entirely disabled through the new
'no_cross_thread' attack vector option.

The default settings for these attack vectors are consistent with existing
kernel defaults, other than the automatic disabling of VM-based attack
vectors if KVM support is not present.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250707183316.1349127-3-david.kaplan@amd.com
2025-07-11 17:55:16 +02:00
Linus Torvalds
a0f8361c3c Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fix from Marek Szyprowski:

 - small fix relevant to arm64 server and custom CMA configuration (Feng
   Tang)

* tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-contiguous: hornor the cma address limit setup by user
2025-07-11 08:49:25 -07:00
Sebastian Andrzej Siewior
760e6f7bef futex: Remove support for IMMUTABLE
The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on
the private hash and so to avoid the performance regression on big
machines.
With the switch to per-CPU counter this is no longer needed. That flag
was never useable on any released kernel.

Remove any support for IMMUTABLE while preserve the flags argument and
enforce it to be zero.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de
2025-07-11 16:02:01 +02:00
Sebastian Andrzej Siewior
fb3c553da7 futex: Make futex_private_hash_get() static
futex_private_hash_get() is not used outside if its compilation unit.
Make it static.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-4-bigeasy@linutronix.de
2025-07-11 16:02:00 +02:00
Peter Zijlstra
56180dd20c futex: Use RCU-based per-CPU reference counting instead of rcuref_t
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.

Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.

The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.

When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
  current private hash. This can be skipped if a grace period elapsed
  since the private hash was assigned.

- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
  use mm_struct::futex_atomic for reference counting.

- After a RCU grace period, all users are guaranteed to be using the
  atomic counter. The per-CPU counters can now be summed up and added to
  the atomic_t counter. If the resulting count is zero, the hash can be
  safely replaced. Otherwise, active users still hold a valid reference.

- Once the atomic reference count drops to zero, the next futex
  operation will switch to the new private hash.

call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.

[bigeasy: commit description + mm get/ put_async]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
2025-07-11 16:02:00 +02:00
Nam Cao
a6b0465bd2 irqdomain: Export irq_domain_free_irqs_top()
Export irq_domain_free_irqs_top(), making it usable for drivers compiled as
modules.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-07-11 12:57:00 +01:00
Rafael J. Wysocki
fd4716a151 Merge back earlier changes related to system suspend and hibernation 2025-07-11 11:39:59 +02:00
Jakub Kicinski
3321e97eab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc6).

No conflicts.

Adjacent changes:

Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml
  0a12c435a1 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible")
  b3603c0466 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10 10:10:49 -07:00
Samuel Zhang
2640e81947 PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
When hibernate with data center dGPUs, huge number of VRAM data will be
moved to shmem during dev_pm_ops.prepare(). These shmem pages take a lot
of system memory so that there's no enough free memory for creating the
hibernation image. This will cause hibernation fail and abort.

After dev_pm_ops.prepare(), call shrink_all_memory() to force move shmem
pages to swap disk and reclaim the pages, so that there's enough system
memory for hibernation image and less pages needed to copy to the image.

This patch can only flush and free about half shmem pages. It will be
better to flush and free more pages, even all of shmem pages, so that
there're less pages to be copied to the hibernation image and the overall
hibernation time can be reduced.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250710062313.3226149-4-guoqing.zhang@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-10 10:49:45 -05:00
Petr Mladek
d18d7989e3 printk: kunit: Fix __counted_by() in struct prbtest_rbdata
__counted_by() has to point to a variable which defines the size
of the related array. The code must never access the array
beyond this limit.

struct prbtest_rbdata currently stores the length of the string.
And the code access the array beyond the limit when writing
or reading the trailing '\0'.

Store the size of the string, including the trailing '\0' if
we wanted to keep __counted_by().

Consistently use "_size" suffix when the trailing '\0' is counted.
Note that MAX_RBDATA_TEXT_SIZE was originally used to limit
the text length.

When touching the code, make sure that @text_size produced by
get_random_u32_inclusive() stays within the limits.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/eaea66b9-266a-46e7-980d-33f40ad4b215@sabinyo.mountain
Suggested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250702095157.110916-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-07-10 17:12:21 +02:00
Petr Mladek
254e8fb5e6 printk: ringbuffer: Explain why the KUnit test ignores failed writes
The KUnit test ignores prb_reserve() failures on purpose. It tries
to push the ringbuffer beyond limits.

Note that it is a know problem that writes might fail in this situation.
printk() tries to prevent this problem by:

  + allocating big enough data buffer, see log_buf_add_cpu().

  + allocating enough descriptors by using small enough average
    record, see PRB_AVGBITS.

  + storing the record with disabled interrupts, see vprintk_store().

Also the amount of printk() messages is always somehow bound in
practice. And they are serialized when they are printed from
many CPUs on purpose, for example, when printing backtraces.

Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250702095157.110916-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-07-10 17:11:57 +02:00
Tudor Ambarus
f747cde5e7 PM: sleep: add kernel parameter to disable asynchronous suspend/resume
On some platforms, device dependencies are not properly represented by
device links, which can cause issues when asynchronous power management
is enabled. While it is possible to disable this via sysfs, doing so
at runtime can race with the first system suspend event.

This patch introduces a kernel command-line parameter, "pm_async", which
can be set to "off" to globally disable asynchronous suspend and resume
operations from early boot. It effectively provides a way to set the
initial value of the existing pm_async sysfs knob at boot time. This
offers a robust method to fall back to synchronous (sequential)
operation, which can stabilize platforms with problematic dependencies
and also serve as a useful debugging tool.

The default behavior remains unchanged (asynchronous enabled). To
disable it, boot the kernel with the "pm_async=off" parameter.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250709-pm-async-off-v3-1-cb69a6fc8d04@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-10 14:40:07 +02:00
Jiazi Li
fed307b67c kthread: update comment for __to_kthread
With commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for init
and umh") and commit 753550eb0c ("fork: Explicitly set PF_KTHREAD"), umh
task no longer have struct kthread and PF_KTHREAD flag.

Update the comment to describe what the current rules are to detect
is something is a kthread.

Link: https://lkml.kernel.org/r/20250620100801.23185-1-jqqlijiazi@gmail.com
Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
Signed-off-by: mingzhu.wang <mingzhu.wang@transsion.com>
Suggested-by Eric W . Biederman <ebiederm@xmission.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:55 -07:00
Pasha Tatashin
64960497ea fork: clean up ifdef logic around stack allocation
There is an unneeded OR in the ifdef functions that are used to allocate
and free kernel stacks based on direct map or vmap.  Adding dynamic stack
support would complicate this logic even further.

Therefore, clean up by changing the order so OR is no longer needed.

Link: https://lkml.kernel.org/r/20250618-fork-fixes-v4-1-2e05a2e1f5fc@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-3-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:54 -07:00
Jiri Olsa
aa644c4052 uprobes: revert ref_ctr_offset in uprobe_unregister error path
There's error path that could lead to inactive uprobe:

  1) uprobe_register succeeds - updates instruction to int3 and
     changes ref_ctr from 0 to 1
  2) uprobe_unregister fails  - int3 stays in place, but ref_ctr
     is changed to 0 (it's not restored to 1 in the fail path)
     uprobe is leaked
  3) another uprobe_register comes and re-uses the leaked uprobe
     and succeds - but int3 is already in place, so ref_ctr update
     is skipped and it stays 0 - uprobe CAN NOT be triggered now
  4) uprobe_unregister fails because ref_ctr value is unexpected

Fix this by reverting the updated ref_ctr value back to 1 in step 2),
which is the case when uprobe_unregister fails (int3 stays in place), but
we have already updated refctr.

The new scenario will go as follows:

  1) uprobe_register succeeds - updates instruction to int3 and
     changes ref_ctr from 0 to 1
  2) uprobe_unregister fails  - int3 stays in place and ref_ctr
     is reverted to 1..  uprobe is leaked
  3) another uprobe_register comes and re-uses the leaked uprobe
     and succeds - but int3 is already in place, so ref_ctr update
     is skipped and it stays 1 - uprobe CAN be triggered now
  4) uprobe_unregister succeeds

Link: https://lkml.kernel.org/r/20250514101809.2010193-1-jolsa@kernel.org
Fixes: 1cc33161a8 ("uprobes: Support SDT markers having reference count (semaphore)")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:53 -07:00
Fushuai Wang
d71b90e5ba exit: fix misleading comment in forget_original_parent()
The commit 482a3767e5 ("exit: reparent: call forget_original_parent()
under tasklist_lock") moved the comment from exit_notify() to
forget_original_parent().  However, the forget_original_parent() only
handles (A), while (B) is handled in kill_orphaned_pgrp().  So remove the
unrelated part.

Link: https://lkml.kernel.org/r/20250615030930.58051-1-wangfushuai@baidu.com
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: wangfushuai <wangfushuai@baidu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:52 -07:00
Wei Nanxin
ad2c8079e9 kcov: fix typo in comment of kcov_fault_in_area
change '__santizer_cov_trace_pc()' to '__sanitizer_cov_trace_pc()'

Link: https://lkml.kernel.org/r/20250615123237.110144-1-n9winx@163.com
Signed-off-by: Wei Nanxin <n9winx@163.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Macro Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:52 -07:00
Jason Xing
19f3cb64a2 relayfs: support a counter tracking if data is too big to write
It really doesn't matter if the user/admin knows what the last too big
value is.  Record how many times this case is triggered would be helpful.

Solve the existing issue where relay_reset() doesn't restore the value.

Store the counter in the per-cpu buffer structure instead of the global
buffer structure.  It also solves the racy condition which is likely to
happen when a few of per-cpu buffers encounter the too big data case and
then access the global field last_toobig without lock protection.

Remove the printk in relay_close() since kernel module can directly call
relay_stats() as they want.

Link: https://lkml.kernel.org/r/20250612061201.34272-6-kerneljasonxing@gmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:52 -07:00
Jason Xing
7f2173894f blktrace: use rbuf->stats.full as a drop indicator in relayfs
Replace internal subbuf_start in blktrace with the default policy in
relayfs.

Remove dropped field from struct blktrace.  Correspondingly, call the
common helper in relay.  By incrementing full_count to keep track of how
many times we encountered a full buffer issue, user space will know how
many events were lost.

Link: https://lkml.kernel.org/r/20250612061201.34272-5-kerneljasonxing@gmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:52 -07:00
Jason Xing
a53202ce7f relayfs: introduce getting relayfs statistics function
In this version, only support getting the counter for buffer full and
implement the framework of how it works.

Users can pass certain flag to fetch what field/statistics they expect to
know.  Each time it only returns one result.  So do not pass multiple
flags.

Link: https://lkml.kernel.org/r/20250612061201.34272-4-kerneljasonxing@gmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:51 -07:00
Jason Xing
ca01a90ae7 relayfs: support a counter tracking if per-cpu buffers is full
When using relay mechanism, we often encounter the case where new data are
lost or old unconsumed data are overwritten because of slow reader.

Add 'full' field in per-cpu buffer structure to detect if the above case
is happening.  Relay has two modes: 1) non-overwrite mode, 2) overwrite
mode.  So buffer being full here respectively means: 1) relayfs doesn't
intend to accept new data and then simply drop them, or 2) relayfs is
going to start over again and overwrite old unread data with new data.

Note: this counter doesn't need any explicit lock to protect from being
modified by different threads for the better performance consideration. 
Writers calling __relay_write/relay_write should consider how to use the
lock and ensure it performs under the lock protection, thus it's not
necessary to add a new small lock here.

Link: https://lkml.kernel.org/r/20250612061201.34272-3-kerneljasonxing@gmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:51 -07:00
Jason Xing
2489e95812 relayfs: abolish prev_padding
Patch series "relayfs: misc changes", v5.

The series mostly focuses on the error counters which helps every user
debug their own kernel module.


This patch (of 5):

prev_padding represents the unused space of certain subbuffer.  If the
content of a call of relay_write() exceeds the limit of the remainder of
this subbuffer, it will skip storing in the rest space and record the
start point as buf->prev_padding in relay_switch_subbuf().  Since the buf
is a per-cpu big buffer, the point of prev_padding as a global value for
the whole buffer instead of a single subbuffer (whose padding info is
stored in buf->padding[]) seems meaningless from the real use cases, so we
don't bother to record it any more.

Link: https://lkml.kernel.org/r/20250612061201.34272-1-kerneljasonxing@gmail.com
Link: https://lkml.kernel.org/r/20250612061201.34272-2-kerneljasonxing@gmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:51 -07:00
Elijah Wright
0ba5a25ad1 kernel: relay: use __GFP_ZERO in relay_alloc_buf
Passing the __GFP_ZERO flag to alloc_page should result in less overhead
th= an using memset()

Link: https://lkml.kernel.org/r/20250610225639.314970-3-git@elijahs.space
Signed-off-by: Elijah Wright <git@elijahs.space>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:50 -07:00
Linus Walleij
f7b0ff2bc9 fork: define a local GFP_VMAP_STACK
The current allocation of VMAP stack memory is using (THREADINFO_GFP &
~__GFP_ACCOUNT) which is a complicated way of saying (GFP_KERNEL |
__GFP_ZERO):

<linux/thread_info.h>:
define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
<linux/gfp_types.h>:
define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

This is an unfortunate side-effect of independent changes blurring the
picture:

commit 19809c2da2 changed (THREADINFO_GFP |
__GFP_HIGHMEM) to just THREADINFO_GFP since highmem became implicit.

commit 9b6f7e163c then added stack caching
and rewrote the allocation to (THREADINFO_GFP & ~__GFP_ACCOUNT) as cached
stacks need to be accounted separately.  However that code, when it
eventually accounts the memory does this:

  ret = memcg_kmem_charge(vm->pages[i], GFP_KERNEL, 0)

so the memory is charged as a GFP_KERNEL allocation.

Define a unique GFP_VMAP_STACK to use
GFP_KERNEL | __GFP_ZERO and move the comment there.

Link: https://lkml.kernel.org/r/20250509-gfp-stack-v1-1-82f6f7efc210@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:50 -07:00
Pasha Tatashin
449e0b4ed5 fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
There are two data types: "struct vm_struct" and "struct vm_stack" that
have the same local variable names: vm_stack, or vm, or s, which makes the
code confusing to read.

Change the code so the naming is consistent:

struct vm_struct is always called vm_area
struct vm_stack is always called vm_stack

One change altering vfree(vm_stack) to vfree(vm_area->addr) may look like
a semantic change but it is not: vm_area->addr points to the vm_stack. 
This was done to improve readability.

[linus.walleij@linaro.org: rebased and added new users of the variable names, address review comments]
Link: https://lore.kernel.org/20240311164638.2015063-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20250509-fork-fixes-v3-2-e6c69dd356f2@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:50 -07:00
Alistair Popple
6b4a80e424 mm: filter zone device pages returned from folio_walk_start()
Previously dax pages were skipped by the pagewalk code as pud_special() or
vm_normal_page{_pmd}() would be false for DAX pages.  Now that dax pages
are refcounted normally that is no longer the case, so the pagewalk code
will start returning them.

Most callers already explicitly filter for DAX or zone device pages so
don't need updating.  However some don't, so add checks to those callers.

Link: https://lkml.kernel.org/r/4ecb7b357fc5b435588024770b88bbb695c30090.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Björn Töpel <bjorn@rivosinc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:16 -07:00
Shivank Garg
a984f16fba mm: use folio_expected_ref_count() helper for reference counting
Replace open-coded folio reference count calculations with the
folio_expected_ref_count().

No functional changes intended.

Link: https://lkml.kernel.org/r/20250611052706.515408-2-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:08 -07:00
Chen Yu
db6cc3f4ac Revert "sched/numa: add statistics of numa balance task"
This reverts commit ad6b26b6a0.

This commit introduces per-memcg/task NUMA balance statistics, but
unfortunately it introduced a NULL pointer exception due to the following
race condition: After a swap task candidate was chosen, its mm_struct
pointer was set to NULL due to task exit.  Later, when performing the
actual task swapping, the p->mm caused the problem.

CPU0                                   CPU1
:
...
task_numa_migrate
     task_numa_find_cpu
      task_numa_compare
        # a normal task p is chosen
        env->best_task = p

                                          # p exit:
                                          exit_signals(p);
                                             p->flags |= PF_EXITING
                                          exit_mm
                                             p->mm = NULL;

      migrate_swap_stop
        __migrate_swap_task((arg->src_task, arg->dst_cpu)
         count_memcg_event_mm(p->mm, NUMA_TASK_SWAP)# p->mm is NULL

task_lock() should be held and the PF_EXITING flag needs to be checked to
prevent this from happening.  After discussion, the conclusion was that
adding a lock is not worthwhile for some statistics calculations.  Revert
the change and rely on the tracepoint for this purpose.

Link: https://lkml.kernel.org/r/20250704135620.685752-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/20250708064917.BBD13C4CEED@smtp.kernel.org
Fixes: ad6b26b6a0 ("sched/numa: add statistics of numa balance task")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>
Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@mail.gmail.com/
Reported-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
Reported-by: Suneeth D <Suneeth.D@amd.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Hladky <jhladky@redhat.com>
Cc: Libo Chen <libo.chen@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 21:07:56 -07:00
Artem Sadovnikov
437889b926 fgraph: Make pid_str size match the comment
The comment above buffer mentions sign, 10 bytes width for number and null
terminator, but buffer itself isn't large enough to hold that much data.

This is a cosmetic change, since PID cannot be negative, other than -1.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lore.kernel.org/20250617152110.2530-1-a.sadovnikov@ispras.ru
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-09 20:39:00 -04:00
Masami Hiramatsu (Google)
ca296d32ec tracing: ring_buffer: Rewind persistent ring buffer on reboot
Rewind persistent ring buffer pages which have been read in the previous
boot. Those pages are highly possible to be lost before writing it to the
disk if the previous kernel crashed. In this case, the trace data is kept
on the persistent ring buffer, but it can not be read because its commit
size has been reset after read.  This skips clearing the commit size of
each sub-buffer and recover it after reboot.

Note: If you read the previous boot data via trace_pipe, that is not
accessible in that time. But reboot without clearing (or reusing) the read
data, the read data is recovered again in the next boot.

Thus, when you read the previous boot data, clear it by `echo > trace`.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174899582116.955054.773265393511190051.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-09 20:38:42 -04:00
Nam Cao
fac5493251 rv: Allow to configure the number of per-task monitor
Now that there are 2 monitors for real-time applications, users may want to
enable both of them simultaneously. Make the number of per-task monitor
configurable. Default it to 2 for now.

Cc: John Ogness <john.ogness@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/93e83313fc4ba7f6e66f4abe80ca5f5494d658d0.1752088709.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-09 15:27:02 -04:00
Nam Cao
f74f8bb246 rv: Add rtapp_sleep monitor
Add a monitor for checking that real-time tasks do not go to sleep in a
manner that may cause undesirable latency.

Also change
	RV depends on TRACING
to
	RV select TRACING
to avoid the following recursive dependency:

 error: recursive dependency detected!
	symbol TRACING is selected by PREEMPTIRQ_TRACEPOINTS
	symbol PREEMPTIRQ_TRACEPOINTS depends on TRACE_IRQFLAGS
	symbol TRACE_IRQFLAGS is selected by RV_MON_SLEEP
	symbol RV_MON_SLEEP depends on RV
	symbol RV depends on TRACING

Cc: John Ogness <john.ogness@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/75bc5bcc741d153aa279c95faf778dff35c5c8ad.1752088709.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-09 15:27:01 -04:00