Commit Graph

49605 Commits

Author SHA1 Message Date
Nam Cao
3d3c376118 rv: Merge struct rv_reactor_def into struct rv_reactor
Each struct rv_reactor has a unique struct rv_reactor_def associated with
it. struct rv_reactor is statically allocated, while struct rv_reactor_def
is dynamically allocated.

This makes the code more complicated than it should be:

  - Lookup is required to get the associated rv_reactor_def from rv_reactor

  - Dynamic memory allocation is required for rv_reactor_def. This is
    harder to get right compared to static memory. For instance, there is
    an existing mistake: rv_unregister_reactor() does not free the memory
    allocated by rv_register_reactor(). This is fortunately not a real
    memory leak problem as rv_unregister_reactor() is never called.

Simplify and merge rv_reactor_def into rv_reactor.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/71cb91c86cd40df5b8c492b788787f2a73c3eaa3.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 09:04:14 -04:00
Nam Cao
24cbfe18d5 rv: Merge struct rv_monitor_def into struct rv_monitor
Each struct rv_monitor has a unique struct rv_monitor_def associated with
it. struct rv_monitor is statically allocated, while struct rv_monitor_def
is dynamically allocated.

This makes the code more complicated than it should be:

  - Lookup is required to get the associated rv_monitor_def from rv_monitor

  - Dynamic memory allocation is required for rv_monitor_def. This is
    harder to get right compared to static memory. For instance, there is
    an existing mistake: rv_unregister_monitor() does not free the memory
    allocated by rv_register_monitor(). This is fortunately not a real
    memory leak problem, as rv_unregister_monitor() is never called.

Simplify and merge rv_monitor_def into rv_monitor.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/194449c00f87945c207aab4c96920c75796a4f53.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 09:04:14 -04:00
Nam Cao
b0c08dd534 rv: Remove unused field in struct rv_monitor_def
rv_monitor_def::task_monitor is not used. Delete it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/502d94f2696435690a2b1fdbe80a9e56c96fcabf.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 09:04:13 -04:00
Linus Torvalds
2942242dde Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "11 hotfixes. 9 are cc:stable and the remainder address post-6.15
  issues or aren't considered necessary for -stable kernels.

  7 are for MM"

* tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  sprintf.h requires stdarg.h
  resource: fix false warning in __request_region()
  mm/damon/core: commit damos_quota_goal->nid
  kasan: use vmalloc_dump_obj() for vmalloc error reports
  mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
  mm: update MAINTAINERS entry for HMM
  nilfs2: reject invalid file types when reading inodes
  selftests/mm: fix split_huge_page_test for folio_split() tests
  mailmap: add entry for Senozhatsky
  mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
  mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
2025-07-24 19:13:30 -07:00
Samiullah Khawaja
71c52411c5 net: Create separate gro_flush_normal function
Move multiple copies of same code snippet doing `gro_flush` and
`gro_normal_list` into separate helper function.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:34:55 -07:00
Steven Rostedt
8c4e53a1a0 tracing: Call trace_ftrace_test_filter() for the event
The trace event filter bootup self test tests a bunch of filter logic
against the ftrace_test_filter event, but does not actually call the
event. Work is being done to cause a warning if an event is defined but
not used. To quiet the warning call the trace event under an if statement
where it is disabled so it doesn't get optimized out.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250723194212.274458858@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 21:33:35 -04:00
Jakub Kicinski
a4f5759b6f Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-24

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 4 files changed, 40 insertions(+), 15 deletions(-).

The main changes are:

1) Improved verifier error message for incorrect narrower load from
   pointer field in ctx, from Paul Chaignon.

2) Disabled migration in nf_hook_run_bpf to address a syzbot report,
   from Kuniyuki Iwashima.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test invalid narrower ctx load
  bpf: Reject narrower access to pointer ctx fields
  bpf: Disable migration in nf_hook_run_bpf().
====================

Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:02:24 -07:00
Akinobu Mita
91a229bb7b resource: fix false warning in __request_region()
A warning is raised when __request_region() detects a conflict with a
resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY.

But this warning is only valid for iomem_resources.
The hmem device resource uses resource.desc as the numa node id, which can
cause spurious warnings.

This warning appeared on a machine with multiple cxl memory expanders. 
One of the NUMA node id is 6, which is the same as the value of
IORES_DESC_DEVICE_PRIVATE_MEMORY.

In this environment it was just a spurious warning, but when I saw the
warning I suspected a real problem so it's better to fix it.

This change fixes this by restricting the warning to only iomem_resource.
This also adds a missing new line to the warning message.

Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com
Fixes: 7dab174e2e ("dax/hmem: Move hmem device registration to dax_hmem.ko")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24 17:57:59 -07:00
Kees Cook
8245d47cfa x86: Handle KCOV __init vs inline mismatches
GCC appears to have kind of fragile inlining heuristics, in the
sense that it can change whether or not it inlines something based on
optimizations. It looks like the kcov instrumentation being added (or in
this case, removed) from a function changes the optimization results,
and some functions marked "inline" are _not_ inlined. In that case,
we end up with __init code calling a function not marked __init, and we
get the build warnings I'm trying to eliminate in the coming patch that
adds __no_sanitize_coverage to __init functions:

WARNING: modpost: vmlinux: section mismatch in reference: xbc_exit+0x8 (section: .text.unlikely) -> _xbc_exit (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: real_mode_size_needed+0x15 (section: .text.unlikely) -> real_mode_blob_end (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: __set_percpu_decrypted+0x16 (section: .text.unlikely) -> early_set_memory_decrypted (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: memblock_alloc_from+0x26 (section: .text.unlikely) -> memblock_alloc_try_nid (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_set_root_pointer+0xc (section: .text.unlikely) -> x86_init (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_get_root_pointer+0x8 (section: .text.unlikely) -> x86_init (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: efi_config_table_is_usable+0x16 (section: .text.unlikely) -> xen_efi_config_table_is_usable (section: .init.text)

This problem is somewhat fragile (though using either __always_inline
or __init will deterministically solve it), but we've tripped over
this before with GCC and the solution has usually been to just use
__always_inline and move on.

For x86 this means forcing several functions to be inline with
__always_inline.

Link: https://lore.kernel.org/r/20250724055029.3623499-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-24 16:55:11 -07:00
Jakub Kicinski
8b5a19b4ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc8).

Conflicts:

drivers/net/ethernet/microsoft/mana/gdma_main.c
  9669ddda18 ("net: mana: Fix warnings for missing export.h header inclusion")
  7553911210 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/ti/icssg/icssg_prueth.h
  6e86fb73de ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
  ffe8a49091 ("net: ti: icssg-prueth: Read firmware-names from device tree")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 11:10:46 -07:00
Gabriele Monaco
58d5f0d437 rv: Return init error when registering monitors
Monitors generated with dot2k have their registration function (the one
called during monitor initialisation) return always 0, even if the
registration failed on RV side.
This can hide potential errors.

Return the value returned by the RV register function.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250723161240.194860-6-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 10:43:46 -04:00
Gabriele Monaco
560473f2e2 verification/rvgen: Organise Kconfig entries for nested monitors
The current behaviour of rvgen when running with the -a option is to
append the necessary lines at the end of the configuration for Kconfig,
Makefile and tracepoints.
This is not always the desired behaviour in case of nested monitors:
while tracepoints are not affected by nesting and the Makefile's only
requirement is that the parent monitor is built before its children, in
the Kconfig it is better to have children defined right after their
parent, otherwise the result has wrong indentation:

[*]   foo_parent monitor
[*]     foo_child1 monitor
[*]     foo_child2 monitor
[*]   bar_parent monitor
[*]     bar_child1 monitor
[*]     bar_child2 monitor
[*]   foo_child3 monitor
[*]   foo_child4 monitor

Adapt rvgen to look for a different marker for nested monitors in the
Kconfig file and append the line right after the last sibling, instead
of the last monitor.
Also add the marker when creating a new parent monitor.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250723161240.194860-5-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 10:43:46 -04:00
Gabriele Monaco
9efcf59082 tools/dot2c: Fix generated files going over 100 column limit
The dot2c.py script generates all states in a single line. This breaks the
100 column limit when the state machines are non-trivial.

Change dot2c.py to generate the states in separate lines in case the
generated line is going to be too long.

Also adapt existing monitors with line length over the limit.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250723161240.194860-4-gmonaco@redhat.com
Suggested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 10:43:46 -04:00
Steven Rostedt
dabd3e7dcc tracing: Have eprobes handle arrays
eprobes are dynamic events that can read other events using their fields
to create new events. Currently it doesn't work with arrays. When the new
event field is attached to the old event field, it looks at the size of
the field to determine what type of field the new field should be. For 1
byte fields it's a char, for 2 bytes, it's a short and for 4 bytes it's an
integer. For all other sizes it just defaults to "long". This also reads
the contents of the field for such cases.

For arrays that are bigger than the size of long, return the value of the
address of the content itself. This will allow eprobes to read other
values in the array of the old event.

This is useful when raw_syscalls is enabled but the syscall events are
not. The syscall events are created from the raw_syscalls as they have an
array of "args" that holds the 6 long words passed to the syscall entry
point. To read the value of "filename" from sys_openat, the eprobe could
attach to the raw_syscall and read the second value.

It can then even be passed to a synthetic event and converted back to
another eprobe to get the value of "filename" after it has been read by
the kernel during the system call:

 [
   Create an eprobe called "sys" and attach it to sys_enter.
   Read the id of the system call and the second argument
 ]
 # echo 'e:sys raw_syscalls.sys_enter nr=$id:u32 arg2=+8($args):u64' >> /sys/kernel/tracing/dynamic_events

 [
   Create a synthetic event "path" that will hold the address of the
   sys_openat filename. This is on a 64bit machine, so make it 64 bits
 ]
 # echo 's:path u64 file;' >> /sys/kernel/tracing/dynamic_events

 [
   Add a histogram to the eprobe/sys which tiggers if the "nr" field is
   257 (sys_openat), and save the filename in the "file" variable.
 ]
 # echo 'hist:keys=common_pid:file=arg2 if nr == 257' > /sys/kernel/tracing/events/eprobes/sys/trigger

 [
   Attach a histogram to sys_exit event that triggers the "path" synthetic
   event and records the "filename" that was passed from the sys eprobe.
 ]
 # echo 'hist:keys=common_pid:f=$file:onmatch(eprobes.sys).trace(path,$f)' >> /sys/kernel/tracing/events/raw_syscalls/sys_exit/trigger

 [
   Create another eprobe that dereferences the "file" field as a user
   space string and displays it.
 ]
 # echo 'e:open synthetic.path file=+0($file):ustring' >> /sys/kernel/tracing/dynamic_events

 # echo 1 > /sys/kernel/tracing/events/eprobes/open/enable
 # cat trace_pipe
             cat-1142    [003] ...5.   799.521912: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.521934: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522065: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522080: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522296: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
             cat-1142    [003] ...5.   799.522319: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
            less-1143    [005] ...5.   799.522327: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522333: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
             cat-1142    [003] ...5.   799.522348: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
            less-1143    [005] ...5.   799.522349: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522363: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
            less-1143    [005] ...5.   799.522477: open: (synthetic.path) file="/etc/ld.so.cache"
             cat-1142    [003] ...5.   799.522489: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
            less-1143    [005] ...5.   799.522492: open: (synthetic.path) file="/etc/ld.so.cache"
            less-1143    [005] ...5.   799.522720: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
            less-1143    [005] ...5.   799.522744: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
            less-1143    [005] ...5.   799.522759: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
             cat-1142    [003] ...5.   799.522850: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"

Link: https://lore.kernel.org/all/20250723124202.4f7475be@batman.local.home/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-24 22:57:32 +09:00
Paul Chaignon
e09299225d bpf: Reject narrower access to pointer ctx fields
The following BPF program, simplified from a syzkaller repro, causes a
kernel warning:

    r0 = *(u8 *)(r1 + 169);
    exit;

With pointer field sk being at offset 168 in __sk_buff. This access is
detected as a narrower read in bpf_skb_is_valid_access because it
doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed
and later proceeds to bpf_convert_ctx_access. Note that for the
"is_narrower_load" case in the convert_ctx_accesses(), the insn->off
is aligned, so the cnt may not be 0 because it matches the
offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However,
the target_size stays 0 and the verifier errors with a kernel warning:

    verifier bug: error during ctx access conversion(1)

This patch fixes that to return a proper "invalid bpf_context access
off=X size=Y" error on the load instruction.

The same issue affects multiple other fields in context structures that
allow narrow access. Some other non-affected fields (for sk_msg,
sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for
consistency.

Note this syzkaller crash was reported in the "Closes" link below, which
used to be about a different bug, fixed in
commit fce7bd8e38 ("bpf/verifier: Handle BPF_LOAD_ACQ instructions
in insn_def_regno()"). Because syzbot somehow confused the two bugs,
the new crash and repro didn't get reported to the mailing list.

Fixes: f96da09473 ("bpf: simplify narrower ctx access")
Fixes: 0df1a55afa ("bpf: Warn on internal verifier errors")
Reported-by: syzbot+0ef84a7bdf5301d4cbec@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0ef84a7bdf5301d4cbec
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/3b8dcee67ff4296903351a974ddd9c4dca768b64.1753194596.git.paul.chaignon@gmail.com
2025-07-23 19:33:49 -07:00
Steven Rostedt
9f0cb91767 tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise,
ipi_entry and ipi_exit are only used by arm and arm64. This means these
trace events are wasting memory in all the other architectures that do not
use them.

Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to
enable these trace events. The config makes it easy if other architectures
decide to trace these as well.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Will Deacon <will@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-23 14:58:55 -04:00
Neeraj Upadhyay (AMD)
cc1d1365f0 Merge branches 'rcu-exp.23.07.2025', 'rcu.22.07.2025', 'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025 2025-07-23 21:42:20 +05:30
Masami Hiramatsu (Google)
558d5f3cd2 tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
Since traceprobe_parse_event_name() is a bit complicated, add a
kerneldoc for explaining the behavior.

Link: https://lore.kernel.org/all/175323430565.57270.2602609519355112748.stgit@devnote2/

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:22:06 +09:00
Masami Hiramatsu (Google)
97e8230f89 tracing: uprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing uprobe-events
from heap instead of stack.

Link: https://lore.kernel.org/all/175323429593.57270.12369235525923902341.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:21:58 +09:00
Masami Hiramatsu (Google)
4c6edb43ea tracing: eprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing eprobe-events
from heap instead of stack.

Link: https://lore.kernel.org/all/175323428599.57270.988038042425748956.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:21:51 +09:00
Masami Hiramatsu (Google)
33b4e38baa tracing: kprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing kprobe-events
from heap instead of stack.

Link: https://lore.kernel.org/all/175323427627.57270.5105357260879695051.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:21:44 +09:00
Masami Hiramatsu (Google)
d643eaa708 tracing: fprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for fprobe-event from heap
instead of stack. This fixes the stack frame exceed limit error.

Link: https://lore.kernel.org/all/175323426643.57270.6657152008331160704.stgit@devnote2/

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:21:37 +09:00
Masami Hiramatsu (Google)
43beb5e89b tracing: probe: Allocate traceprobe_parse_context from heap
Instead of allocating traceprobe_parse_context on stack, allocate it
dynamically from heap (slab).

This change is likely intended to prevent potential stack overflow
issues, which can be a concern in the kernel environment where stack
space is limited.

Link: https://lore.kernel.org/all/175323425650.57270.280750740753792504.stgit@devnote2/

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 00:21:30 +09:00
Masami Hiramatsu (Google)
2f02a61d84 tracing: probes: Sort #include alphabetically
Sort the #include directives in trace_probe* files alphabetically for
easier maintenance and avoid double includes.
This also groups headers as linux-generic, asm-generic, and local
headers.

Link: https://lore.kernel.org/all/175323424678.57270.11975372127870059007.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-24 00:21:22 +09:00
Steven Rostedt
9ba817fb7c tracing: Deprecate auto-mounting tracefs in debugfs
In January 2015, tracefs was created to allow access to the tracing
infrastructure without needing to compile in debugfs. When tracefs is
configured, the directory /sys/kernel/tracing will exist and tooling is
expected to use that path to access the tracing infrastructure.

To allow backward compatibility, when debugfs is mounted, it would
automount tracefs in its "tracing" directory so that tooling that had hard
coded /sys/kernel/debug/tracing would still work.

It has been over 10 years since the new interface was introduced, and all
tooling should now be using it. Start the process of deprecating the old
path so that it doesn't need to be maintained anymore.

A new config is added to allow distributions to disable automounting of
tracefs on debugfs.

If /sys/kernel/debug/tracing is accessed, a pr_warn() will trigger stating:

"NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030"

Expect to remove this feature in 5 years (2030).

Cc: <linux-trace-users@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/20250722170806.40c068c6@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-23 10:47:26 -04:00
Joel Granados
73184c8e4f sysctl: rename kern_table -> sysctl_subsys_table
Renamed sysctl table from kern_table to sysctl_subsys_table and grouped
the two arch specific ctls to the end of the array.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:56:02 +02:00
Joel Granados
25ebbce1f1 kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
Moved ctl_tables elements for overflowuid and overflowgid into in
kernel/sys.c. Create a register function that keeps them under "kernel"
and run it after core with postcore_initcall.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:56:02 +02:00
Joel Granados
88eddb0502 uevent: mv uevent_helper into kobject_uevent.c
Move both uevent_helper table into lib/kobject_uevent.c. Place the
registration early in the initcall order with postcore_initcall.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:56:02 +02:00
Joel Granados
6519dba9af sysctl: Remove superfluous includes from kernel/sysctl.c
Remove the following headers from the include list in sysctl.c.

* These are removed as the related variables are no longer there.
  ===================   ====================
  Include               Related Var
  ===================   ====================
  linux/kmod.h          usermodehelper
  asm/nmi.h             nmi_watchdoc_enabled
  asm/io.h              io_delay_type
  linux/pid.h           pid_max_{,min,max}
  linux/sched/sysctl.h  sysctl_{sched_*,numa_*,timer_*}
  linux/mount.h         sysctl_mount_max
  linux/reboot.h        poweroff_cmd
  linux/ratelimit.h     {,printk_}ratelimit_state
  linux/printk.h        kptr_restrict
  linux/security.h      CONFIG_SECURITY_CAPABILITIES
  linux/net.h           net_table
  linux/key.h           key_sysctls
  linux/nvs_fs.h        acpi_video_flags
  linux/acpi.h          acpi_video_flags
  linux/fs.h            proc_nr_files

* These are no longer needed as intermediate includes
  ==============
  Include
  ==============
  linux/filter.h
  linux/binfmts.h

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
ad0800b1d4 sysctl: Remove (very) old file changelog
These comments are older than 2003 and therefore do not bare any
relevance on the current state of the sysctl.c file. Remove them as they
confuse more than clarify.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
5a477e9341 sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
e054bcbe7e sysctl: move cad_pid into kernel/pid.c
Move cad_pid as well as supporting function proc_do_cad_pid into
kernel/pic.c. Replaced call to __do_proc_dointvec with proc_dointvec
inside proc_do_cad_pid which requires the copy of the ctl_table to
handle the temp value.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
942b296a6c sysctl: Move tainted ctl_table into kernel/panic.c
Move the ctl_table with the "tainted" proc_name into kernel/panic.c.
With it moves the proc_tainted helper function.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
79ac8df974 Input: sysrq: mv sysrq into drivers/tty/sysrq.c
Move both sysrq ctl_table and supported sysrq_sysctl_handler helper
function into drivers/tty/sysrq.c. Replaced the __do_proc_dointvec in
helper function with do_proc_dointvec_minmax as the former is local to
kernel/sysctl.c. Here we use the minmax version of do_proc_dointvec
because do_proc_dointvec is static and calling do_proc_dointvec_minmax
with a NULL min and max is the same as calling do_proc_dointvec.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
8e5f04b0d5 fork: mv threads-max into kernel/fork.c
make sysctl_max_threads static as it no longer needs to be exported into
sysctl.c.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
9e2f403dd8 parisc/power: Move soft-power into power.c
Move the soft-power ctl table into parisc/power.c. As a consequence the
pwrsw_enabled var is made static.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:48 +02:00
Joel Granados
851911aa72 mm: move randomize_va_space into memory.c
Move the randomize_va_space variable together with all its sysctl table
elements into memory.c. Register it to the "kernel" directory by
adding it to the subsys initialization calls

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:47 +02:00
Joel Granados
fff6703fc8 rcu: Move rcu_stall related sysctls into rcu/tree_stall.h
Move sysctl_panic_on_rcu_stall and sysctl_max_rcu_stall_to_panic into
the kernel/rcu subdirectory. Make these static in tree_stall.h and
removed them as extern from panic.h as their scope is now confined into
one file.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:47 +02:00
Joel Granados
f1b4f23a52 locking/rtmutex: Move max_lock_depth into rtmutex.c
Move the max_lock_depth sysctl table element into rtmutex_api.c. Removed
the rtmutex.h include from sysctl.c. Chose to move into rtmutex_api.c
to avoid multiple registrations every time rtmutex.c is included in other
files.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:47 +02:00
Joel Granados
d0d05f602c module: Move modprobe_path and modules_disabled ctl_tables into the module subsys
Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c
and into the modules subsystem. Make modules_disabled static as it no
longer needs to be exported. Remove module.h from the includes in sysctl
as it no longer uses any module exported variables.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 11:52:47 +02:00
Marco Elver
9872916ad1 kcsan: test: Initialize dummy variable
Newer compiler versions rightfully point out:

 kernel/kcsan/kcsan_test.c:591:41: error: variable 'dummy' is
 uninitialized when passed as a const pointer argument here
 [-Werror,-Wuninitialized-const-pointer]
   591 |         KCSAN_EXPECT_READ_BARRIER(atomic_read(&dummy), false);
       |                                                ^~~~~
 1 error generated.

Although this particular test does not care about the value stored in
the dummy atomic variable, let's silence the warning.

Link: https://lkml.kernel.org/r/CA+G9fYu8JY=k-r0hnBRSkQQrFJ1Bz+ShdXNwC1TNeMt0eXaxeA@mail.gmail.com
Fixes: 8bc32b3481 ("kcsan: test: Add test cases for memory barrier instrumentation")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reviewed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
2025-07-23 08:51:32 +02:00
Steven Rostedt
502ffa4399 tracing: Fix comment in trace_module_remove_events()
Fix typo "allocade" -> "allocated".

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250710095628.42ed6b06@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:28:54 -04:00
Steven Rostedt
4d6d0a6263 tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
Ftrace is tightly coupled with architecture specific code because it
requires the use of trampolines written in assembly. This means that when
a new feature or optimization is made, it must be done for all
architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are
added to denote which architecture has the new enhancement so that other
architectures can still function until they too have been updated.

The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.

Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where
applicable.

Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:15:56 -04:00
Steven Rostedt
07c3f391bc tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
When soft disabling of trace events was first created, it needed to have a
way to know if a file had a user that was using it with soft disabled (for
triggers that need to enable or disable events from a context that can not
really enable or disable the event, it would set SOFT_DISABLED to state it
is disabled). The flag SOFT_MODE was used to denote that an event had a
user that would enable or disable it via the SOFT_DISABLED flag.

Commit 1cf4c0732d ("tracing: Modify soft-mode only if there's no other
referrer") fixed a bug where if two users were using the SOFT_DISABLED
flag the accounting would get messed up as the SOFT_MODE flag could only
handle one user. That commit added the sm_ref counter which kept track of
how many users were using the event in "soft mode". This made the
SOFT_MODE flag redundant as it should only be set if the sm_ref counter is
non zero.

Remove the SOFT_MODE flag and just use the sm_ref counter to know the
event is in soft mode or not. This makes the code a bit simpler.

Link: https://lore.kernel.org/all/20250702111908.03759998@batman.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Link: https://lore.kernel.org/20250702143657.18dd1882@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:15:51 -04:00
Nam Cao
c897c1e5b1 tracing: Remove pointless memory barriers
Memory barriers are useful to ensure memory accesses from one CPU appear in
the original order as seen by other CPUs.

Some smp_rmb() and smp_wmb() are used, but they are not ordering multiple
memory accesses.

Remove them.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626151940.1756398-1-namcao@linutronix.de
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:15:51 -04:00
Steven Rostedt
9b4d5d330f ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
ftrace has two flavors:

 1) static: Where every function always calls the ftrace trampoline

 2) dynamic: Where each function has nops that can be changed on demand to
             jump to the ftrace trampoline when needed.

The static flavor has very high performance overhead and was only created
to make it easier for architectures to implement the dynamic flavor. An
architecture developer can first implement the static ftrace to make sure
the trampolines work before working on the more complicated dynamic aspect
of ftrace. Once the architecture can support dynamic ftrace, there's no
reason to continue to support the static flavor. In fact, the static
flavor tends to bitrot and bugs start to appear in them.

Remove the prompt to pick DYNAMIC_FTRACE and simply enable it if the
architecture supports it.

Link: https://lore.kernel.org/all/f7e12c6d-892e-4ca3-9ef0-fbb524d04a48@ghiti.fr/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: ChenMiao <chenmiao.ku@gmail.com>
Link: https://lore.kernel.org/20250703115222.2d7c8cd5@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:13:16 -04:00
Steven Rostedt
218d372ce8 fgraph: Keep track of when fgraph_ops are registered or not
Add a warning if unregister_ftrace_graph() is called without ever
registering it, or if register_ftrace_graph() is called twice. This can
detect errors when they happen and not later when there's a side effect:

Link: https://lore.kernel.org/all/20250617120830.24fbdd62@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250701194451.22e34724@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:06:02 -04:00
Steven Rostedt
119a5d5736 ring-buffer: Remove ring_buffer_read_prepare_sync()
When the ring buffer was first introduced, reading the non-consuming
"trace" file required disabling the writing of the ring buffer. To make
sure the writing was fully disabled before iterating the buffer with a
non-consuming read, it would set the disable flag of the buffer and then
call an RCU synchronization to make sure all the buffers were
synchronized.

The function ring_buffer_read_start() originally  would initialize the
iterator and call an RCU synchronization, but this was for each individual
per CPU buffer where this would get called many times on a machine with
many CPUs before the trace file could be read. The commit 72c9ddfd4c
("ring-buffer: Make non-consuming read less expensive with lots of cpus.")
separated ring_buffer_read_start into ring_buffer_read_prepare(),
ring_buffer_read_sync() and then ring_buffer_read_start() to allow each of
the per CPU buffers to be prepared, call the read_buffer_read_sync() once,
and then the ring_buffer_read_start() for each of the CPUs which made
things much faster.

The commit 1039221cc2 ("ring-buffer: Do not disable recording when there
is an iterator") removed the requirement of disabling the recording of the
ring buffer in order to iterate it, but it did not remove the
synchronization that was happening that was required to wait for all the
buffers to have no more writers. It's now OK for the buffers to have
writers and no synchronization is needed.

Remove the synchronization and put back the interface for the ring buffer
iterator back before commit 72c9ddfd4c was applied.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250630180440.3eabb514@batman.local.home
Reported-by: David Howells <dhowells@redhat.com>
Fixes: 1039221cc2 ("ring-buffer: Do not disable recording when there is an iterator")
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:01:41 -04:00
Marc Zyngier
2c9e7f8574 genirq: Teach handle_simple_irq() to resend an in-progress interrupt
It appears that the defect outlined in 9c15eeb536 ("genirq: Allow
fasteoi handler to resend interrupts on concurrent handling") also
affects some other less stellar MSI controllers, this time using
the handle_simple_irq() flow.

Teach this flow about irqd_needs_resend_when_in_progress(). Given
the invasive nature of this workaround, only this flow is updated.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250708173404.1278635-2-maz@kernel.org
2025-07-22 15:33:00 -05:00
Thomas Gleixner
8d39d6ec4d genirq: Prevent migration live lock in handle_edge_irq()
Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.

If the interrupt affinity is moved to a new target CPU and the interrupt is
currently handled on the previous target CPU for edge type interrupts the
handler might get stuck on the previous target:

CPU 0 (previous target)		CPU 1 (new target)

  handle_edge_irq()
   repeat:
	handle_event()		handle_edge_irq()
			        if (INPROGESS) {
				  set(PENDING);
				  mask();
				  return;
				}
	if (PENDING) {
	  clear(PENDING);
	  unmask();
	  goto repeat;
	}

The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts with
a high rate and always before handle_event() completes and before the CPU0
handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
over. This has been observed in virtual machines.

Prevent this by checking whether the CPU which observes the INPROGRESS flag
is the new affinity target. If that's the case, do not set the PENDING flag
and wait for the INPROGRESS flag to be cleared instead, so that the new
interrupt is handled on the new target CPU and the previous CPU is released
from the action.

This is restricted to the edge type handler and only utilized on systems,
which use single CPU targets for interrupt affinity.

Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
Reported-by: Liangyan <liangyan.peng@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
2025-07-22 14:30:42 +02:00