There are cases where a single system will use a single function callback
to handle multiple users. For example, to allow function_graph tracer to
have multiple users where each can trace their own set of functions, it is
useful to only have one ftrace_ops registered to ftrace that will call a
function by the function_graph tracer to handle the multiplexing with the
different registered function_graph tracers.
Add a "subop_list" to the ftrace_ops that will hold a list of other
ftrace_ops that the top ftrace_ops will manage.
The function ftrace_startup_subops() that takes the manager ftrace_ops and
a subop ftrace_ops it will manage. If there are no subops with the
ftrace_ops yet, it will copy the ftrace_ops subop filters to the manager
ftrace_ops and register that with ftrace_startup(), and adds the subop to
its subop_list. If the manager ops already has something registered, it
will then merge the new subop filters with what it has and enable the new
functions that covers all the subops it has.
To remove a subop, ftrace_shutdown_subops() is called which will use the
subop_list of the manager ops to rebuild all the functions it needs to
trace, and update the ftrace records to only call the functions it now has
registered. If there are no more functions registered, it will then call
ftrace_shutdown() to disable itself completely.
Note, it is up to the manager ops callback to always make sure that the
subops callbacks are called if its filter matches, as there are times in
the update where the callback could be calling more functions than those
that are currently registered.
This could be updated to handle other systems other than function_graph,
for example, fprobes could use this (but will need an interface to call
ftrace_startup_subops()).
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.508431129@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the ftrace_ops entryfunc is
called and if it returns non zero, its index into the array will be added
to the shadow stack.
On exit of the function being traced, the shadow stack will contain the
indexes of the ftrace_ops on the array that want their retfunc to be
called.
Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the ftrace_ops
is removed, its place on the array is replaced with a ftrace_ops that
contains the stub functions and that will be called when the function
finally returns.
If another ftrace_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things current work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
function traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter to each time the array item is updated and
save that on the shadow stack as well, such that it won't be called if the
index saved does not match the index on the array.
Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509096221.162236.8806372072523195752.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.555493396@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The verifier has field information for specific special types, such as
kptr, rbtree root, and list head. These types are handled
differently. However, we did not previously examine the types of fields of
a struct type variable. Field information records were not generated for
the kptrs, rbtree roots, and linked_list heads that are not located at the
outermost struct type of a variable.
For example,
struct A {
struct task_struct __kptr * task;
};
struct B {
struct A mem_a;
}
struct B var_b;
It did not examine "struct A" so as not to generate field information for
the kptr in "struct A" for "var_b".
This patch enables BPF programs to define fields of these special types in
a struct type other than the direct type of a variable or in a struct type
that is the type of a field in the value type of a map.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240523174202.461236-6-thinker.li@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The verifier uses field information for certain special types, such as
kptr, rbtree root, and list head. These types are treated
differently. However, we did not previously support these types in
arrays. This update examines arrays and duplicates field information the
same number of times as the length of the array if the element type is one
of the special types.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240523174202.461236-5-thinker.li@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit allows rcutorture to test double-call_srcu() when the
CONFIG_DEBUG_OBJECTS_RCU_HEAD Kconfig option is enabled. The non-raw
sdp structure's ->spinlock will be acquired in call_srcu(), hence this
commit also removes the current IRQ and preemption disabling so as to
avoid lockdep complaints.
Link: https://lore.kernel.org/all/20240407112714.24460-1-qiang.zhang1211@gmail.com/
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This reverts commit 28319d6dc5. The race
it fixed was subject to conditions that don't exist anymore since:
1612160b91 ("rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks")
This latter commit removes the use of SRCU that used to cover the
RCU-tasks blind spot on exit between the tasklist's removal and the
final preemption disabling. The task is now placed instead into a
temporary list inside which voluntary sleeps are accounted as RCU-tasks
quiescent states. This would disarm the deadlock initially reported
against PID namespace exit.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The bypass lock contention mitigation assumes there can be at most
2 contenders on the bypass lock, following this scheme:
1) One kthread takes the bypass lock
2) Another one spins on it and increment the contended counter
3) A third one (a bypass enqueuer) sees the contended counter on and
busy loops waiting on it to decrement.
However this assumption is wrong. There can be only one CPU to find the
lock contended because call_rcu() (the bypass enqueuer) is the only
bypass lock acquire site that may not already hold the NOCB lock
beforehand, all the other sites must first contend on the NOCB lock.
Therefore step 2) is impossible.
The other problem is that the mitigation assumes that contenders all
belong to the same rdp CPU, which is also impossible for a raw spinlock.
In theory the warning could trigger if the enqueuer holds the bypass
lock and another CPU flushes the bypass queue concurrently but this is
prevented from all flush users:
1) NOCB kthreads only flush if they successfully _tried_ to lock the
bypass lock. So no contention management here.
2) Flush on callbacks migration happen remotely when the CPU is offline.
No concurrency against bypass enqueue.
3) Flush on deoffloading happen either locally with IRQs disabled or
remotely when the CPU is not yet online. No concurrency against
bypass enqueue.
4) Flush on barrier entrain happen either locally with IRQs disabled or
remotely when the CPU is offline. No concurrency against
bypass enqueue.
For those reasons, the bypass lock contention mitigation isn't needed
and is even wrong. Remove it but keep the warning reporting a contended
bypass lock on a remote CPU, to keep unexpected contention awareness.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Upon NOCB deoffloading, the rcuo kthread must be forced to sleep
until the corresponding rdp is ever offloaded again. The deoffloader
clears the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which
then notices that change and clears in turn its SEGCBLIST_KTHREAD_CB
flag before going to sleep, until it ever sees the SEGCBLIST_OFFLOADED
flag again, should a re-offloading happen.
Upon NOCB offloading, the rcuo kthread must be forced to wake up and
handle callbacks until the corresponding rdp is ever deoffloaded again.
The offloader sets the SEGCBLIST_OFFLOADED flag, wakes up the rcuo
kthread which then notices that change and sets in turn its
SEGCBLIST_KTHREAD_CB flag before going to check callbacks, until it
ever sees the SEGCBLIST_OFFLOADED flag cleared again, should a
de-offloading happen again.
This is all a crude ad-hoc and error-prone kthread (un-)parking
re-implementation.
Consolidate the behaviour with the appropriate API instead.
[ paulmck: Apply Qiang Zhang feedback provided in Link: below. ]
Link: https://lore.kernel.org/all/20240509074046.15629-1-qiang.zhang1211@gmail.com/
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If only isolated partitions are being created underneath the cgroup root,
there will only be one sched domain with top_cpuset.effective_cpus. We can
skip the unnecessary sched domains scanning code and save some cycles.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Back in 2021 we already discussed removing deny_write_access() for
executables. Back then I was hesistant because I thought that this might
cause issues in userspace. But even back then I had started taking some
notes on what could potentially depend on this and I didn't come up with
a lot so I've changed my mind and I would like to try this.
Here are some of the notes that I took:
(1) The deny_write_access() mechanism is causing really pointless issues
such as [1]. If a thread in a thread-group opens a file writable,
then writes some stuff, then closing the file descriptor and then
calling execve() they can fail the execve() with ETXTBUSY because
another thread in the thread-group could have concurrently called
fork(). Multi-threaded libraries such as go suffer from this.
(2) There are userspace attacks that rely on overwriting the binary of a
running process. These attacks are _mitigated_ but _not at all
prevented_ from ocurring by the deny_write_access() mechanism.
I'll go over some details. The clearest example of such attacks was
the attack against runC in CVE-2019-5736 (cf. [3]).
An attack could compromise the runC host binary from inside a
_privileged_ runC container. The malicious binary could then be used
to take over the host.
(It is crucial to note that this attack is _not_ possible with
unprivileged containers. IOW, the setup here is already insecure.)
The attack can be made when attaching to a running container or when
starting a container running a specially crafted image. For example,
when runC attaches to a container the attacker can trick it into
executing itself.
This could be done by replacing the target binary inside the
container with a custom binary pointing back at the runC binary
itself. As an example, if the target binary was /bin/bash, this
could be replaced with an executable script specifying the
interpreter path #!/proc/self/exe.
As such when /bin/bash is executed inside the container, instead the
target of /proc/self/exe will be executed. That magic link will
point to the runc binary on the host. The attacker can then proceed
to write to the target of /proc/self/exe to try and overwrite the
runC binary on the host.
However, this will not succeed because of deny_write_access(). Now,
one might think that this would prevent the attack but it doesn't.
To overcome this, the attacker has multiple ways:
* Open a file descriptor to /proc/self/exe using the O_PATH flag and
then proceed to reopen the binary as O_WRONLY through
/proc/self/fd/<nr> and try to write to it in a busy loop from a
separate process. Ultimately it will succeed when the runC binary
exits. After this the runC binary is compromised and can be used
to attack other containers or the host itself.
* Use a malicious shared library annotating a function in there with
the constructor attribute making the malicious function run as an
initializor. The malicious library will then open /proc/self/exe
for creating a new entry under /proc/self/fd/<nr>. It'll then call
exec to a) force runC to exit and b) hand the file descriptor off
to a program that then reopens /proc/self/fd/<nr> for writing
(which is now possible because runC has exited) and overwriting
that binary.
To sum up: the deny_write_access() mechanism doesn't prevent such
attacks in insecure setups. It just makes them minimally harder.
That's all.
The only way back then to prevent this is to create a temporary copy
of the calling binary itself when it starts or attaches to
containers. So what I did back then for LXC (and Aleksa for runC)
was to create an anonymous, in-memory file using the memfd_create()
system call and to copy itself into the temporary in-memory file,
which is then sealed to prevent further modifications. This sealed,
in-memory file copy is then executed instead of the original on-disk
binary.
Any compromising write operations from a privileged container to the
host binary will then write to the temporary in-memory binary and
not to the host binary on-disk, preserving the integrity of the host
binary. Also as the temporary, in-memory binary is sealed, writes to
this will also fail.
The point is that deny_write_access() is uselss to prevent these
attacks.
(3) Denying write access to an inode because it's currently used in an
exec path could easily be done on an LSM level. It might need an
additional hook but that should be about it.
(4) The MAP_DENYWRITE flag for mmap() has been deprecated a long time
ago so while we do protect the main executable the bigger portion of
the things you'd think need protecting such as the shared libraries
aren't. IOW, we let anyone happily overwrite shared libraries.
(5) We removed all remaining uses of VM_DENYWRITE in [2]. That means:
(5.1) We removed the legacy uselib() protection for preventing
overwriting of shared libraries. Nobody cared in 3 years.
(5.2) We allow write access to the elf interpreter after exec
completed treating it on a par with shared libraries.
Yes, someone in userspace could potentially be relying on this. It's not
completely out of the realm of possibility but let's find out if that's
actually the case and not guess.
Link: https://github.com/golang/go/issues/22315 [1]
Link: 49624efa65 ("Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux") [2]
Link: https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736 [3]
Link: https://lwn.net/Articles/866493
Link: https://github.com/golang/go/issues/22220
Link: 5bf8c0cf09/src/cmd/go/internal/work/buildid.go (L724)
Link: 5bf8c0cf09/src/cmd/go/internal/work/exec.go (L1493)
Link: 5bf8c0cf09/src/cmd/go/internal/script/cmds.go (L457)
Link: 5bf8c0cf09/src/cmd/go/internal/test/test.go (L1557)
Link: 5bf8c0cf09/src/os/exec/lp_linux_test.go (L61)
Link: https://github.com/buildkite/agent/pull/2736
Link: https://github.com/rust-lang/rust/issues/114554
Link: https://bugs.openjdk.org/browse/JDK-8068370
Link: https://github.com/dotnet/runtime/issues/58964
Link: https://lore.kernel.org/r/20240531-vfs-i_writecount-v1-1-a17bea7ee36b@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Added a module description to sysctl Kunit self test module to fix the
'make W=1' warning (" WARNING: modpost: missing MODULE_DESCRIPTION() in
kernel/sysctl-test.o")
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
In a future commit the proc_handlers themselves will change to
"const struct ctl_table". As a preparation for that adapt the internal
helper.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Joel Granados <j.granados@samsung.com>
The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.
As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.
To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".
No functional change.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Move boundary checking for proc_dou8ved_minmax into module loading, thereby
reporting errors in advance. And add a kunit test case ensuring the
boundary check is done correctly.
The boundary check in proc_dou8vec_minmax done to the extra elements in
the ctl_table struct is currently performed at runtime. This allows buggy
kernel modules to be loaded normally without any errors only to fail
when used.
This is a buggy example module:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sysctl.h>
static struct ctl_table_header *_table_header = NULL;
static unsigned char _data = 0;
struct ctl_table table[] = {
{
.procname = "foo",
.data = &_data,
.maxlen = sizeof(u8),
.mode = 0644,
.proc_handler = proc_dou8vec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE_THOUSAND,
},
};
static int init_demo(void) {
_table_header = register_sysctl("kernel", table);
if (!_table_header)
return -ENOMEM;
return 0;
}
module_init(init_demo);
MODULE_LICENSE("GPL");
And this is the result:
# insmod test.ko
# cat /proc/sys/kernel/foo
cat: /proc/sys/kernel/foo: Invalid argument
Suggested-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Wen Yang <wen.yang@linux.dev>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Improve the readability of irqdomain debugging information in debugfs by
printing the flags field of domain files as human-readable strings instead
of a raw bitmask, which aligned with the existing style used for irqchip
flags in the irq debug files.
Before:
#cat :cpus:cpu@0:interrupt-controller
name: :cpus:cpu@0:interrupt-controller
size: 0
mapped: 2
flags: 0x00000003
After:
#cat :cpus:cpu@0:interrupt-controller
name: :cpus:cpu@0:interrupt-controller
size: 0
mapped: 3
flags: 0x00000003
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240529091628.3666379-1-ruanjinjie@huawei.com
Interrupts which have no action and chained interrupts can be
ignored due to the following reasons (as per tglx's comment):
1) Interrupts which have no action are completely uninteresting as
there is no real information attached.
2) Chained interrupts do not have a count at all.
So there is no point to evaluate the number of accounted interrupts before
checking for non-requested or chained interrupts.
Remove the any_count logic and simply check whether the interrupt
descriptor has the kstat_irqs member populated.
[ tglx: Adapted to upstream changes ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiwei Sun <sunjw10@lenovo.com>
Link: https://lore.kernel.org/r/20240515100632.1419-1-ahuang12@lenovo.com
Link: https://lore.kernel.org/lkml/87h6f0knau.ffs@tglx/
PPS (Pulse Per Second) generates a hardware pulse every second based on
CLOCK_REALTIME. This works fine when the pulse is generated in software
from a hrtimer callback function.
For hardware which generates the pulse by programming a timer it is
required to convert CLOCK_REALTIME to the underlying hardware clock.
The X86 Timed IO device is based on the Always Running Timer (ART), which
is the base clock of the TSC, which is usually the system clocksource on
X86.
The core code already has functionality to convert base clock timestamps to
system clocksource timestamps, but there is no support for converting the
other way around.
Provide the required functionality to support such devices in a generic
way to avoid code duplication in drivers:
1) ktime_real_to_base_clock() to convert a CLOCK_REALTIME timestamp to a
base clock timestamp
2) timekeeping_clocksource_has_base() to allow drivers to validate that
the system clocksource is based on a particular clocksource ID.
[ tglx: Simplify timekeeping_clocksource_has_base() and add missing READ_ONCE() ]
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: Lakshmi Sowjanya D <lakshmi.sowjanya.d@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240513103813.5666-10-lakshmi.sowjanya.d@intel.com
Hardware time stamps like provided by PTP clock implementations are based
on a clock which feeds both the PCIe device and the system clock. For
further processing the underlying hardwarre clock timestamp must be
converted to the system clock.
Right now this requires drivers to invoke an architecture specific
conversion function, e.g. to convert the ART (Always Running Timer)
timestamp to a TSC timestamp.
As the system clock is aware of the underlying base clock, this can be
moved to the core code by providing a base clock property for the system
clock which contains the conversion factors and assigning a clocksource ID
to the base clock.
Add the required data structures and the conversion infrastructure in the
core code to prepare for converting X86 and the related PTP drivers over.
[ tglx: Added a missing READ_ONCE(). Massaged change log ]
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: Lakshmi Sowjanya D <lakshmi.sowjanya.d@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240513103813.5666-2-lakshmi.sowjanya.d@intel.com
In the cpuset_css_online(), clearing the CS_SCHED_LOAD_BALANCE bit
of cs->flags is guarded by callback_lock and cpuset_mutex. There is
no problem with itself, because it is consistent with the description
of there two global lock at the beginning of this file. However, since
the operation of checking, setting and clearing the flag bit is atomic,
protection of callback_lock is unnecessary here, see CS_SPREAD_*. so
to make it more consistent with the other code, move the operation
outside the critical section of callback_lock.
No functional changes intended.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull Kbuild fixes from Masahiro Yamada:
- Fix a Kconfig bug regarding comparisons to 'm' or 'n'
- Replace missed $(srctree)/$(src)
- Fix unneeded kallsyms step 3
- Remove incorrect "compatible" properties from image nodes in
image.fit
- Improve gen_kheaders.sh
- Fix 'make dt_binding_check'
- Clean up unnecessary code
* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
kheaders: use `command -v` to test for existence of `cpio`
kheaders: explicitly define file modes for archived headers
scripts/make_fit: Drop fdt image entry compatible string
kbuild: remove a stale comment about cleaning in link-vmlinux.sh
kbuild: fix short log for AS in link-vmlinux.sh
kbuild: change scripts/mksysmap into sed script
kbuild: avoid unneeded kallsyms step 3
kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
kconfig: remove redundant check in expr_join_or()
kconfig: fix comparison to constant symbols, 'm', 'n'
kconfig: remove unused expr_is_no()
Pull dma-mapping fixes from Christoph Hellwig:
- dma-mapping benchmark error handling fixes (Fedor Pchelkin)
- correct a config symbol reference in the DMA API documentation (Lukas
Bulwahn)
* tag 'dma-mapping-6.10-2024-05-31' of git://git.infradead.org/users/hch/dma-mapping:
Documentation/core-api: correct reference to SWIOTLB_DYNAMIC
dma-mapping: benchmark: handle NUMA_NO_NODE correctly
dma-mapping: benchmark: fix node id validation
dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
dma-mapping: benchmark: fix up kthread-related error handling
Add epoll support to bpf struct_ops links to trigger EPOLLHUP event upon
detachment.
This patch implements the "poll" of the "struct file_operations" for BPF
links and introduces a new "poll" operator in the "struct bpf_link_ops". By
implementing "poll" of "struct bpf_link_ops" for the links of struct_ops,
the file descriptor of a struct_ops link can be added to an epoll file
descriptor to receive EPOLLHUP events.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-4-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>