PRIO_PGRP needs the tasklist_lock mainly to serialize vs setpgid(2), to
protect against any concurrent change_pid(PIDTYPE_PGID) that can move
the task from one hlist to another while iterating.
However, the remaining cases can rely only on RCU:
PRIO_PROCESS only does the task lookup and never iterates over tasklist
and we already have an rcu-aware stable pointer.
PRIO_USER is already racy vs setuid(2) so with creds being rcu
protected, we can end up seeing stale data. When removing the
tasklist_lock, there can be (i) a race with fork, but this is benign as
the child's nice is inherited and the new task is not yet observable by
the user either, hence the return semantics do not differ; and (ii) a
race with exit, which is a small window and can cause us to miss a task
that was removed from the list and had the highest priority (lowest nice
value).
Similarly, replace the buggy do_each_thread/while_each_thread combo in
PRIO_USER with the rcu-safe for_each_process_thread flavor, which doesn't
make use of next_thread/p->thread_group.
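Race (ii) matters because getpriority(2) reports the highest priority among the matching tasks. The sketch below is a minimal user-space model of the PRIO_USER aggregation. It assumes the kernel's nice_to_rlimit() encoding (20 - nice). The struct model_task type and the prio_user_getpriority() helper are hypothetical. RCU and the real task list are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Same values/encoding as MAX_NICE and nice_to_rlimit() in the kernel. */
#define MAX_NICE 19

static long nice_to_rlimit(long nice)
{
	return MAX_NICE - nice + 1; /* nice 19 -> 1, nice -20 -> 40 */
}

/* Hypothetical flattened "task list" entry: only what this model needs. */
struct model_task {
	unsigned int uid;
	long nice;
};

/*
 * Model of the PRIO_USER walk: keep the largest rlimit-style value among
 * threads owned by @uid. The kernel does this under rcu_read_lock() with
 * for_each_process_thread(); locking is omitted in this single-threaded
 * model. Returns -1 (standing in for -ESRCH) if nothing matched.
 */
static long prio_user_getpriority(const struct model_task *tasks, size_t n,
				  unsigned int uid)
{
	long retval = -1;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].uid != uid)
			continue;
		long niceval = nice_to_rlimit(tasks[i].nice);
		if (niceval > retval)
			retval = niceval;
	}
	return retval;
}
```

If the nice -5 thread exits and is skipped during the walk, the result drops from 25 to the next best value. That is the window described in (ii).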
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211210182250.43734-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While implementing a new per-cpu kthread, cfs_migration, I found that its
comm "cfs_migration/%u" is truncated due to the TASK_COMM_LEN limit. For
example, the per-cpu threads on CPU10~19 all have the same name
"cfs_migration/1", which will confuse users.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed. But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
the task comm, for example:
jbd2/nvme0n1p2-
xfs-reclaim/sdf
Currently there are many truncated kthreads:
rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...
We can shorten these names to work around this problem, but that may not
be applicable to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-'
for example: it is a nice name, and shortening it is not a good idea.
One possible way to fix this issue is to extend the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another, more conservative approach is to introduce a
new pointer that stores the kthread's full name if it is truncated. This
won't introduce much overhead, as it is in a non-critical path. In the
end we decided to use the second approach. See also the discussions
in this thread:
https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/
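The second approach can be sketched in user space. The comm stays a fixed 16-byte field, and a separately allocated full name exists only when formatting overflowed it. The struct model_kthread type and both helpers are hypothetical models, not the kernel's kthread code. A failed allocation leaves the truncated comm in place.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Hypothetical stand-in for a task plus its kthread bookkeeping. */
struct model_kthread {
	char comm[TASK_COMM_LEN]; /* fixed size, may be truncated */
	char *full_name;          /* allocated only when comm was truncated */
};

static void model_kthread_set_name(struct model_kthread *kt, const char *fmt, ...)
{
	va_list args, aq;
	int len;

	assert(kt != NULL);
	va_start(args, fmt);
	va_copy(aq, args);
	len = vsnprintf(kt->comm, sizeof(kt->comm), fmt, aq);
	va_end(aq);

	kt->full_name = NULL;
	if (len >= TASK_COMM_LEN) {
		/* Keep the truncated comm if the allocation fails. */
		kt->full_name = malloc((size_t)len + 1);
		if (kt->full_name)
			vsnprintf(kt->full_name, (size_t)len + 1, fmt, args);
	}
	va_end(args);
}

/* What a /proc/[pid]/comm reader would see under this scheme. */
static const char *model_kthread_comm(const struct model_kthread *kt)
{
	return kt->full_name ? kt->full_name : kt->comm;
}
```

The extra allocation only happens for names that do not fit, which keeps the common path unchanged.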
After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:
rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8
Link: https://lkml.kernel.org/r/20211120112850.46047-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the proc_create[_data]() stubs, which are used when CONFIG_PROC_FS
is not set, from #defines to static inline stubs.
This should fix clang -Werror builds failing due to errors like this:
drivers/platform/x86/thinkpad_acpi.c:918:30: error: unused variable
'dispatch_proc_ops' [-Werror,-Wunused-const-variable]
Fixing this in include/linux/proc_fs.h should ensure that the same issue
is also fixed in any other drivers hitting the same -Werror issue.
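A minimal user-space sketch of the idea follows, with trimmed stand-ins for the kernel types (umode_t and struct proc_ops are redeclared here as assumptions). A macro stub discards its arguments, so a static const ops table passed only to proc_create() is never referenced. With a static inline stub, taking its address is a real use. The old macro is quoted only roughly in the comment. The dispatch_proc_ops and driver_register_proc() names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short umode_t;        /* user-space stand-in for the kernel typedef */
struct proc_dir_entry;                 /* opaque, as in <linux/proc_fs.h> */
struct proc_ops { int placeholder; };  /* trimmed stand-in; the real one holds callbacks */

/*
 * CONFIG_PROC_FS=n stub as a static inline function. The previous macro
 * form, roughly
 *     #define proc_create(name, mode, parent, proc_ops) ({NULL;})
 * threw its arguments away, so an ops table passed only here looked unused.
 */
static inline struct proc_dir_entry *proc_create(const char *name, umode_t mode,
						 struct proc_dir_entry *parent,
						 const struct proc_ops *proc_ops)
{
	return NULL;
}

/* Driver-style ops table whose only user is the proc_create() call below. */
static const struct proc_ops dispatch_proc_ops = { .placeholder = 0 };

static struct proc_dir_entry *driver_register_proc(void)
{
	/* Taking &dispatch_proc_ops here counts as a use of the variable. */
	return proc_create("dispatch", 0644, NULL, &dispatch_proc_ops);
}
```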
[akpm@linux-foundation.org: fix CONFIG_PROC_FS=n]
[akpm@linux-foundation.org: fix arch/sparc/kernel/led.c]
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20211116131112.508304-1-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit cc5f2704c9 ("proc/vmcore: convert oldmem_pfn_is_ram callback
to more generic vmcore callbacks"), we added detection of surprise
vmcore_cb unregistration after the vmcore was already opened. Once
detected, we warn the user and simulate reading zeroes from that point
on when accessing the vmcore.
The basic reason was that unexpected unregistration, for example, by
manually unbinding a driver from a device after opening the vmcore, is
not supported and could result in reading oldmem the vmcore_cb would
have actually prohibited while registered. However, something like that
can similarly be triggered by a user who is really looking for trouble,
simply by unbinding the relevant driver before opening the vmcore -- or
by disallowing loading the driver in the first place. So it's actually
of limited help.
Currently, unregistration can only be triggered via virtio-mem when
manually unbinding the driver from the device inside the VM; there is no
way to trigger it from the hypervisor, as hypervisors don't allow for
unplugging virtio-mem devices -- ripping out system RAM from a VM
without coordination with the guest is usually not a good idea.
The important part is that unbinding the driver and unregistering the
vmcore_cb while concurrently reading the vmcore won't crash the system,
and that is handled by the rwsem.
To make the mechanism more future-proof, let's remove the "read zero"
part, but leave the warning in place. For example, we could have a
future driver (like virtio-balloon) that will contact the hypervisor to
figure out if we already populated a page for a given PFN.
Hotunplugging such a device and consequently unregistering the vmcore_cb
could be triggered from the hypervisor without harming the system even
while kdump is running. In that case, we don't want to silently end up
with a vmcore that contains wrong data, because the user inside the VM
might be unaware of the hypervisor action and might easily miss the
warning in the log.
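The behavioral change can be modeled in user space. After a surprise unregistration, the warning is still emitted, but reads keep consulting whatever callbacks remain instead of switching to zeroes. The struct model_vmcore_cb type, the single callback slot and the even_pfns_only() callback are hypothetical simplifications. The rwsem that serializes registration against readers in the kernel is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced vmcore callback: may veto reading a given PFN. */
struct model_vmcore_cb {
	bool (*pfn_is_ram)(unsigned long pfn);
};

static const struct model_vmcore_cb *registered_cb; /* one slot for brevity */
static bool vmcore_opened;
static int unregister_warnings;

/* Example driver callback (hypothetical): only even PFNs are backed. */
static bool even_pfns_only(unsigned long pfn)
{
	return (pfn % 2) == 0;
}

static const struct model_vmcore_cb even_cb = { .pfn_is_ram = even_pfns_only };

static void model_register_vmcore_cb(const struct model_vmcore_cb *cb)
{
	registered_cb = cb;
}

static void model_unregister_vmcore_cb(void)
{
	registered_cb = NULL;
	/* Kept: warn if the vmcore is already open ... */
	if (vmcore_opened)
		unregister_warnings++;
	/* ... but no longer force every later read to return zeroes. */
}

/* Consult the callbacks still registered; with none left, assume RAM. */
static bool model_pfn_is_ram(unsigned long pfn)
{
	if (registered_cb && !registered_cb->pfn_is_ram(pfn))
		return false;
	return true;
}
```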
Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: percpu: Cleanup percpu first chunk function".
When adding support for the page mapping percpu first chunk allocator on
arm64, we found a lot of duplicated code in the percpu embed/page first
chunk allocators. This patchset aims to clean it up and should introduce
no functional change.
The current support for 'embed' and 'page' across architectures is
shown below:
embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
page:  NEED_PER_CPU_PAGE_FIRST_CHUNK
         embed  page
------------------------
arm64      Y     Y
mips       Y     N
powerpc    Y     Y
riscv      Y     N
sparc      Y     Y
x86        Y     Y
------------------------
There are two interfaces for the percpu first chunk allocator:
extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
extern int __init pcpu_page_first_chunk(size_t reserved_size,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn,
-                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t callbacks are killed; instead,
generic pcpu_fc_alloc() and pcpu_fc_free() functions are provided and
called from pcpu_embed/page_first_chunk().
1) For pcpu_embed_first_chunk(), a pcpu_fc_cpu_to_node_fn_t callback
needs to be provided when the arch supports NUMA.
2) For pcpu_page_first_chunk(), pcpu_fc_populate_pte_fn_t is killed too,
and a generic pcpu_populate_pte() marked '__weak' is provided instead.
If an arch (like x86) needs a different way to populate PTEs, it should
provide its own implementation.
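The '__weak' override mechanism can be shown with a small single-file sketch. The kernel's __weak expands to the GCC weak attribute. The function name mirrors pcpu_populate_pte() above, but its body and the last_generic_addr bookkeeping are placeholders for illustration, not the real page-table code.

```c
#include <assert.h>

void pcpu_populate_pte(unsigned long addr);

/* Records what the generic default was asked to do (model only). */
static unsigned long last_generic_addr;

/*
 * Generic default, marked weak. If another object linked into the same
 * image defines a non-weak pcpu_populate_pte(), the linker uses that
 * definition instead, which is how an arch can supply its own.
 */
__attribute__((weak)) void pcpu_populate_pte(unsigned long addr)
{
	/* A real implementation would populate page tables for addr. */
	last_generic_addr = addr;
}
```

An arch would then only add a strong definition with the same signature in its own file. No registration callback is needed.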
[1] https://github.com/kevin78/linux.git percpu-cleanup
This patch (of 4):
The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
duplicate definitions on the platforms that use them.
Move them into mm, drop the redundant definitions, and instead just
select them on the applicable platforms.
Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull input fix from Dmitry Torokhov:
"A small fixup to the Zinitix touchscreen driver to avoid enabling the
IRQ line before we successfully requested it"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: zinitix - make sure the IRQ is allocated before it gets enabled
Pull ARM SoC fix from Olof Johansson:
"One more fix for 5.16
I had missed one patch when I sent up what I thought was the last
batch of fixes for this release. This one fixes issues on the
Raspberry Pi platforms due to gpio init changes this release, so
hopefully we can get it merged before final release is cut"
* tag 'soc-fixes-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: gpio-ranges property is now required
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Revert "libtraceevent: Increase libtraceevent logging when verbose",
breaks the build with libtraceevent-1.3.0, i.e. when building with
'LIBTRACEEVENT_DYNAMIC=1'.
- Avoid early exit in 'perf trace' due to running the SIGCHLD handler
before it makes sense to. It can happen when using a BPF source code
event that has to be built into an object file first.
* tag 'perf-tools-fixes-for-v5.16-2022-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
Revert "libtraceevent: Increase libtraceevent logging when verbose"
perf trace: Avoid early exit due to running SIGCHLD handler before it makes sense to
Since the irq request is the last thing in the driver probe, it happens
later than the input device registration. This means that there is a
small time window where, if the open method is called, the driver will
attempt to enable an irq that is not yet available.
Fix that by moving the irq request before the input device registration.
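The ordering problem can be reduced to a deterministic toy model. It takes the worst case of the race, where userspace opens the device the instant it is registered. All types and functions below are hypothetical and are not the zinitix driver's code. They only show why requesting the IRQ first closes the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state for a touchscreen-like driver. */
struct model_ts {
	bool irq_requested;
	bool irq_enabled;
	int open_errors;
};

/* open(): runs once the input device is registered; enables the IRQ. */
static void model_input_open(struct model_ts *ts)
{
	if (!ts->irq_requested) {
		ts->open_errors++; /* enabling an IRQ that was never requested */
		return;
	}
	ts->irq_enabled = true;
}

/* Worst case of the race: userspace opens right after registration. */
static void model_register_input(struct model_ts *ts)
{
	model_input_open(ts);
}

static void model_request_irq(struct model_ts *ts)
{
	ts->irq_requested = true; /* requested, not yet enabled */
}

static void probe_before_fix(struct model_ts *ts)
{
	model_register_input(ts);
	model_request_irq(ts);
}

static void probe_after_fix(struct model_ts *ts)
{
	model_request_irq(ts);
	model_register_input(ts);
}
```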
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Fixes: 26822652c8 ("Input: add zinitix touchscreen driver")
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Link: https://lore.kernel.org/r/20220106072840.36851-2-nikita@trvn.ru
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes have come in, nothing overly severe but would be good
to get in by final release:
- More specific compatible fields on the qspi controller for socfpga,
to enable quirks in the driver
- A runtime PM fix for Renesas to fix mismatched reference counts on
errors"
* tag 'soc-fixes-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: socfpga: change qspi to "intel,socfpga-qspi"
dt-bindings: spi: cadence-quadspi: document "intel,socfpga-qspi"
reset: renesas: Fix Runtime PM usage
Pull i2c fixes from Wolfram Sang:
"Fix the regression with AMD GPU suspend by reverting the
handling of bus regulators in the I2C core.
Also, there is a fix for the MPC driver to prevent an
out-of-bounds access"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: core: support bus regulator controlling in adapter"
i2c: mpc: Avoid out of bounds memory access
Pull power supply fixes from Sebastian Reichel:
"Three fixes for the 5.16 cycle:
- Avoid going beyond last capacity in the power-supply core
- Replace 1E6L with NSEC_PER_MSEC to avoid floating point calculation
in LLVM resulting in a build failure
- Fix ADC measurements in bq25890 charger driver"
* tag 'for-v5.16-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: reset: ltc2952: Fix use of floating point literals
power: bq25890: Enable continuous conversion for ADC at charging
power: supply: core: Break capacity loop
Pull xfs fix from Darrick Wong:
- Make the old ALLOCSP ioctl behave in a consistent manner with newer
syscalls like fallocate.
* tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
Pull cgroup fixes from Tejun Heo:
"This contains the cgroup.procs permission check fixes so that they use
the credentials at the time of open rather than write, which also
fixes the cgroup namespace lifetime bug"
* 'for-5.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
selftests: cgroup: Test open-time cgroup namespace usage for migration checks
selftests: cgroup: Test open-time credential usage for migration checks
selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644
cgroup: Use open-time cgroup namespace for process migration perm checks
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
cgroup: Use open-time credentials for process migraton perm checks
Pull EDAC fix from Tony Luck:
"Fix 10nm EDAC driver to release and unmap resources on systems without
HBM"
* tag 'edac_urgent_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/i10nm: Release mdev/mbase when failing to detect HBM
This reverts commit 08efcb4a63.
This breaks the build, as it will prefer using libbpf-devel header files
even when not using LIBBPF_DYNAMIC=1.
This was detected on OpenSuSE Tumbleweed with libtraceevent-devel 1.3.0,
as described by Jiri Slaby:
=======================================================================
It breaks build with LIBTRACEEVENT_DYNAMIC and version 1.3.0:
> util/debug.c: In function ‘perf_debug_option’:
> util/debug.c:243:17: error: implicit declaration of function ‘tep_set_loglevel’ [-Werror=implicit-function-declaration]
>   243 |                 tep_set_loglevel(TEP_LOG_INFO);
>       |                 ^~~~~~~~~~~~~~~~
> util/debug.c:243:34: error: ‘TEP_LOG_INFO’ undeclared (first use in this function); did you mean ‘TEP_PRINT_INFO’?
>   243 |                 tep_set_loglevel(TEP_LOG_INFO);
>       |                                  ^~~~~~~~~~~~
>       |                                  TEP_PRINT_INFO
> util/debug.c:243:34: note: each undeclared identifier is reported only once for each function it appears in
> util/debug.c:245:34: error: ‘TEP_LOG_DEBUG’ undeclared (first use in this function)
>   245 |                 tep_set_loglevel(TEP_LOG_DEBUG);
>       |                                  ^~~~~~~~~~~~~
> util/debug.c:247:34: error: ‘TEP_LOG_ALL’ undeclared (first use in this function)
>   247 |                 tep_set_loglevel(TEP_LOG_ALL);
>       |                                  ^~~~~~~~~~~
It is because gcc's command line looks like:
gcc
...
-I/home/abuild/rpmbuild/BUILD/tools/lib/
...
-DLIBTRACEEVENT_VERSION=65790
...
=======================================================================
The proper way to fix this is more involved and so not suitable this
late in the 5.16-rc cycle.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/lkml/bc2b0786-8965-1bcd-2316-9d9bb37b9c31@kernel.org
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/YddGjjmlMZzxUZbN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull kvm fixes from Paolo Bonzini:
"Two small fixes for x86:
- lockdep WARN due to missing lock nesting annotation
- NULL pointer dereference when accessing debugfs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Check for rmaps allocation
KVM: SEV: Mark nested locking of kvm->lock
Pull drm fixes from Dave Airlie:
"There is only the amdgpu runtime pm regression fix in here:
amdgpu:
- suspend/resume fix
- fix runtime PM regression"
* tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: disable runpm if we are the primary adapter
fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb
drm/amd/pm: keep the BACO feature enabled for suspend
Pull rdma fixes from Jason Gunthorpe:
"Last pull for 5.16, the reversion has been known for a while now but
didn't get a proper fix in time. Looks like we will have several
info-leak bugs to take care of going forward.
- Revert the patch fixing the DM related crash causing a widespread
regression for kernel ULPs. A proper fix just didn't appear this
cycle due to the holidays
- Missing NULL check on alloc in uverbs
- Double free in rxe error paths
- Fix a new kernel-infoleak report when forming ah_attr's without
GRH's in ucma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Don't infoleak GRH fields
RDMA/uverbs: Check for null return of kmalloc_array
Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"
RDMA/rxe: Prevent double freeing rxe_map_set()
Pull tracing fixes from Steven Rostedt:
"Three minor tracing fixes:
- Fix missing prototypes in sample module for direct functions
- Fix check of valid buffer in get_trace_buf()
- Fix annotations of percpu pointers"
* tag 'trace-v5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Tag trace_percpu_buffer as a percpu pointer
tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
ftrace/samples: Add missing prototypes direct functions
When a task is writing to an fd opened by a different task, the perm check
should use the cgroup namespace of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When a task is writing to an fd opened by a different task, the perm check
should use the credentials of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
0644 is an odd permission for creating a cgroup, which is a directory. Use
the regular 0755 instead. This is necessary for the euid switching test case.
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes cgroup remember the cgroup namespace at the time of open
and use it for migration permission checks instead of current's. Note that
this only applies to cgroup2, as cgroup1 doesn't have namespace support.
This also fixes a use-after-free bug on cgroupns reported in
https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Note that backporting this fix also requires the preceding patch.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes: 5136f6365c ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Tejun Heo <tj@kernel.org>
of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx, which is allocated and freed by the common path.
This allows generic private data that applies to multiple files, which will
be used in the following patch.
Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.
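The layout can be sketched as a user-space model: one per-open context, allocated and freed by the common open/release paths, with the procs iterator embedded rather than allocated separately. All model_* types are hypothetical stand-ins. The ns field represents the generic data that the text says the following patch uses, and is included here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel types. */
struct model_task_iter { int pos; };
struct model_cgroup_ns { int id; };

/* Per-open context shared by all interface files. */
struct model_cgroup_file_ctx {
	struct model_cgroup_ns *ns;      /* generic data for multiple files */
	struct {
		bool started;
		struct model_task_iter iter; /* embedded: no separate allocation */
	} procs;
};

/* Stand-in for kernfs_open_file: priv points at the per-open context. */
struct model_open_file {
	void *priv;
};

/* Common open path allocates the context ... */
static int model_file_open(struct model_open_file *of, struct model_cgroup_ns *ns)
{
	struct model_cgroup_file_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -1; /* -ENOMEM in the kernel */
	ctx->ns = ns;
	of->priv = ctx;
	return 0;
}

/* ... and the common release path frees it, iterator included. */
static void model_file_release(struct model_open_file *of)
{
	free(of->priv);
	of->priv = NULL;
}
```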
v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
cgroup_file_ctx as suggested by Linus.
v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
Converted. Didn't change to embedded allocation as cgroup1 pidlists get
stored for caching.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes both the cgroup2 and cgroup1 process migration interfaces
use the credentials saved at the time of open (file->f_cred) instead of
current's.
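The sketch below models the difference between open-time and write-time credentials. A less privileged task opens the file, then a privileged task writes into it, and the check is made against the opener's snapshot. The model_* types are hypothetical. So is may_migrate(), whose "root or same euid" policy is a deliberate simplification: the kernel's real checks concern write access to the common ancestor's cgroup.procs file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced credential and file objects. */
struct model_cred { unsigned int euid; };
struct model_file { const struct model_cred *f_cred; };

/* Credentials of the task currently running (the writer). */
static const struct model_cred *current_cred_model;

static struct model_file model_open_procs_file(void)
{
	/* Like file->f_cred: snapshot the opener's credentials. */
	struct model_file f = { .f_cred = current_cred_model };
	return f;
}

/* Simplified, hypothetical policy: root, or same euid as the target. */
static bool may_migrate(const struct model_cred *cred, unsigned int target_uid)
{
	return cred->euid == 0 || cred->euid == target_uid;
}

/* Check done when a PID is written: use the opener's creds, not current's. */
static bool procs_write_allowed(const struct model_file *f, unsigned int target_uid)
{
	return may_migrate(f->f_cred, target_uid);
}
```

Root's own credentials would pass the check. The write is still refused because the file was opened by uid 1000.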
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe84067 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>