For debugging kernel panics and other bugs, there is already an option of
panic_print to dump all tasks' call stacks. On today's large servers
running many containers, there could be thousands of tasks or more, and
this will print out huge amount of call stacks, taking a lot of time (for
serial console which is main target user case of panic_print).
And in many cases, only those several tasks being blocked are key for the
panic, so add an option to only dump blocked tasks' call stacks.
[akpm@linux-foundation.org: clarify documentation a little]
Link: https://lkml.kernel.org/r/20240202132042.3609657-1-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Improve run_vmtests.sh", v3.
In this series, I'm trying to add 3 missing tests to vm_runtests.sh which
is used to run all the tests in mm suite. These tests weren't running by
CIs. While enabling them and through review feedback, I've fixed some
problems in tests as well. I've found more flakiness in more tests which
I'll be fixing with future patches.
hugetlb-read-hwpoison test is being added where it can only run with newly
added "-d" (destructive) flag only. Not sure why it is failing again. So
once it become stable, we can think of moving it to default set of tests
if it doesn't have any side-effect to them.
This patch (of 5):
Do not unmount the cgroup if it wasn't mounted by the test. The earlier
patch had fixed this for charge_reserved_hugetlb, but not for this test.
I'm adding fixes tag to that earlier patch.
Link: https://lkml.kernel.org/r/20240125154608.720072-1-usama.anjum@collabora.com
Link: https://lkml.kernel.org/r/20240125154608.720072-2-usama.anjum@collabora.com
Fixes: 209376ed2a ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Bump the minimum supported version of LLVM to 13.0.1".
This series bumps the minimum supported version of LLVM for building the
kernel to 13.0.1. The first patch does the bump and all subsequent
patches clean up all the various workarounds and checks for earlier
versions.
Quoting the first patch's commit message for those that were only on CC
for the clean ups:
When __builtin_mul_overflow() has arguments that differ in terms of
signedness and width, LLVM may generate a libcall to __muloti4 because
it performs the checks in terms of 65-bit multiplication. This issue
becomes harder to hit (but still possible) after LLVM 12.0.0, which
includes a special case for matching widths but different signs.
To gain access to this special case, which the kernel can take advantage
of when calls to __muloti4 appear, bump the minimum supported version of
LLVM for building the kernel to 13.0.1. 13.0.1 was chosen because there
is minimal impact to distribution support while allowing a few more
workarounds to be dropped in the kernel source than if 12.0.0 were
chosen. Looking at container images of up to date distribution versions:
archlinux:latest clang version 16.0.6
debian:oldoldstable-slim clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
debian:oldstable-slim Debian clang version 11.0.1-2
debian:stable-slim Debian clang version 14.0.6
debian:testing-slim Debian clang version 16.0.6 (19)
debian:unstable-slim Debian clang version 16.0.6 (19)
fedora:38 clang version 16.0.6 (Fedora 16.0.6-3.fc38)
fedora:latest clang version 17.0.6 (Fedora 17.0.6-1.fc39)
fedora:rawhide clang version 17.0.6 (Fedora 17.0.6-1.fc40)
opensuse/leap:latest clang version 15.0.7
opensuse/tumbleweed:latest clang version 17.0.6
ubuntu:focal clang version 10.0.0-4ubuntu1
ubuntu:latest Ubuntu clang version 14.0.0-1ubuntu1.1
ubuntu:rolling Ubuntu clang version 16.0.6 (15)
ubuntu:devel Ubuntu clang version 17.0.6 (3)
The only distribution that gets left behind is Debian Bullseye, as the
default version is 11.0.1; other distributions either have a newer
version than 13.0.1 or one older than the current minimum of 11.0.0.
Debian has easy access to more recent LLVM versions through
apt.llvm.org, so this is not as much of a concern. There are also the
kernel.org LLVM toolchains, which should work with distributions with
glibc 2.28 and newer.
Another benefit of slimming up the number of supported versions of LLVM
for building the kernel is reducing the build capacity needed to support
a matrix that builds with each supported version, which allows a matrix
to reallocate the freed up build capacity towards something else, such
as more configuration combinations.
This passes my build matrix with all supported versions.
This is based on Andrew's mm-nonmm-unstable to avoid trivial conflicts
with my series to update the LLVM links across the repository [1] but I
can easily rebase it to linux-kbuild if Masahiro would rather these
patches go through there (and defer the conflict resolution to the merge
window).
[1]: https://lore.kernel.org/20240109-update-llvm-links-v1-0-eb09b59db071@kernel.org/
This patch (of 11):
When __builtin_mul_overflow() has arguments that differ in terms of
signedness and width, LLVM may generate a libcall to __muloti4 because it
performs the checks in terms of 65-bit multiplication. This issue becomes
harder to hit (but still possible) after LLVM 12.0.0, which includes a
special case for matching widths but different signs.
To gain access to this special case, which the kernel can take advantage
of when calls to __muloti4 appear, bump the minimum supported version of
LLVM for building the kernel to 13.0.1. 13.0.1 was chosen because there
is minimal impact to distribution support while allowing a few more
workarounds to be dropped in the kernel source than if 12.0.0 were chosen.
Looking at container images of up to date distribution versions:
archlinux:latest clang version 16.0.6
debian:oldoldstable-slim clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
debian:oldstable-slim Debian clang version 11.0.1-2
debian:stable-slim Debian clang version 14.0.6
debian:testing-slim Debian clang version 16.0.6 (19)
debian:unstable-slim Debian clang version 16.0.6 (19)
fedora:38 clang version 16.0.6 (Fedora 16.0.6-3.fc38)
fedora:latest clang version 17.0.6 (Fedora 17.0.6-1.fc39)
fedora:rawhide clang version 17.0.6 (Fedora 17.0.6-1.fc40)
opensuse/leap:latest clang version 15.0.7
opensuse/tumbleweed:latest clang version 17.0.6
ubuntu:focal clang version 10.0.0-4ubuntu1
ubuntu:latest Ubuntu clang version 14.0.0-1ubuntu1.1
ubuntu:rolling Ubuntu clang version 16.0.6 (15)
ubuntu:devel Ubuntu clang version 17.0.6 (3)
The only distribution that gets left behind is Debian Bullseye, as the
default version is 11.0.1; other distributions either have a newer version
than 13.0.1 or one older than the current minimum of 11.0.0. Debian has
easy access to more recent LLVM versions through apt.llvm.org, so this is
not as much of a concern. There are also the kernel.org LLVM toolchains,
which should work with distributions with glibc 2.28 and newer.
Another benefit of slimming up the number of supported versions of LLVM
for building the kernel is reducing the build capacity needed to support a
matrix that builds with each supported version, which allows a matrix to
reallocate the freed up build capacity towards something else, such as
more configuration combinations.
Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-0-f5ff9bda41c5@kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1975
Link: https://github.com/llvm/llvm-project/issues/38013
Link: 3203143f13
Link: https://mirrors.edge.kernel.org/pub/tools/llvm/
Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-1-f5ff9bda41c5@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the code for reading from a checkpoint entry that is performed in
nilfs_attach_checkpoint() to the cpfile side, and make the page mapping
local and temporary. And use kmap_local instead of kmap to access the
checkpoint entry page.
In order to load the ifile inode information included in the checkpoint
entry within the inode lock section of nilfs_ifile_read(), the newly added
checkpoint reading method nilfs_cpfile_read_checkpoint() is called
indirectly via nilfs_ifile_read() instead of from
nilfs_attach_checkpoint().
Link: https://lkml.kernel.org/r/20240122140202.6950-14-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the checkpoint finalization routine to the cpfile side, and make the
page mapping local and temporary. And use kmap_local instead of kmap to
access the checkpoint entry page when finalizing a checkpoint.
In this conversion, some of the information on the checkpoint entry being
rewritten is passed through the arguments of the newly added method
nilfs_cpfile_finalize_checkpoint().
Link: https://lkml.kernel.org/r/20240122140202.6950-13-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In order to convert kmap() used in cpfile to kmap_local, first move the
checkpoint creation routine, which is one of the places where kmap is
used, to the cpfile side and make the page mapping local and temporary.
And use kmap_local instead of kmap to access the checkpoint entry page
(and header block page) when generating a checkpoint.
Link: https://lkml.kernel.org/r/20240122140202.6950-12-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is now clear that nilfs_bmap_write() is only used to finalize logs
written to disk. Concurrent bmap modification operations are not
performed on bmaps in this context. Additionally, this function does not
modify data used in read-only operations such as bmap lookups.
Therefore, there is no need to acquire bmap->b_sem in nilfs_bmap_write(),
so delete it.
Link: https://lkml.kernel.org/r/20240122140202.6950-10-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Before converting the disk inode management metadata file ifile, the call
to nilfs_bmap_write(), the i_device_code setting, and the zero-fill code
for inodes on the super root block are moved from
nilfs_write_inode_common() to its callers.
This cleanup simplifies the role and arguments of
nilfs_write_inode_common() and collects calls to nilfs_bmap_write() to the
log writing code.
Also, add and use a new helper nilfs_write_root_mdt_inode() to avoid code
duplication in the data export routine nilfs_segctor_fill_in_super_root()
to the super root block's buffer.
Link: https://lkml.kernel.org/r/20240122140202.6950-9-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Regarding the allocator code that is commonly used in the ondisk inode
metadata file ifile and the disk address translation metadata file DAT,
convert the parts that use the deprecated kmap_atomic() and kmap() to use
kmap_local.
Most can be converted directly, but only
nilfs_palloc_prepare_alloc_entry() needs to be rewritten to change mapping
sections so that multiple kmap_local/kunmap_local calls are nested and
disk I/O can be avoided within the mapping sections.
Link: https://lkml.kernel.org/r/20240122140202.6950-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "nilfs2: eliminate kmap and kmap_atomic calls".
This series converts remaining kmap and kmap_atomic calls to use
kmap_local, mainly in metadata files, and eliminates calls to these
deprecated kmap functions from nilfs2.
This series does not include converting metadata files to use folios, but
it is a step in that direction.
Most conversions are straightforward, but some are not: the checkpoint
file, the inode file, and the persistent object allocator. These have
been adjusted or rewritten to avoid multiple kmap_local calls or nest them
if necessary, and to eliminate long waits like block I/O within the
highmem mapping sections.
This series has been tested in both 32-bit and 64-bit environments with
varying block sizes.
This patch (of 15):
In the recovery function when mounting, nilfs_recovery_copy_block() uses
the deprecated kmap_atomic(), so convert it to use kmap_local.
Link: https://lkml.kernel.org/r/20240122140202.6950-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240122140202.6950-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of popping only the maximum element from the heap during each
iteration, we now pop the two largest elements at once. Although this
introduces an additional comparison to determine the second largest
element, it enables a reduction in the height of the tree by one during
the heapify operations starting from root's left/right child. This
reduction in tree height by one leads to a decrease of one comparison and
one swap.
This optimization results in saving approximately 0.5 * n swaps without
increasing the number of comparisons. Additionally, the heap size during
heapify is now one less than the original size, offering a chance for
further reduction in comparisons and swaps.
The following experimental data is based on the array generated using
get_random_u32().
| N | swaps (old) | swaps (new) | comparisons (old) | comparisons (new) |
|-------|-------------|-------------|-------------------|-------------------|
| 1000 | 9054 | 8569 | 10328 | 10320 |
| 2000 | 20137 | 19182 | 22634 | 22587 |
| 3000 | 32062 | 30623 | 35833 | 35752 |
| 4000 | 44274 | 42282 | 49332 | 49306 |
| 5000 | 57195 | 54676 | 63300 | 63294 |
| 6000 | 70205 | 67202 | 77599 | 77557 |
| 7000 | 83276 | 79831 | 92113 | 92032 |
| 8000 | 96630 | 92678 | 106635 | 106617 |
| 9000 | 110349 | 105883 | 121505 | 121404 |
| 10000 | 124165 | 119202 | 136628 | 136617 |
Link: https://lkml.kernel.org/r/20240113031352.2395118-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: George Spelvin <lkml@sdf.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "lib/sort: Optimize the number of swaps and comparisons".
This patch series aims to optimize the heapsort algorithm, specifically
targeting a reduction in the number of swaps and comparisons required.
This patch (of 2):
Currently, when searching for the sift-down path and encountering equal
elements, the algorithm chooses the left child. However, considering that
the height of the right subtree may be one less than that of the left
subtree, selecting the right child in such cases can potentially reduce
the number of comparisons and swaps.
For instance, when sorting an array of 10,000 identical elements, the
current implementation requires 247,209 comparisons. With this patch, the
number of comparisons can be reduced to 227,241.
Link: https://lkml.kernel.org/r/20240113031352.2395118-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20240113031352.2395118-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Allow to change ipc/mq sysctls inside ipc namespace", v3.
Right now ipc and mq limits count as per ipc namespace, but only real root
can change them. By default, the current values of these limits are such
that it can only be reduced. Since only root can change the values, it is
impossible to reduce these limits in the rootless container.
We can allow limit changes within ipc namespace because mq parameters are
limited by RLIMIT_MSGQUEUE and ipc parameters are not limited to anything
other than cgroups.
This patch (of 3):
Rootless containers are not allowed to modify kernel IPC parameters.
All default limits are set to such high values that in fact there are no
limits at all. All limits are not inherited and are initialized to
default values when a new ipc_namespace is created.
For new ipc_namespace:
size_t ipc_ns.shm_ctlmax = SHMMAX; // (ULONG_MAX - (1UL << 24))
size_t ipc_ns.shm_ctlall = SHMALL; // (ULONG_MAX - (1UL << 24))
int ipc_ns.shm_ctlmni = IPCMNI; // (1 << 15)
int ipc_ns.shm_rmid_forced = 0;
unsigned int ipc_ns.msg_ctlmax = MSGMAX; // 8192
unsigned int ipc_ns.msg_ctlmni = MSGMNI; // 32000
unsigned int ipc_ns.msg_ctlmnb = MSGMNB; // 16384
The shm_tot (total amount of shared pages) has also ceased to be global,
it is located in ipc_namespace and is not inherited from anywhere.
In such conditions, it cannot be said that these limits limit anything.
The real limiter for them is cgroups.
If we allow rootless containers to change these parameters, then it can
only be reduced.
Link: https://lkml.kernel.org/r/cover.1705333426.git.legion@kernel.org
Link: https://lkml.kernel.org/r/d2f4603305cbfed58a24755aa61d027314b73a45.1705333426.git.legion@kernel.org
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Link: https://lkml.kernel.org/r/e2d84d3ec0172cfff759e6065da84ce0cc2736f8.1663756794.git.legion@kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Cc: Joel Granados <joel.granados@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Optimize the min_heapify() function, resulting in a significant reduction
of approximately 50% in the number of comparisons for large random inputs,
while maintaining identical results.
The current implementation performs two comparisons per level to identify
the minimum among three elements. In contrast, the proposed bottom-up
variation uses only one comparison per level to assess two children until
reaching the leaves. Then, it sifts up until the correct position is
determined.
Typically, the process of sifting down proceeds to the leaf level,
resulting in O(1) secondary comparisons instead of log2(n). This
optimization significantly reduces the number of costly indirect function
calls and improves overall performance.
Link: https://lkml.kernel.org/r/20240110081213.2289636-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>