Commit Graph

1412034 Commits

Author SHA1 Message Date
Qiliang Yuan
fc94368bce fs/file: optimize close_range() complexity from O(N) to O(Sparse)
In close_range(), the kernel traditionally performs a linear scan over the
[fd, max_fd] range, resulting in O(N) complexity where N is the range size.
For processes with sparse FD tables, this is inefficient as it checks many
unallocated slots.

This patch optimizes __range_close() by using find_next_bit() on the
open_fds bitmap to skip holes. This shifts the algorithmic complexity from
O(Range Size) to O(Active FDs), providing a significant performance boost
for large-range close operations on sparse file descriptor tables.
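
As a rough userspace illustration of the hole-skipping idea described above (not the kernel implementation), the sketch below walks a bitmap by jumping from one set bit to the next. The names next_set_bit() and sparse_range_close() are hypothetical stand-ins for find_next_bit() and __range_close(), and __builtin_ctzl() assumes GCC or Clang.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for find_next_bit(): index of the first set bit at
 * or after 'start', or 'nbits' if there is none. */
static size_t next_set_bit(const unsigned long *map, size_t nbits, size_t start)
{
    assert(map != NULL);
    while (start < nbits) {
        unsigned long word = map[start / BITS_PER_WORD] >> (start % BITS_PER_WORD);

        if (word != 0) {
            size_t bit = start + (size_t)__builtin_ctzl(word);
            return bit < nbits ? bit : nbits;
        }
        /* Nothing left in this word: jump straight to the next word. */
        start = (start / BITS_PER_WORD + 1) * BITS_PER_WORD;
    }
    return nbits;
}

/* Clears every set bit in [fd, max_fd] and returns how many were cleared.
 * The loop body runs once per set bit instead of once per slot. */
static size_t sparse_range_close(unsigned long *map, size_t nbits,
                                 size_t fd, size_t max_fd)
{
    size_t closed = 0;
    size_t limit;

    if (nbits == 0)
        return 0;
    limit = (max_fd < nbits - 1) ? max_fd + 1 : nbits;

    for (fd = next_set_bit(map, limit, fd); fd < limit;
         fd = next_set_bit(map, limit, fd + 1)) {
        map[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
        closed++;
    }
    return closed;
}
```

In this sketch the cost is proportional to the number of set bits plus the number of empty words skipped, which is what makes a large range over a sparse table cheap.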

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Link: https://patch.msgid.link/20260123081221.659125-1-realwujing@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-23 11:31:49 +01:00
Miklos Szeredi
6cbfdf8947 posix_acl: make posix_acl_to_xattr() alloc the buffer
Without exception all callers do that. So move the allocation into the
helper.

This reduces boilerplate and removes unnecessary error checking.
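
The general pattern of moving the allocation into the helper can be sketched in userspace as follows. This is only an illustration: struct fake_acl, fake_acl_to_blob() and the serialized layout are hypothetical and do not reflect the real posix_acl_to_xattr() signature or the xattr format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical in-memory ACL, invented for this sketch. */
struct fake_acl {
    unsigned int count;
    unsigned short perms[MAX_ENTRIES];
};

/* The helper sizes and allocates the buffer itself, so callers no longer
 * compute a length, allocate, and check a separate "buffer too small"
 * error. Returns a malloc()ed buffer (caller frees) or NULL on failure. */
static void *fake_acl_to_blob(const struct fake_acl *acl, size_t *size)
{
    size_t len;
    unsigned char *buf;

    assert(acl != NULL && acl->count <= MAX_ENTRIES);
    len = sizeof(acl->count) + acl->count * sizeof(acl->perms[0]);
    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    memcpy(buf, &acl->count, sizeof(acl->count));
    memcpy(buf + sizeof(acl->count), acl->perms,
           acl->count * sizeof(acl->perms[0]));
    *size = len;
    return buf;
}
```

With this shape a caller has a single failure mode to handle (a NULL return) rather than a size query, an allocation, and a conversion error.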

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 10:51:12 +01:00
Mateusz Guzik
88ec797c46 fs: make insert_inode_locked() wait for inode destruction
This is the only routine which skipped instead of waiting.

The current behavior is arguably a bug as it results in a corner case
where the inode hash can have *two* matching inodes, one of which is on
its way out.

Ironing out this difference is an incremental step towards sanitizing
the API.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260114094717.236202-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 17:05:35 +01:00
David Disseldorp
aaf7683961 initramfs_test: kunit test for cpio.filesize > PATH_MAX
initramfs unpack skips over cpio entries where namesize > PATH_MAX,
instead of returning an error. Add coverage for this behaviour.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://patch.msgid.link/20260114135051.4943-2-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 17:05:35 +01:00
Yuto Ohnuki
7c02250033 fs: improve dump_inode() to safely access inode fields
Use get_kernel_nofault() to safely access inode and related structures
(superblock, file_system_type) to avoid crashing when the inode pointer
is invalid. This allows the same pattern as dump_mapping().

Note: The original access method for i_state and i_count is preserved,
as get_kernel_nofault() is unnecessary once the inode structure is
verified accessible.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Link: https://patch.msgid.link/20260112181443.81286-1-ytohnuki@amazon.com
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:38 +01:00
Christian Brauner
58ecde96e8 Merge patch series "exportfs: Some kernel-doc fixes"
André Almeida <andrealmeid@igalia.com> says:

This short series removes some duplicated documentation and addresses some
kernel-doc issues.

* patches from https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-0-acc1889de772@igalia.com:
  docs: exportfs: Use source code struct documentation
  exportfs: Complete kernel-doc for struct export_operations
  exportfs: Mark struct export_operations functions at kernel-doc
  exportfs: Fix kernel-doc output for get_name()

Link: https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-0-acc1889de772@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
Ben Dooks
589cff4975 fs: add <linux/init_task.h> for 'init_fs'
The init_fs symbol is defined in <linux/init_task.h> but was
not included in fs/fs_struct.c so fix by adding the include.

Fixes the following sparse warning:
fs/fs_struct.c:150:18: warning: symbol 'init_fs' was not declared. Should it be static?

Fixes: 3e93cd6718 ("Take fs_struct handling to new file")
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://patch.msgid.link/20260108115856.238027-1-ben.dooks@codethink.co.uk
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
André Almeida
f9a6a3fec2 docs: exportfs: Use source code struct documentation
Instead of duplicating struct export_operations documentation in both
ReST file and in the C source code, just use the kernel-doc in the docs.

While here, make the sentence preceding the paragraph less redundant.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-4-acc1889de772@igalia.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
Amir Goldstein
1219e0feae fs: move initializing f_mode before file_ref_init()
The comment above file_ref_init() says:
"We're SLAB_TYPESAFE_BY_RCU so initialize f_ref last."
but file_set_fsnotify_mode() was added after file_ref_init().

Move it right after setting f_mode, where it makes more sense.

Fixes: 711f9b8fbe ("fsnotify: disable pre-content and permission events by default")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260109211536.3565697-1-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
André Almeida
7a6f811e2c exportfs: Complete kernel-doc for struct export_operations
Write down the missing members definitions for struct export_operations,
using as a reference the commit messages that created the members.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-3-acc1889de772@igalia.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
André Almeida
fc76b5968a exportfs: Mark struct export_operations functions at kernel-doc
Adding a `@` before the function names makes them recognizable as
kernel-docs, so they get correctly rendered in the documentation.

Even if they are already marked with `@` in the short one-line summary,
the kernel-docs will correctly favor the more detailed definition here.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-2-acc1889de772@igalia.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
André Almeida
5e7fa6bfa9 exportfs: Fix kernel-doc output for get_name()
Without a space between %NAME_MAX and the plus sign, kernel-doc will
output ``NAME_MAX``+1, which escapes the last backtick and makes Sphinx
format a much larger string as monospaced text.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260112-tonyk-fs_uuid-v1-1-acc1889de772@igalia.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:37 +01:00
Jeff Layton
46329a9dd7 acct(2): begin the deprecation of legacy BSD process accounting
As Christian points out [1], even though it's privileged, this interface
has a lot of footguns. There are better options these days (e.g. eBPF),
so it would be good to start discouraging its use and mark it as
deprecated.

[1]: https://lore.kernel.org/linux-fsdevel/20250212-giert-spannend-8893f1eaba7d@brauner/

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260106-bsd-acct-v1-1-d15564b52c83@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:36 +01:00
Breno Leitao
6784f27472 device_cgroup: remove branch hint after code refactor
commit 4ef4ac3601 ("device_cgroup: avoid access to ->i_rdev in the
common case in devcgroup_inode_permission()") reordered the checks in
devcgroup_inode_permission() to check the inode mode before checking
i_rdev, for better cache behavior.

However, the likely() annotation on the i_rdev check was not updated
to reflect the new code flow. Originally, when i_rdev was checked
first, likely(!inode->i_rdev) made sense because most inodes were(?)
regular files/directories, thus i_rdev == 0.

After the reorder, by the time we reach the i_rdev check, we have
already confirmed the inode IS a block or character device. Block and
character special files are precisely defined by having a device number
(i_rdev), so !inode->i_rdev is now the rare edge case, not the common
case.

Branch profiling confirmed this is 100% mispredicted:

  correct incorrect  %    Function                      File              Line
  ------- ---------  -    --------                      ----              ----
        0   2631904 100   devcgroup_inode_permission    device_cgroup.h   24

Remove likely() to avoid giving the wrong hint to the CPU.
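
The reordering described above can be sketched in userspace like this. It is a simplified model, not the code in device_cgroup.h: struct fake_inode, enum node_kind and needs_device_check() are hypothetical, and the likely() macro is defined locally on top of __builtin_expect() (GCC or Clang assumed).

```c
#include <stdbool.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical, simplified inode model for this sketch. */
enum node_kind { NODE_REGULAR, NODE_DIR, NODE_BLK, NODE_CHR };

struct fake_inode {
    enum node_kind kind;
    unsigned int rdev;   /* device number; 0 means "none" */
};

/* Returns true when a device permission check would be needed. */
static bool needs_device_check(const struct fake_inode *inode)
{
    /* Most inodes are not device nodes, so this hint matches reality. */
    if (likely(inode->kind != NODE_BLK && inode->kind != NODE_CHR))
        return false;

    /* Only device nodes reach this point, so rdev == 0 is the rare case.
     * A likely() here would mark the wrong side as the expected one,
     * which is why the test carries no hint. */
    if (!inode->rdev)
        return false;

    return true;
}
```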

Fixes: 4ef4ac3601 ("device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260107-likely_device-v1-1-0c55f83a7e47@debian.org
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:36 +01:00
Christian Brauner
edecd1ae59 Merge patch series "vfs kernel-doc fixes for 6.19"
Bagas Sanjaya <bagasdotme@gmail.com> says:

Here are kernel-doc fixes for vfs subsystem targeting 6.19. This small
series is split from much larger kernel-doc fixes series I posted a while
ago [1].

* patches from https://patch.msgid.link/20251219024620.22880-1-bagasdotme@gmail.com:
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()

Link: https://patch.msgid.link/20251219024620.22880-1-bagasdotme@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06 23:17:57 +01:00
Bagas Sanjaya
ba4c74f80e VFS: fix __start_dirop() kernel-doc warnings
Sphinx reports kernel-doc warnings:

WARNING: ./fs/namei.c:2853 function parameter 'state' not described in '__start_dirop'
WARNING: ./fs/namei.c:2853 expecting prototype for start_dirop(). Prototype was for __start_dirop() instead

Fix them up.

Fixes: ff7c4ea11a ("VFS: add start_creating_killable() and start_removing_killable()")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20251219024620.22880-3-bagasdotme@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06 23:17:52 +01:00
Bagas Sanjaya
b0f5804b41 fs: Describe @isnew parameter in ilookup5_nowait()
Sphinx reports kernel-doc warning:

WARNING: ./fs/inode.c:1607 function parameter 'isnew' not described in 'ilookup5_nowait'

Describe the parameter.

Fixes: a27628f436 ("fs: rework I_NEW handling to operate without fences")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20251219024620.22880-2-bagasdotme@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06 23:17:52 +01:00
Breno Leitao
a6b9f5b2f0 fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu()
is redundant because the only caller (handle_mounts) already verifies
d_managed(dentry) before calling this function, so the dentry in
__follow_mount_rcu() always has DCACHE_MANAGED_DENTRY set.

This early-out optimization never fires in practice, but it is marked
as likely().

This was detected with branch profiling, which shows 100% misprediction
in this likely.

Remove the whole if clause instead of removing the likely, given we
know for sure that the dentry always has DCACHE_MANAGED_DENTRY set.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260105-dcache-v1-1-f0d904b4a7c2@debian.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06 22:48:15 +01:00
Mateusz Guzik
729d015ab2 fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
Calls to the 2 modified routines are explicitly gated with checks for
the flag, so there is no use for this in production kernels.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251229125751.826050-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06 22:47:07 +01:00
Thomas Weißschuh
0f166bf1d6 select: store end_time as timespec64 in restart block
Storing the end time seconds as 'unsigned long' can lead to truncation
on 32-bit architectures if assigned from the 64-bit timespec64::tv_sec.
As the select() core uses timespec64 consistently, also use that in the
restart block.

This also allows the simplification of the accessors.
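
The truncation in question can be reproduced in userspace with fixed-width types. In this sketch uint32_t stands in for 'unsigned long' on a 32-bit architecture, and roundtrip_through_u32() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Stores a 64-bit seconds value in a 32-bit slot and reads it back, as
 * happens when timespec64::tv_sec is assigned to a 32-bit 'unsigned long'. */
static int64_t roundtrip_through_u32(int64_t tv_sec)
{
    uint32_t stored = (uint32_t)tv_sec;   /* the upper 32 bits are dropped */

    return (int64_t)stored;
}
```

Values below 2^32 survive the round trip, while anything larger comes back reduced modulo 2^32, so an end time far enough in the future would be misread after a restart.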

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251223-restart-block-expiration-v2-1-8e33e5df7359@linutronix.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 14:01:57 +01:00
chen zhang
3685744afa chardev: Switch to guard(mutex) and __free(kfree)
Instead of using the 'goto label; mutex_unlock()' pattern use
'guard(mutex)' which will release the mutex when it goes out of scope.
Use the __free(kfree) cleanup to replace instances of manually
calling kfree(). Also make some code path simplifications that this
allows.
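
For readers unfamiliar with scope-based cleanup, a userspace approximation of the same idea is sketched below. It relies on the GCC/Clang cleanup attribute, which is also what the kernel's helpers build on. GUARD_MUTEX, AUTO_FREE, register_name() and unlock_count are hypothetical names for this sketch, not the kernel's guard() or __free() macros.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int unlock_count;   /* instrumentation for the sketch only */

static void unlock_cleanup(pthread_mutex_t **m)
{
    pthread_mutex_unlock(*m);
    unlock_count++;
}

static void free_cleanup(char **p)
{
    free(*p);   /* free(NULL) is a no-op */
}

/* Lock now, unlock automatically when the enclosing scope is left. */
#define GUARD_MUTEX(m) \
    pthread_mutex_t *guard_ __attribute__((cleanup(unlock_cleanup))) = \
        (pthread_mutex_lock(m), (m))
#define AUTO_FREE __attribute__((cleanup(free_cleanup)))

static int register_name(const char *name, size_t max_len)
{
    char *copy AUTO_FREE = NULL;
    size_t len = strlen(name);

    GUARD_MUTEX(&table_lock);

    if (len > max_len)
        return -1;            /* no goto: the lock is dropped on return */

    copy = malloc(len + 1);
    if (copy == NULL)
        return -2;
    memcpy(copy, name, len + 1);

    /* ... a real caller would publish 'copy' under the lock here ... */
    return 0;                 /* 'copy' is freed, then the scope ends */
}
```

Every return path releases the mutex and frees the buffer without an explicit label, which is the simplification the commit message refers to.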

Signed-off-by: chen zhang <chenzhang@kylinos.cn>
Link: https://patch.msgid.link/20251215111500.159243-1-chenzhang@kylinos.cn
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:55:51 +01:00
Thorsten Blum
3f320e5c2e namespace: Replace simple_strtoul with kstrtoul to parse boot params
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'mhash_entries=' and 'mphash_entries=' boot parameters.

Check the return value of kstrtoul() and reject invalid values. This
adds error handling while preserving behavior for existing values, and
removes use of the deprecated simple_strtoul() helper.
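
The behavioral difference can be sketched in userspace with strtoul(). The helper strict_strtoul() below is a hypothetical analogue that approximates kstrtoul()-style strictness (whole string must parse, one trailing newline tolerated, overflow reported); it is not the kernel function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the value on success, a negative errno otherwise.
 * '*res' is left untouched on failure. */
static int strict_strtoul(const char *s, int base, unsigned long *res)
{
    char *end;
    unsigned long val;

    if (s == NULL || *s == '-' || isspace((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, base);
    if (end == s)
        return -EINVAL;       /* no digits at all */
    if (errno == ERANGE)
        return -ERANGE;       /* value does not fit */
    if (*end == '\n')
        end++;                /* tolerate a single trailing newline */
    if (*end != '\0')
        return -EINVAL;       /* trailing garbage */

    *res = val;
    return 0;
}
```

A lenient parse of "4096abc" silently yields 4096, whereas the strict variant rejects the string, so a mistyped boot parameter is reported instead of being half-applied.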

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251214153141.218953-2-thorsten.blum@linux.dev
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:51:12 +01:00
Thorsten Blum
b29a0a37f4 dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'dhash_entries=' boot parameter.

Check the return value of kstrtoul() and reject invalid values. This
adds error handling while preserving behavior for existing values, and
removes use of the deprecated simple_strtoul() helper.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251216145236.44520-2-thorsten.blum@linux.dev
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:49:36 +01:00
Thorsten Blum
63ad216fbf fs: Replace simple_strtoul with kstrtoul in set_ihash_entries
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'ihash_entries=' boot parameter.

Check the return value of kstrtoul() and reject invalid values. This
adds error handling while preserving behavior for existing valid values,
and removes use of the deprecated simple_strtoul() helper.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251218112144.225301-2-thorsten.blum@linux.dev
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:36:39 +01:00
Deepakkumar Karn
b68f91ef3b fs/buffer: add alert in try_to_free_buffers() for folios without buffers
try_to_free_buffers() can be called on folios with no buffers attached
when filemap_release_folio() is invoked on a folio belonging to a mapping
with AS_RELEASE_ALWAYS set but no release_folio operation defined.

In such cases, folio_needs_release() returns true because of the
AS_RELEASE_ALWAYS flag, but the folio has no private buffer data. This
causes try_to_free_buffers() to call drop_buffers() on a folio with no
buffers, leading to a null pointer dereference.

Add a check in try_to_free_buffers() to return early if the folio has no
buffers attached, with WARN_ON_ONCE() to alert about the misconfiguration.
This provides defensive hardening.

Signed-off-by: Deepakkumar Karn <dkarn@redhat.com>
Link: https://patch.msgid.link/20251211131211.308021-1-dkarn@redhat.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 15:07:22 +01:00
Christian Brauner
961b2ad1b4 Merge patch series "further damage-control lack of clone scalability"
Mateusz Guzik <mjguzik@gmail.com> says:

When spawning and killing threads in separate processes in parallel the
primary bottleneck on the stock kernel is pidmap_lock, largely because
of a back-to-back acquire in the common case.

Benchmark code at the end.

With this patchset alloc_pid() only takes the lock once and consequently
alleviates the problem. While scalability improves, the lock remains the
primary bottleneck by a large margin.

I believe idr is a poor choice for the task at hand to begin with, but
sorting that out is beyond the scope of this patchset. At the same time
any replacement would be best evaluated against a state where the
above relock problem is fixed.

Performance improvement varies between reboots. When benchmarking with
20 processes creating and killing threads in a loop, the unpatched
baseline hovers around 465k ops/s, while patched is anything between
~510k ops/s and ~560k depending on false-sharing (which I only minimally
sanitized). So this is at least 10% if you are unlucky.

bench from will-it-scale:

#include <assert.h>
#include <pthread.h>

char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
        return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
        pthread_t thread[1];
        int error;

        while (1) {
                for (int i = 0; i < 1; i++) {
                        error = pthread_create(&thread[i], NULL, worker, NULL);
                        assert(error == 0);
                }
                for (int i = 0; i < 1; i++) {
                        error = pthread_join(thread[i], NULL);
                        assert(error == 0);
                }
                (*iterations)++;
        }
}

v2:
- cosmetic fixes from Oleg
- drop idr_preload_many, relock pidmap + call idr_preload again instead
- write a commit message for the alloc pid patch

* patches from https://patch.msgid.link/20251203092851.287617-1-mjguzik@gmail.com:
  pid: only take pidmap_lock once on alloc
  ns: pad refcount

Link: https://patch.msgid.link/20251203092851.287617-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
Mateusz Guzik
887e97745e fs: track the inode having file locks with a flag in ->i_opflags
Opening and closing an inode dirties the ->i_readcount field.

Depending on the alignment of the inode, it may happen to false-share
with other fields loaded for both operations to a varying extent.

This notably concerns the ->i_flctx field.

Since most inodes don't have the field populated, this bit can be managed
with a flag in ->i_opflags instead which bypasses the problem.

Here are results I obtained while opening a file read-only in a loop
with 24 cores doing the work on Sapphire Rapids. Utilizing the flag as
opposed to reading ->i_flctx field was toggled at runtime as the benchmark
was running, to make sure both results come from the same alignment.

before: 3233740
after:  3373346 (+4%)

before: 3284313
after:  3518711 (+7%)

before: 3505545
after:  4092806 (+16%)

Or to put it differently, this varies wildly depending on how (un)lucky
you get.

The primary bottleneck before and after is the avoidable lockref trip in
do_dentry_open().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203094837.290654-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
Mateusz Guzik
6d864a1b18 pid: only take pidmap_lock once on alloc
When spawning and killing threads in separate processes in parallel the
primary bottleneck on the stock kernel is pidmap_lock, largely because
of a back-to-back acquire in the common case. This aspect is fixed with
the patch.

Performance improvement varies between reboots. When benchmarking with
20 processes creating and killing threads in a loop, the unpatched
baseline hovers around 465k ops/s, while patched is anything between
~510k ops/s and ~560k depending on false-sharing (which I only minimally
sanitized). So this is at least 10% if you are unlucky.

The change also facilitated some cosmetic fixes.

It has an unintentional side effect of no longer issuing spurious
idr_preload() around idr_replace().

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203092851.287617-3-mjguzik@gmail.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
Mateusz Guzik
1fa4e69a54 filelock: use a consume fence in locks_inode_context()
Matches the idiom of storing a pointer with a release fence and safely
getting the content with a consume fence after.

Eliminates an actual fence on some archs.
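
The publish/consume idiom mentioned above can be sketched with C11 atomics. This is a userspace analogue only: the kernel uses its own primitives rather than <stdatomic.h>, struct fake_inode and struct lock_ctx are hypothetical, and current compilers typically implement memory_order_consume as acquire.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inode and its lock context. */
struct lock_ctx {
    int nr_locks;
};

struct fake_inode {
    _Atomic(struct lock_ctx *) flctx;
};

/* Writer: initialize the object fully, then publish the pointer with
 * release ordering so the initialization is visible before the pointer. */
static void publish_ctx(struct fake_inode *inode, struct lock_ctx *ctx)
{
    ctx->nr_locks = 0;
    atomic_store_explicit(&inode->flctx, ctx, memory_order_release);
}

/* Reader: a consume load orders only the accesses that depend on the
 * loaded pointer, which needs no fence instruction on most architectures. */
static struct lock_ctx *get_ctx(struct fake_inode *inode)
{
    return atomic_load_explicit(&inode->flctx, memory_order_consume);
}
```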

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203094837.290654-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
Mateusz Guzik
c0aac5975b ns: pad refcount
Note no effort is made to make sure structs embedding the namespace are
themselves aligned, so this is not guaranteed to eliminate cacheline
bouncing due to refcount management.
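
A userspace sketch of padding a hot counter onto its own cacheline, using C11 alignas. The 64-byte line size and struct ns_like are assumptions for illustration; the kernel uses its own annotations rather than <stdalign.h>.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64   /* assumed line size; varies by CPU */

/* Hypothetical namespace-like struct. */
struct ns_like {
    unsigned int id;                          /* read-mostly field */
    alignas(CACHELINE) long refcount;         /* starts a new line */
    alignas(CACHELINE) unsigned long flags;   /* pushed to the next line */
};
```

Here the compiler aligns the whole struct for static and automatic objects, but a heap allocator or an unaligned embedding struct may not honor that, which is the caveat the commit message raises.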

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203092851.287617-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
Mateusz Guzik
5854fc6391 fs: annotate cdev_lock with __cacheline_aligned_in_smp
No need for the crapper to be susceptible to false-sharing.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203095508.291073-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:38 +01:00
David Laight
0f5bb0cfb0 fs: use min() or umin() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.
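
The hazard can be shown in userspace with fixed-width types, where uint64_t plays the role of 'unsigned long' on a 64-bit kernel. The helpers min_cast_u32() and min_promote() are hypothetical and only model what the min_t() cast and the min() promotion do to the operands.

```c
#include <stdint.h>

/* Models min_t(unsigned int, a, b): both operands are cast to the
 * narrow type first, so the high bits of 'a' are silently dropped. */
static uint32_t min_cast_u32(uint64_t a, uint32_t b)
{
    uint32_t narrowed = (uint32_t)a;

    return narrowed < b ? narrowed : b;
}

/* Models min(a, b) on unsigned operands: the narrower operand is
 * promoted, so no significant bits are lost. */
static uint64_t min_promote(uint64_t a, uint32_t b)
{
    return a < b ? a : (uint64_t)b;
}
```

With a = 0x100000001 and b = 4096, the casting variant returns 1 while the promoting variant returns 4096.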

A couple of places need umin() because of loops like:
	nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);

	for (i = 0; i < nfolios; i++) {
		struct folio *folio = page_folio(pages[i]);
		...
		unsigned int len = umin(ret, PAGE_SIZE - start);
		...
		ret -= len;
		...
	}
where the compiler doesn't track things well enough to know that
'ret' is never negative.

The alternate loop:
        for (i = 0; ret > 0; i++) {
                struct folio *folio = page_folio(pages[i]);
                ...
                unsigned int len = min(ret, PAGE_SIZE - start);
                ...
                ret -= len;
                ...
        }
would be equivalent and doesn't need 'nfolios'.

Most of the 'unsigned long' actually come from PAGE_SIZE.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20251119224140.8616-31-david.laight.linux@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:33:37 +01:00
Linus Torvalds
8f0b4cce44 Linux 6.19-rc1 v6.19-rc1
2025-12-14 16:05:07 +12:00
Linus Torvalds
6a1636e066 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the
  biggest impacts in libsas being the rollback on error handling and in
  ufs coming from a couple of error handling fixes, one causing a crash
  if it's activated before scanning and the other fixing W-LUN
  resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-14 15:35:35 +12:00
Linus Torvalds
0dfb36b2dc Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
 "We have a patch that adds an initial set of tracepoints to the MDS
  client from Max, a fix that hardens osdmap parsing code from myself
  (marked for stable) and a few assorted fixups"

* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
  rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  libceph: make decode_pool() more resilient against corrupted osdmaps
  libceph: Amend checking to fix `make W=1` build breakage
  ceph: Amend checking to fix `make W=1` build breakage
  ceph: add trace points to the MDS client
  libceph: fix log output race condition in OSD client
2025-12-14 15:24:10 +12:00
Linus Torvalds
4cfc21494a Merge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo
Pull tomoyo update from Tetsuo Handa:
 "Trivial optimization"

* tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: Use local kmap in tomoyo_dump_page()
2025-12-14 15:21:02 +12:00
Linus Torvalds
4a298a43f5 Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:

 - Fix CPU hotplug callbacks to disable interrupts on UP kernels

* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Make atomic hotplug callbacks run with interrupts disabled on UP
2025-12-14 06:12:46 +12:00
Linus Torvalds
cba09e3ed0 Merge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:

 - Fix NULL pointer dereference crash in the Intel PMU driver

 - Fix missing read event generation on task exit

 - Fix AMD uncore driver init error handling

 - Fix whitespace noise

* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
  perf/core: Fix missing read event generation on task exit
  perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
  perf/uprobes: Remove <space><Tab> whitespace noise
2025-12-14 06:10:35 +12:00
Linus Torvalds
db0130185e Merge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:

 - Fix error code in the irqchip/mchp-eic driver

 - Fix setup_percpu_irq() affinity assumptions

 - Remove the unused irq_domain_add_tree() function

* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
  irqdomain: Delete irq_domain_add_tree()
  genirq: Allow NULL affinity for setup_percpu_irq()
2025-12-14 06:07:09 +12:00
Linus Torvalds
edbe407235 Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:

 - Improve bug reporting

 - Suppress W=1 format warning

 - Improve rseq scalability on Clang builds

* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Always inline rseq_debug_syscall_return()
  bug: Hush suggest-attribute=format for __warn_printf()
  bug: Let report_bug_entry() provide the correct bugaddr
2025-12-14 06:04:16 +12:00
Linus Torvalds
9d9c1cfec0 Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
 "There are no significant series in this small merge. Please see the
  individual changelogs for details"

[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]

* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: memfd_luo: add CONFIG_SHMEM dependency
  mm: shmem: avoid build warning for CONFIG_SHMEM=n
  ocfs2: fix memory leak in ocfs2_merge_rec_left()
  ocfs2: invalidate inode if i_mode is zero after block read
  ocfs2: avoid -Wflex-array-member-not-at-end warning
  ocfs2: convert remaining read-only checks to ocfs2_emergency_state
  ocfs2: add ocfs2_emergency_state helper and apply to setattr
  checkpatch: add uninitialized pointer with __free attribute check
  args: fix documentation to reflect the correct numbers
  ocfs2: fix kernel BUG in ocfs2_find_victim_chain
  liveupdate: luo_core: fix redundant bound check in luo_ioctl()
  ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
  fs/fat: remove unnecessary wrapper fat_max_cache()
  ocfs2: replace deprecated strcpy with strscpy
  ocfs2: check tl_used after reading it from trancate log inode
  liveupdate: luo_file: don't use invalid list iterator
2025-12-13 20:55:12 +12:00
Linus Torvalds
2516a87153 Merge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand)
   fixes a couple of minor things in ppc land

 - "Improve folio split related functions" (Zi Yan)
   some cleanups and minorish fixes in the folio splitting code

* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
  mm: vmscan: correct nr_requested tracing in scan_folios
  MAINTAINERS: add idr core-api doc file to XARRAY
  mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
  mm: fix CONFIG_STACK_GROWSUP typo in mm.h
  mm/huge_memory: fix folio split stats counting
  mm/huge_memory: make min_order_for_split() always return an order
  mm/huge_memory: replace can_split_folio() with direct refcount calculation
  mm/huge_memory: change folio_split_supported() to folio_check_splittable()
  mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
  powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
  powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
2025-12-13 20:35:41 +12:00
Christian Brauner
d2ea4d254d file: ensure cleanup
Brown paper bag time. This is a silly oversight where I forgot to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.

Reported-by: Chris Mason <clm@fb.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 011703a9ac ("file: add FD_{ADD,PREPARE}()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-13 20:04:32 +12:00
Linus Torvalds
d552fc632c x86/hv: Add gitignore entry for generated header file
Commit 7bfe3b8ea6 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code.  But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.

Let's add the gitignore file before somebody thinks that generated
header should be committed.

Fixes: 7bfe3b8ea6 ("Drivers: hv: Introduce mshv_vtl driver")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-13 19:57:41 +12:00
Linus Torvalds
a859eca0e4 Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Pull more drm fixes from Dave Airlie:
 "These are the enqueued fixes that ended up in our fixes branch,
  nouveau mostly, along with some small fixes in other places.

  plane:
   - Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()

  ttm:
   - fix devcoredump for evicted bos

  panel:
   - Fix stack usage warning in novatek-nt35560

  nouveau:
   - alloc fwsec sb at boot to avoid s/r problems
   - fix strcpy usage
   - fix i2c encoder crash

  bridge:
   - Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83

  mgag200:
   - Fix bigendian handling in mgag200

  tilcdc:
   - Fix probe failure in tilcdc"

* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  drm/mgag200: Fix big-endian support
  drm/tilcdc: Fix removal actions in case of failed probe
  drm/ttm: Avoid NULL pointer deref for evicted BOs
  drm: nouveau: Replace sprintf() with sysfs_emit()
  drm/nouveau: fix circular dep oops from vendored i2c encoder
  drm/nouveau: refactor deprecated strcpy
  drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
  drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
  drm/nouveau/gsp: Allocate fwsec-sb at boot
  drm/panel: novatek-nt35560: avoid on-stack device structure
2025-12-13 17:39:28 +12:00
Linus Torvalds
237f1bbfe3 Merge tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the weekly fixes for what is in next tree, mostly amdgpu and
  some i915, panthor and a core revert.

  core:
   - revert dumb bo 8 byte alignment

  amdgpu:
   - SI fix
   - DC reduce stack usage
   - HDMI fixes
   - VCN 4.0.5 fix
   - DP MST fix
   - DC memory allocation fix

  amdkfd:
   - SVM fix
   - Trap handler fix
   - VGPR fixes for GC 11.5

  i915:
   - Fix format string truncation warning
   - Fix runtime PM reference during fbdev BO creation

  panthor:
   - fix UAF

  renesas:
   - fix sync flag handling"

* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  Revert "drm/amd/display: Fix pbn to kbps Conversion"
  drm/amd: Fix unbind/rebind for VCN 4.0.5
  drm/i915: Fix format string truncation warning
  drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
  drm/amd/display: Improve HDMI info retrieval
  drm/amdkfd: bump minimum vgpr size for gfx1151
  drm/amd/display: shrink struct members
  drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
  drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
  drm/amdgpu: don't attach the tlb fence for SI
  drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
  drm/amdkfd: Trap handler support for expert scheduling mode
  drm/amdkfd: Use huge page size to check split svm range alignment
  drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
  drm/gem-shmem: revert the 8-byte alignment constraint
  drm/gem-dma: revert the 8-byte alignment constraint
  drm/panthor: Prevent potential UAF in group creation
2025-12-13 17:25:26 +12:00
Linus Torvalds
d8cc0b917b Merge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull further i3c update from Alexandre Belloni:
 "We are removing a legacy API callback and having this sooner rather
  than later will help ensure no one introduces a new driver using it.

  I've also added patches removing the "__free(...) = NULL" pattern
  because I'm sure we won't avoid people sending those following the
  mailing list discussion..."

* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: adi: Fix confusing cleanup.h syntax
  i3c: master: Fix confusing cleanup.h syntax
  i3c: master: cleanup callback .priv_xfers()
  i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()
2025-12-13 17:15:16 +12:00
Linus Torvalds
d324e9a915 Merge tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "Subsystem:
   - stop setting max_user_freq from the individual drivers as this has
     not been hardware related for a while

  New drivers:
   - Andes ATCRTC100
   - Apple SMC
   - Nvidia VRS

  Drivers:
   - renesas-rtca3: add RZ/V2H support
   - tegra: add ACPI support"

* tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits)
  rtc: spacemit: MFD_SPACEMIT_P1 as dependencies
  rtc: atcrtc100: Fix signedness bug in probe()
  rtc: max31335: Fix ignored return value in set_alarm
  rtc: gamecube: Check the return value of ioremap()
  Documentation: ABI: testing: Fix "upto" typo in rtc-cdev
  rtc: Add new rtc-macsmc driver for Apple Silicon Macs
  dt-bindings: rtc: Add Apple SMC RTC
  MAINTAINERS: drop unneeded file entry in NVIDIA VRS RTC DRIVER
  rtc: isl12026: Add id_table
  rtc: renesas-rtca3: Add support for multiple reset lines
  dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2H support
  rtc: tegra: Replace deprecated SIMPLE_DEV_PM_OPS
  rtc: tegra: Add ACPI support
  rtc: tegra: Use devm_clk_get_enabled() in probe
  rtc: Kconfig: add MC34708 to mc13xxx help text
  rtc: s35390a: use u8 instead of char for register buffer
  rtc: nvvrs: add NVIDIA VRS RTC device driver
  dt-bindings: rtc: Document NVIDIA VRS RTC
  rtc: atcrtc100: Add ATCRTC100 RTC driver
  MAINTAINERS: Add entry for ATCRTC100 RTC driver
  ...
2025-12-13 17:09:06 +12:00
Linus Torvalds
a919610db4 Merge tag 'pwm/for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
 "Fix missing th1520 Kconfig dependencies

  This tightens the dependency for the new pwm driver written in Rust to
  make build bots and obviously also users happy"

* tag 'pwm/for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: th1520: Fix missing Kconfig dependencies
2025-12-13 16:41:50 +12:00
Linus Torvalds
a6bb419c1c Merge tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:

 - fix spinlock op type after conversion to lock guards

 - fix a memory leak in error path in gpio-regmap

 - Kconfig fixes in GPIO drivers

 - add a GPIO ACPI quirk for Dell Precision 7780

 - set of fixes for shared GPIO management

* tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: shared: make locking more fine-grained
  gpio: shared: fix auxiliary device cleanup order
  gpio: shared: check if a reference is populated before cleaning its resources
  gpio: shared: fix NULL-pointer dereference in teardown path
  gpio: shared: ignore disabled nodes when traversing the device-tree
  gpiolib: acpi: Add quirk for Dell Precision 7780
  gpio: tb10x: fix OF_GPIO dependency
  gpio: qixis: select CONFIG_REGMAP_MMIO
  gpio: regmap: Fix memleak in error path in gpio_regmap_register()
  gpio: mmio: fix bad guard conversion
2025-12-13 16:36:57 +12:00