Commit Graph

102029 Commits

Author SHA1 Message Date
Dmitry Antipov
fe7a283b39 ocfs2: add suballoc slot check in ocfs2_validate_inode_block()
In 'ocfs2_validate_inode_block()', add suballoc slot check similar to one
in 'ocfs2_get_suballoc_slot_bit()', thus preventing an out-of-bounds
accesses for 'local_system_inodes' in 'get_local_system_inode()'.

Most likely this fixes
https://syzkaller.appspot.com/bug?extid=a77d690840e60bc2ddd8 as well.

Link: https://lkml.kernel.org/r/20250826095106.666980-1-dmantipov@yandex.ru
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reported-by: syzbot+900962ac9bf1860033f2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=900962ac9bf1860033f2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:53 -07:00
yili
493977de14 ocfs2: fix super block reserved field offset comment
The offset annotation for s_reserved2 in struct ocfs2_super_block
was incorrect. After the preceding fields:
- s_xattr_inline_size (2 bytes at 0xB8)
- s_reserved0 (2 bytes at 0xBA)
- s_dx_seed[3] (12 bytes at 0xBC)

The actual offset of s_reserved2 is at 0xC8, when calculating from the
start of the structure.

Correct the offset comment from C0 to C8 to reflect the proper location in
the super block structure.

Link: https://lkml.kernel.org/r/20250807155749.164481-1-yili@winhong.com
Signed-off-by: yili <yili@winhong.com>
Reviewed-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:49 -07:00
Dan Carpenter
171c047289 ocfs2: remove unnecessary NULL check in ocfs2_grab_folios()
Smatch complains that checking "folios" for NULL doesn't make sense
because it has already been dereferenced at this point.  Really passing a
NULL "folios" pointer to ocfs2_grab_folios() doesn't make sense, and
fortunately none of the callers do that.  Delete the unnecessary NULL
check.

Link: https://lkml.kernel.org/r/aKRG39hyvDJcN2G7@stanley.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Mark Tinguely <mark.tinguely@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:49 -07:00
zhoumin
228bf041a7 vfat: remove unused variable
Remove unused variable definition and related function definition and
redundant variable assignments within functions.

Link: https://lkml.kernel.org/r/tencent_9DE7CC9367096503F6ADD2BD960079267406@qq.com
Signed-off-by: zhoumin <teczm@foxmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:47 -07:00
Thorsten Blum
9a0ee378eb ocfs2: remove commented out mlog() statements
The mlog() statements have been commented out ever since commit
6714d8e86b ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") -
remove them.

Link: https://lkml.kernel.org/r/20250814113815.219064-1-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Mark Tinguely <mark.tinguely@oracle.com>
Reviewed-by: Mark Fasheh <mark@fasheh.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Dmitry Antipov <dmantipov@yandex.ru>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:46 -07:00
Tetsuo Handa
bc107a619f squashfs: verify inode mode when loading from disk
The inode mode loaded from corrupted disk might by error contain the file
type bits.  Since the file type bits are set by squashfs_read_inode()
using bitwise OR, the file type bits must not be set by
squashfs_new_inode() from squashfs_read_inode(); otherwise, an invalid
file type bits later confuses may_open().

Link: https://lkml.kernel.org/r/f63d8d11-2254-4fc3-9292-9a43a93b374e@I-love.SAKURA.ne.jp
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:46 -07:00
Linus Torvalds
f83a4f2a4d Merge tag 'erofs-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:

 - Fix invalid algorithm dereference in encoded extents

 - Add missing dax_break_layout_final(), since recent FSDAX fixes
   didn't cover EROFS

 - Arrange long xattr name prefixes more properly

* tag 'erofs-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix long xattr name prefix placement
  erofs: fix runtime warning on truncate_folio_batch_exceptionals()
  erofs: fix invalid algorithm for encoded extents
2025-09-13 17:16:52 -07:00
Boris Burkov
b55102826d btrfs: set AS_KERNEL_FILE on the btree_inode
extent_buffers are global and shared so their pages should not belong to
any particular cgroup (currently whichever cgroups happens to allocate the
extent_buffer).

Btrfs tree operations should not arbitrarily block on cgroup reclaim or
have the shared extent_buffer pages on a cgroup's reclaim lists.

Link: https://lkml.kernel.org/r/2ee99832619a3fdfe80bf4dc9760278662d2d746.1755812945.git.boris@bur.io
Signed-off-by: Boris Burkov <boris@bur.io>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Tested-by: syzbot@syzkaller.appspotmail.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qu Wenruo <wqu@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:20 -07:00
Miaohe Lin
c090868f59 Revert "hugetlb: make hugetlb depends on SYSFS or SYSCTL"
Commit f8142cf94d ("hugetlb: make hugetlb depends on SYSFS or SYSCTL")
added dependency on SYSFS or SYSCTL but hugetlb can be used without SYSFS
or SYSCTL.  So this dependency is wrong and should be removed.

For users with CONFIG_SYSFS or CONFIG_SYSCTL on, there should be no
difference.  For users have CONFIG_SYSFS and CONFIG_SYSCTL both
undefined, hugetlbfs can still works perfectly well through cmdline
except a possible kismet warning[1] when select CONFIG_HUGETLBFS. 
IMHO, it might not worth a backport.

This reverts commit f8142cf94d.  It
overlooked the scenario of using hugetlb through boot parameters when
it was submitted.

Link: https://lkml.kernel.org/r/20250826030955.2898709-1-linmiaohe@huawei.com
Link: https://lore.kernel.org/all/5c99458f-4a91-485f-8a35-3618a992e2e4@csgroup.eu/ [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508222032.bwJsQPZ1-lkp@intel.com/
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:20 -07:00
Christoph Hellwig
e34b21ba15 bcachefs: stop using write_cache_pages
Stop using the obsolete write_cache_pages and use writeback_iter directly.
This basically just open codes write_cache_pages without the indirect
call, but there's probably ways to structure the code even nicer as a
follow on.

Link: https://lkml.kernel.org/r/20250818061017.1526853-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:14 -07:00
Christoph Hellwig
8d4bb46ba7 ntfs3: stop using write_cache_pages
Patch series "remove write_cache_pages()".

Kill off write_cache_pages() after converting the last two users to the
iterator.


This patch (of 3):

Stop using the obsolete write_cache_pages and use writeback_iter directly.

Link: https://lkml.kernel.org/r/20250818061017.1526853-1-hch@lst.de
Link: https://lkml.kernel.org/r/20250818061017.1526853-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:13 -07:00
Matthew Wilcox (Oracle)
53fbef56e0 mm: introduce memdesc_flags_t
Patch series "Add and use memdesc_flags_t".

At some point struct page will be separated from struct slab and struct
folio.  This is a step towards that by introducing a type for the 'flags'
word of all three structures.  This gives us a certain amount of type
safety by establishing that some of these unsigned longs are different
from other unsigned longs in that they contain things like node ID,
section number and zone number in the upper bits.  That lets us have
functions that can be easily called by anyone who has a slab, folio or
page (but not easily by anyone else) to get the node or zone.

There's going to be some unusual merge problems with this as some odd bits
of the kernel decide they want to print out the flags value or something
similar by writing page->flags and now they'll need to write page->flags.f
instead.  That's most of the churn here.  Maybe we should be removing
these things from the debug output?


This patch (of 11):

Wrap the unsigned long flags in a typedef.  In upcoming patches, this will
provide a strong hint that you can't just pass a random unsigned long to
functions which take this as an argument.

[willy@infradead.org: s/flags/flags.f/ in several architectures]
  Link: https://lkml.kernel.org/r/aKMgPRLD-WnkPxYm@casper.infradead.org
[nicola.vetrini@gmail.com: mips: fix compilation error]
  Link: https://lore.kernel.org/lkml/CA+G9fYvkpmqGr6wjBNHY=dRp71PLCoi2341JxOudi60yqaeUdg@mail.gmail.com/
  Link: https://lkml.kernel.org/r/20250825214245.1838158-1-nicola.vetrini@gmail.com
Link: https://lkml.kernel.org/r/20250805172307.1302730-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20250805172307.1302730-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:07 -07:00
David Hildenbrand
1f1c061089 mm/huge_memory: convert "tva_flags" to "enum tva_type"
When determining which THP orders are eligible for a VMA mapping, we have
previously specified tva_flags, however it turns out it is really not
necessary to treat these as flags.

Rather, we distinguish between distinct modes.

The only case where we previously combined flags was with
TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is the
default, except for MADV_COLLAPSE or an edge cases in
collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding a mode
specifically for this case - TVA_FORCED_COLLAPSE.

We have:
* smaps handling for showing "THPeligible"
* Pagefault handling
* khugepaged handling
* Forced collapse handling: primarily MADV_COLLAPSE, but also for
  an edge case in collapse_pte_mapped_thp()

Disregarding the edge cases, we only want to ignore sysfs settings only
when we are forcing a collapse through MADV_COLLAPSE, otherwise we want to
enforce it, hence this patch does the following flag to enum conversions:

* TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS
* TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT
* TVA_ENFORCE_SYSFS             -> TVA_KHUGEPAGED
* 0                             -> TVA_FORCED_COLLAPSE

With this change, we immediately know if we are in the forced collapse
case, which will be valuable next.

Link: https://lkml.kernel.org/r/20250815135549.130506-3-usamaarif642@gmail.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:05 -07:00
David Hildenbrand
9dc21bbd62 prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE
Patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when
advised", v5.

This will allow individual processes to opt-out of THP = "always" into THP
= "madvise", without affecting other workloads on the system.  This has
been extensively discussed on the mailing list and has been summarized
very well by David in the first patch which also includes the links to
alternatives, please refer to the first patch commit message for the
motivation for this series.

Patch 1 adds the PR_THP_DISABLE_EXCEPT_ADVISED flag to implement this,
along with the MMF changes.

Patch 2 is a cleanup patch for tva_flags that will allow the forced
collapse case to be transmitted to vma_thp_disabled (which is done in
patch 3).

Patch 4 adds documentation for PR_SET_THP_DISABLE/PR_GET_THP_DISABLE.

Patches 6-7 implement the selftests for PR_SET_THP_DISABLE for completely
disabling THPs (old behaviour) and only enabling it at advise
(PR_THP_DISABLE_EXCEPT_ADVISED).


This patch (of 7):

People want to make use of more THPs, for example, moving from the "never"
system policy to "madvise", or from "madvise" to "always".

While this is great news for every THP desperately waiting to get
allocated out there, apparently there are some workloads that require a
bit of care during that transition: individual processes may need to
opt-out from this behavior for various reasons, and this should be
permitted without needing to make all other workloads on the system
similarly opt-out.

The following scenarios are imaginable:

(1) Switch from "none" system policy to "madvise"/"always", but keep THPs
    disabled for selected workloads.

(2) Stay at "none" system policy, but enable THPs for selected
    workloads, making only these workloads use the "madvise" or "always"
    policy.

(3) Switch from "madvise" system policy to "always", but keep the
    "madvise" policy for selected workloads: allocate THPs only when
    advised.

(4) Stay at "madvise" system policy, but enable THPs even when not advised
    for selected workloads -- "always" policy.

Once can emulate (2) through (1), by setting the system policy to
"madvise"/"always" while disabling THPs for all processes that don't want
THPs.  It requires configuring all workloads, but that is a user-space
problem to sort out.

(4) can be emulated through (3) in a similar way.

Back when (1) was relevant in the past, as people started enabling THPs,
we added PR_SET_THP_DISABLE, so relevant workloads that were not ready yet
(i.e., used by Redis) were able to just disable THPs completely.  Redis
still implements the option to use this interface to disable THPs
completely.

With PR_SET_THP_DISABLE, we added a way to force-disable THPs for a
workload -- a process, including fork+exec'ed process hierarchy.  That
essentially made us support (1): simply disable THPs for all workloads
that are not ready for THPs yet, while still enabling THPs system-wide.

The quest for handling (3) and (4) started, but current approaches
(completely new prctl, options to set other policies per process,
alternatives to prctl -- mctrl, cgroup handling) don't look particularly
promising.  Likely, the future will use bpf or something similar to
implement better policies, in particular to also make better decisions
about THP sizes to use, but this will certainly take a while as that work
just started.

Long story short: a simple enable/disable is not really suitable for the
future, so we're not willing to add completely new toggles.

While we could emulate (3)+(4) through (1)+(2) by simply disabling THPs
completely for these processes, this is a step backwards, because these
processes can no longer allocate THPs in regions where THPs were
explicitly advised: regions flagged as VM_HUGEPAGE.  Apparently, that
imposes a problem for relevant workloads, because "not THPs" is certainly
worse than "THPs only when advised".

Could we simply relax PR_SET_THP_DISABLE, to "disable THPs unless not
explicitly advised by the app through MAD_HUGEPAGE"?  *maybe*, but this
would change the documented semantics quite a bit, and the versatility to
use it for debugging purposes, so I am not 100% sure that is what we want
-- although it would certainly be much easier.

So instead, as an easy way forward for (3) and (4), add an option to
make PR_SET_THP_DISABLE disable *less* THPs for a process.

In essence, this patch:

(A) Adds PR_THP_DISABLE_EXCEPT_ADVISED, to be used as a flag in arg3
    of prctl(PR_SET_THP_DISABLE) when disabling THPs (arg2 != 0).

    prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED).

(B) Makes prctl(PR_GET_THP_DISABLE) return 3 if
    PR_THP_DISABLE_EXCEPT_ADVISED was set while disabling.

    Previously, it would return 1 if THPs were disabled completely. Now
    it returns the set flags as well: 3 if PR_THP_DISABLE_EXCEPT_ADVISED
    was set.

(C) Renames MMF_DISABLE_THP to MMF_DISABLE_THP_COMPLETELY, to express
    the semantics clearly.

    Fortunately, there are only two instances outside of prctl() code.

(D) Adds MMF_DISABLE_THP_EXCEPT_ADVISED to express "no THP except for VMAs
    with VM_HUGEPAGE" -- essentially "thp=madvise" behavior

    Fortunately, we only have to extend vma_thp_disabled().

(E) Indicates "THP_enabled: 0" in /proc/pid/status only if THPs are
    disabled completely

    Only indicating that THPs are disabled when they are really disabled
    completely, not only partially.

    For now, we don't add another interface to obtained whether THPs
    are disabled partially (PR_THP_DISABLE_EXCEPT_ADVISED was set). If
    ever required, we could add a new entry.

The documented semantics in the man page for PR_SET_THP_DISABLE "is
inherited by a child created via fork(2) and is preserved across
execve(2)" is maintained.  This behavior, for example, allows for
disabling THPs for a workload through the launching process (e.g., systemd
where we fork() a helper process to then exec()).

For now, MADV_COLLAPSE will *fail* in regions without VM_HUGEPAGE and
VM_NOHUGEPAGE.  As MADV_COLLAPSE is a clear advise that user space thinks
a THP is a good idea, we'll enable that separately next (requiring a bit
of cleanup first).

There is currently not way to prevent that a process will not issue
PR_SET_THP_DISABLE itself to re-enable THP.  There are not really known
users for re-enabling it, and it's against the purpose of the original
interface.  So if ever required, we could investigate just forbidding to
re-enable them, or make this somehow configurable.

Link: https://lkml.kernel.org/r/20250815135549.130506-1-usamaarif642@gmail.com
Link: https://lkml.kernel.org/r/20250815135549.130506-2-usamaarif642@gmail.com
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Tested-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:05 -07:00
Lorenzo Stoakes
d14d3f535e mm: convert remaining users to mm_flags_*() accessors
As part of the effort to move to mm->flags becoming a bitmap field,
convert existing users to making use of the mm_flags_*() accessors which
will, when the conversion is complete, be the only means of accessing
mm_struct flags.

No functional change intended.

Link: https://lkml.kernel.org/r/cc67a56f9a8746a8ec7d9791853dc892c1c33e0b.1755012943.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Namhyung kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:58 -07:00
Lorenzo Stoakes
39f8049cd4 mm: update coredump logic to correctly use bitmap mm flags
The coredump logic is slightly different from other users in that it both
stores mm flags and additionally sets and gets using masks.

Since the MMF_DUMPABLE_* flags must remain as they are for uABI reasons,
and of course these are within the first 32-bits of the flags, it is
reasonable to provide access to these in the same fashion so this logic
can all still keep working as it has been.

Therefore, introduce coredump-specific helpers __mm_flags_get_dumpable()
and __mm_flags_set_mask_dumpable() for this purpose, and update all core
dump users of mm flags to use these.

[lorenzo.stoakes@oracle.com: abstract set_mask_bits() invocation to mm_types.h to satisfy ARC]
  Link: https://lkml.kernel.org/r/0e7ad263-1ff7-446d-81fe-97cff9c0e7ed@lucifer.local
Link: https://lkml.kernel.org/r/2a5075f7e3c5b367d988178c79a3063d12ee53a9.1755012943.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Namhyung kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:57 -07:00
David Hildenbrand
b0f86aaebe fs/dax: use vmf_insert_folio_pmd() to insert the huge zero folio
Let's convert to vmf_insert_folio_pmd().

There is a theoretical change in behavior: in the unlikely case there is
already something mapped, we'll now still call trace_dax_pmd_load_hole()
and return VM_FAULT_NOPAGE.

Previously, we would have returned VM_FAULT_FALLBACK, and the caller would
have zapped the PMD to try a PTE fault.

However, that behavior was different to other PTE+PMD faults, when there
would already be something mapped, and it's not even clear if it could be
triggered.

Assuming the huge zero folio is already mapped, all good, no need to
fallback to PTEs.

Assuming there is already a leaf page table ...  the behavior would be
just like when trying to insert a PMD mapping a folio through
dax_fault_iter()->vmf_insert_folio_pmd().

Assuming there is already something else mapped as PMD?  It sounds like a
BUG, and the behavior would be just like when trying to insert a PMD
mapping a folio through dax_fault_iter()->vmf_insert_folio_pmd().

So, it sounds reasonable to not handle huge zero folios differently to
inserting PMDs mapping folios when there already is something mapped.

Link: https://lkml.kernel.org/r/20250811112631.759341-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:51 -07:00
David Hildenbrand
fb49a4425c treewide: remove MIGRATEPAGE_SUCCESS
At this point MIGRATEPAGE_SUCCESS is misnamed for all folio users,
and now that we remove MIGRATEPAGE_UNMAP, it's really the only "success"
return value that the code uses and expects.

Let's just get rid of MIGRATEPAGE_SUCCESS completely and just use "0"
for success.

Link: https://lkml.kernel.org/r/20250811143949.1117439-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>			[mm]
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>	[jfs]
Acked-by: David Sterba <dsterba@suse.com>		[btrfs]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Byungchul Park <byungchul@sk.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Eugenio Pé rez <eperezma@redhat.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:50 -07:00
Suren Baghdasaryan
d9d1c2d817 fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
Utilize per-vma locks to stabilize vma after lookup without taking
mmap_lock during PROCMAP_QUERY ioctl execution. If vma lock is
contended, we fall back to mmap_lock but take it only momentarily
to lock the vma and release the mmap_lock. In a very unlikely case
of vm_refcnt overflow, this fall back path will fail and ioctl is
done under mmap_lock protection.

This change is designed to reduce mmap_lock contention and prevent
PROCMAP_QUERY ioctl calls from blocking address space updates.

Link: https://lkml.kernel.org/r/20250808152850.2580887-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:48 -07:00
Suren Baghdasaryan
ee737a5a10 fs/proc/task_mmu: factor out proc_maps_private fields used by PROCMAP_QUERY
Refactor struct proc_maps_private so that the fields used by PROCMAP_QUERY
ioctl are moved into a separate structure. In the next patch this allows
ioctl to reuse some of the functions used for reading /proc/pid/maps
without using file->private_data. This prevents concurrent modification
of file->private_data members by ioctl and /proc/pid/maps readers.

The change is pure code refactoring and has no functional changes.

Link: https://lkml.kernel.org/r/20250808152850.2580887-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:48 -07:00
Vitaly Wool
2cd8231796 mm/slub: allow to set node and align in k[v]realloc
Reimplement k[v]realloc_node() to be able to set node and alignment should
a user need to do so.  In order to do that while retaining the maximal
backward compatibility, add k[v]realloc_node_align() functions and
redefine the rest of API using these new ones.

While doing that, we also keep the number of _noprof variants to a
minimum, which implies some changes to the existing users of older _noprof
functions, that basically being bcachefs.

With that change we also provide the ability for the Rust part of the
kernel to set node and alignment in its K[v]xxx [re]allocations.

Link: https://lkml.kernel.org/r/20250806124147.1724658-1-vitaly.wool@konsulko.se
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jann Horn <jannh@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:54:45 -07:00
Nathan Chancellor
025e87f8ea nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:

  CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
  ...
  Call Trace:
   <TASK>
   sysfs_kf_seq_show+0x2a6/0x390
   ? __cfi_kobj_attr_show+0x10/0x10
   kernfs_seq_show+0x104/0x15b
   seq_read_iter+0x580/0xe2b
  ...

When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops.  When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called. 
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show().  kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.

Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.

Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f684 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 13:05:38 -07:00
Linus Torvalds
5cd64d4f92 Merge tag 'ceph-for-6.17-rc6' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A fix for a race condition around r_parent tracking that took a long
  time to track down from Alex and some fixes for potential crashes on
  accessing invalid memory from Max and myself.

  All marked for stable"

* tag 'ceph-for-6.17-rc6' of https://github.com/ceph/ceph-client:
  libceph: fix invalid accesses to ceph_connection_v1_info
  ceph: fix crash after fscrypt_encrypt_pagecache_blocks() error
  ceph: always call ceph_shift_unused_folios_left()
  ceph: fix race condition where r_parent becomes stale before sending message
  ceph: fix race condition validating r_parent before applying state
2025-09-13 10:45:11 -07:00
Linus Torvalds
b891d11b74 Merge tag 'driver-core-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:

 - Fix UAF in cgroup pressure polling by using kernfs_get_active_of()
   to prevent operations on released file descriptors

 - Fix unresolved intra-doc link in the documentation of struct Device
   when CONFIG_DRM != y

 - Update the DMA Rust MAINTAINERS entry

* tag 'driver-core-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  MAINTAINERS: Update the DMA Rust entry
  kernfs: Fix UAF in polling when open file is released
  rust: device: fix unresolved link to drm::Device
2025-09-13 10:36:06 -07:00
Linus Torvalds
cb780b79b2 Merge tag 'v6.17-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Two smb3 client fixes, both for stable:

   - Fix encryption problem with multiple compounded ops

   - Fix rename error cases that could lead to data corruption"

* tag 'v6.17-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix data loss due to broken rename(2)
  smb: client: fix compound alignment with encryption
2025-09-12 09:03:01 -07:00
Andreas Gruenbacher
28c4d9bc07 gfs2: Fix unlikely race in gdlm_put_lock
In gdlm_put_lock(), there is a small window of time in which the
DFL_UNMOUNT flag has been set but the lockspace hasn't been released,
yet.  In that window, dlm may still call gdlm_ast() and gdlm_bast().
To prevent it from dereferencing freed glock objects, only free the
glock if the lockspace has actually been released.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:03:01 +02:00
Andreas Gruenbacher
6ab26555c9 gfs2: Add proper lockspace locking
GFS2 has been calling functions like dlm_lock() even after the lockspace
that these functions operate on has been released with
dlm_release_lockspace().  It has always assumed that those functions
would return -EINVAL in that case, but that was never guaranteed, and it
certainly is no longer the case since commit 4db41bf4f0 ("dlm: remove
ls_local_handle from struct dlm_ls").

To fix that, add proper lockspace locking.

Fixes: 3e11e53041 ("GFS2: ignore unlock failures after withdraw")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:57 +02:00
Andreas Gruenbacher
47faf937da gfs2: Minor run_queue fixes
Provide a better description of why the GLF_DEMOTE_IN_PROGRESS flag
cannot be set.

Function do_xmote() may block, so make sure it isn't called when
nonblock is true.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:53 +02:00
Andreas Gruenbacher
cd493dcf4f gfs2: run_queue cleanup
Transform the code in run_queue() to make it more readable.  No change
in functionality.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:50 +02:00
Andreas Gruenbacher
2045364497 gfs2: Simplify do_promote
While not immediately obvious, do_promote() returns whether or not there
are any active holders in the queue.  But the function description is
confusing, and this information is easy to come by for callers anyway,
so turn do_promote() into a void function.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:45 +02:00
Andreas Gruenbacher
bddb53b776 gfs2: Get rid of GLF_INVALIDATE_IN_PROGRESS
Get rid of the GLF_INVALIDATE_IN_PROGRESS flag: it was originally used
to indicate to add_to_queue() that the ->go_sync() and ->go_invalid()
operations were in progress, but as we have established in commit "gfs2:
Fix LM_FLAG_TRY* logic in add_to_queue", add_to_queue() has no need to
know.

Commit d99724c3c3 describes a race in which GLF_INVALIDATE_IN_PROGRESS
is used to serialize two processes which are both in do_xmote() at the
same time.  That analysis is wrong: the serialization happens via the
GLF_LOCK flag, which ensures that at most one glock operation can be
active at any time.

Fixes: d99724c3c3 ("gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:41 +02:00
Andreas Gruenbacher
061df28b82 gfs2: Fix GLF_INVALIDATE_IN_PROGRESS flag clearing in do_xmote
Commit 865cc3e9cc ("gfs2: fix a deadlock on withdraw-during-mount")
added a statement to do_xmote() to clear the GLF_INVALIDATE_IN_PROGRESS
flag a second time after it has already been cleared.  Fix that.

Fixes: 865cc3e9cc ("gfs2: fix a deadlock on withdraw-during-mount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:38 +02:00
Andreas Gruenbacher
9b54770b68 gfs2: Remove duplicate check in do_xmote
In do_xmote(), remove the duplicate check for the ->go_sync and
->go_inval glock operations.  They are either both defined, or none of
them are.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:31 +02:00
Andreas Gruenbacher
0c23e24164 gfs2: Fix LM_FLAG_TRY* logic in add_to_queue
The logic in add_to_queue() for determining whether a LM_FLAG_TRY or
LM_FLAG_TRY_1CB holder should be queued does not make any sense: we are
interested in wether or not the new operation will block behind an
existing or future holder in the queue, but the current code checks for
ongoing locking or ->go_inval() operations, which has little to do with
that.

Replace that code with something more sensible, remove the incorrect
add_to_queue() function annotations, remove the similarly misguided
do_error(gl, 0) call in do_xmote(), and add a missing comment to the
same call in do_promote().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:28 +02:00
Andreas Gruenbacher
2b813a7288 gfs2: Remove DLM_LKF_ALTCW / DLM_LKF_ALTPR code
Commit 6802e3400f ("[GFS2] Clean up the glock core") stopped passing
the LM_FLAG_ANY flag down to gdlm_lock() (then gfs2_lm_lock()).  Since
then, gfs2 effectively hasn't been using dlm's DLM_LKF_ALTCW /
DLM_LKF_ALTPR flags, but the code still suggests that it does.  Recent
testing shows that those flags don't even work reliably anymore, so
instead of fixing code that hasn't been used since 2008, remove it.

In addition, clean up how the flags are passed to [gd]lm_lock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:25 +02:00
Andreas Gruenbacher
fd70ab7155 gfs2: Further sanitize lock_dlm.c
The gl_req field and GLF_BLOCKING flag are only relevant to gdlm_lock(),
its callback gdlm_ast(), and their helpers, so set and clear them inside
lock_dlm.c.

Also, the LM_FLAG_ANY flag is relevant to gdlm_lock(), but do_xmote()
doesn't pass that flag down to gdlm_lock() as it should.  Fix that by
passing down all the flags.

In addition, document the effect of the LM_FLAG_ANY flag on locks held
in EX mode locally.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:20 +02:00
Andreas Gruenbacher
cd71804664 gfs2: Do not use atomic operations unnecessarily
The GLF_DEMOTE_IN_PROGRESS and GLF_LOCK flags and the glock refcount are
all protected by the glock spin lock, so there is no need for atomic
operations / barriers here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:18 +02:00
Andreas Gruenbacher
13c0004168 gfs2: Sanitize gfs2_meta_check, gfs2_metatype_check, gfs2_io_error
Change those functions to either return a useful value, or nothing at
all.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:15 +02:00
Andreas Gruenbacher
6e42240826 gfs2: Turn gfs2_withdraw into a void function
Since commit 601ef0d52e ("gfs2: Force withdraw to replay journals and
wait for it to finish"), gfs2_withdraw() always returns -1, so turn it
into a straight void function.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:12 +02:00
Andreas Gruenbacher
418c854759 gfs2: Partially revert "gfs2: do_xmote fixes"
When the lm_lock hook which calls dlm_lock() returns an error,
do_xmote() previously reported the error to the syslog ("lm_lock ret
%d\n") but otherwise ignored it during withdraws.  Commit 9947a06d29
("gfs2: do_xmote fixes") changed that to pass the error on to the glock
layer, but the error would then only result in an unconditional BUG() in
finish_xmote(), which doesn't help.  Instead, revert to the previous
behavior.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:08 +02:00
Andreas Gruenbacher
4250e683de gfs2: Simplify refcounting in do_xmote
In do_xmote(), take the additional glock references close to where those
references are needed.  This will simplify the next commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:06 +02:00
Andreas Gruenbacher
2309a01351 gfs2: do_xmote cleanup
Check for asynchronous completion and clear the GLF_PENDING_REPLY flag
earlier in do_xmote().  This will make future changes more readable.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:02:02 +02:00
Colin Ian King
aa94ad9ab2 gfs2: Remove space before newline
There is an extraneous space before a newline in a fs_err message.
Remove it

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:01:58 +02:00
Andreas Gruenbacher
37b1c0f120 gfs2: Remove unused sd_withdraw_wait field
This field was introduced in commit 601ef0d52e ("gfs2: Force withdraw
to replay journals and wait for it to finish") and never used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:01:54 +02:00
Andreas Gruenbacher
60c6273131 gfs2: Remove unused GIF_FREE_VFS_INODE flag
Since commit 38552ff676 ("gfs2: Fix and clean up create / evict
interaction"), the GIF_FREE_VFS_INODE flag isn't used anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2025-09-12 12:01:45 +02:00
Gao Xiang
1fcf686def erofs: fix long xattr name prefix placement
Currently, xattr name prefixes are forcibly placed into the packed
inode if the fragments feature is enabled, and users have no option
to put them in plain form directly on disk.

This is inflexible. First, as mentioned above, users should be able
to store unwrapped long xattr name prefixes unconditionally
(COMPAT_PLAIN_XATTR_PFX). Second, since we now have the new metabox
inode to store metadata, it should be used when available instead
of the packed inode.

Fixes: 414091322c ("erofs: implement metadata compression")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-09-12 03:37:07 +08:00
Linus Torvalds
b10c31b70b Merge tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix delayed inode tracking in xarray, eviction can race with
   insertion and leave behind a disconnected inode

 - on systems with large page (64K) and small block size (4K) fix
   compression read that can return partially filled folio

 - slightly relax compression option format for backward compatibility,
   allow to specify level for LZO although there's only one

 - fix simple quota accounting of compressed extents

 - validate minimum device size in 'device add'

 - update maintainers' entry

* tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't allow adding block device of less than 1 MB
  MAINTAINERS: update btrfs entry
  btrfs: fix subvolume deletion lockup caused by inodes xarray race
  btrfs: fix corruption reading compressed range when block size is smaller than page size
  btrfs: accept and ignore compression level for lzo
  btrfs: fix squota compressed stats leak
2025-09-11 08:01:18 -07:00
Miklos Szeredi
b8cf8fda52 fanotify: add watchdog for permission events
This is to make it easier to debug issues with AV software, which time and
again deadlocks with no indication of where the issue comes from, and the
kernel being blamed for the deadlock.  Then we need to analyze dumps to
prove that the kernel is not in fact at fault.

The deadlock comes from recursion: handling the event triggers another
permission event, in some roundabout way, obviously, otherwise it would
have been found in testing.

With this patch a warning is printed when permission event is received by
userspace but not answered for more than the timeout specified in
/proc/sys/fs/fanotify/watchdog_timeout.  The watchdog can be turned off by
setting the timeout to zero (which is the default).

The timeout is very coarse (T <= t < 2T) but I guess it's good enough for
the purpose.

Overhead should be minimal.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20250909143053.112171-1-mszeredi@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
2025-09-11 16:34:50 +02:00
Linus Torvalds
4f553c1e2c Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "20 hotfixes. 15 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 14 of these
  fixes are for MM.

  This includes

   - kexec fixes from Breno for a recently introduced
     use-uninitialized bug

   - DAMON fixes from Quanmin Yan to avoid div-by-zero crashes
     which can occur if the operator uses poorly-chosen insmod
     parameters

   and misc singleton fixes"

* tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add tree entry to numa memblocks and emulation block
  mm/damon/sysfs: fix use-after-free in state_show()
  proc: fix type confusion in pde_set_flags()
  compiler-clang.h: define __SANITIZE_*__ macros only when undefined
  mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
  ocfs2: fix recursive semaphore deadlock in fiemap call
  mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
  mm/mremap: fix regression in vrm->new_addr check
  percpu: fix race on alloc failed warning limit
  mm/memory-failure: fix redundant updates for already poisoned pages
  s390: kexec: initialize kexec_buf struct
  riscv: kexec: initialize kexec_buf struct
  arm64: kexec: initialize kexec_buf struct in load_other_segments()
  mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
  mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
  mm/damon/core: set quota->charged_from to jiffies at first charge window
  mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
  init/main.c: fix boot time tracing crash
  mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()
  mm/khugepaged: fix the address passed to notifier on testing young
2025-09-10 21:19:34 -07:00
Linus Torvalds
7aac71907b Merge tag 'nfs-for-6.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
 "Stable patches:

   - Revert "SUNRPC: Don't allow waiting for exiting tasks" as it is
     breaking ltp tests

  Bugfixes:

   - Another set of fixes to the tracking of NFSv4 server capabilities
     when crossing filesystem boundaries

   - Localio fix to restore credentials and prevent triggering a
     BUG_ON()

   - Fix to prevent flapping of the localio on/off trigger

   - Protections against 'eof page pollution' as demonstrated in
     xfstests generic/363

   - Series of patches to ensure correct ordering of O_DIRECT i/o and
     truncate, fallocate and copy functions

   - Fix a NULL pointer check in flexfiles reads that regresses 6.17

   - Correct a typo that breaks flexfiles layout segment processing"

* tag 'nfs-for-6.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4/flexfiles: Fix layout merge mirror check.
  SUNRPC: call xs_sock_process_cmsg for all cmsg
  Revert "SUNRPC: Don't allow waiting for exiting tasks"
  NFS: Fix the marking of the folio as up to date
  NFS: nfs_invalidate_folio() must observe the offset and size arguments
  NFSv4.2: Serialise O_DIRECT i/o and copy range
  NFSv4.2: Serialise O_DIRECT i/o and clone range
  NFSv4.2: Serialise O_DIRECT i/o and fallocate()
  NFS: Serialise O_DIRECT i/o and truncate()
  NFSv4.2: Protect copy offload and clone against 'eof page pollution'
  NFS: Protect against 'eof page pollution'
  flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read
  nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
  nfs/localio: restore creds before releasing pageio data
  NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
  NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
  NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
  NFSv4: Don't clear capabilities that won't be reset
2025-09-10 12:38:41 -07:00