Commit Graph

1296158 Commits

Author SHA1 Message Date
Huang Ying
b4afe4183e resource: fix region_intersects() vs add_memory_driver_managed()
On a system with CXL memory, the resource tree (/proc/iomem) related to
CXL memory may look like something as follows.

490000000-50fffffff : CXL Window 0
  490000000-50fffffff : region0
    490000000-50fffffff : dax0.0
      490000000-50fffffff : System RAM (kmem)

Because drivers/dax/kmem.c calls add_memory_driver_managed() during
onlining CXL memory, which makes "System RAM (kmem)" a descendant of "CXL
Window X".  This confuses region_intersects(), which expects all "System
RAM" resources to be at the top level of iomem_resource.  This can lead to
bugs.

For example, when the following command line is executed to write some
memory in CXL memory range via /dev/mem,

 $ dd if=data of=/dev/mem bs=$((1 << 10)) seek=$((0x490000000 >> 10)) count=1
 dd: error writing '/dev/mem': Bad address
 1+0 records in
 0+0 records out
 0 bytes copied, 0.0283507 s, 0.0 kB/s

the command fails as expected.  However, the error code is wrong.  It
should be "Operation not permitted" instead of "Bad address".  More
seriously, the /dev/mem permission checking in devmem_is_allowed() passes
incorrectly.  Although the accessing is prevented later because ioremap()
isn't allowed to map system RAM, it is a potential security issue.  During
command executing, the following warning is reported in the kernel log for
calling ioremap() on system RAM.

 ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
 WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
 Call Trace:
  memremap+0xcb/0x184
  xlate_dev_mem_ptr+0x25/0x2f
  write_mem+0x94/0xfb
  vfs_write+0x128/0x26d
  ksys_write+0xac/0xfe
  do_syscall_64+0x9a/0xfd
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

The details of command execution process are as follows.  In the above
resource tree, "System RAM" is a descendant of "CXL Window 0" instead of a
top level resource.  So, region_intersects() will report no System RAM
resources in the CXL memory region incorrectly, because it only checks the
top level resources.  Consequently, devmem_is_allowed() will return 1
(allow access via /dev/mem) for CXL memory region incorrectly. 
Fortunately, ioremap() doesn't allow to map System RAM and reject the
access.

So, region_intersects() needs to be fixed to work correctly with the
resource tree with "System RAM" not at top level as above.  To fix it, if
we found a unmatched resource in the top level, we will continue to search
matched resources in its descendant resources.  So, we will not miss any
matched resources in resource tree anymore.

In the new implementation, an example resource tree

|------------- "CXL Window 0" ------------|
|-- "System RAM" --|

will behave similar as the following fake resource tree for
region_intersects(, IORESOURCE_SYSTEM_RAM, ),

|-- "System RAM" --||-- "CXL Window 0a" --|

Where "CXL Window 0a" is part of the original "CXL Window 0" that
isn't covered by "System RAM".

Link: https://lkml.kernel.org/r/20240906030713.204292-2-ying.huang@intel.com
Fixes: c221c0b030 ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 00:58:04 -07:00
Sergey Senozhatsky
6040f650c5 zsmalloc: use unique zsmalloc caches names
Each zsmalloc pool maintains several named kmem-caches for zs_handle-s and
zspage-s.  On a system with multiple zsmalloc pools and CONFIG_DEBUG_VM
this triggers kmem_cache_sanity_check():

  kmem_cache of name 'zspage' already exists
  WARNING: at mm/slab_common.c:108 do_kmem_cache_create_usercopy+0xb5/0x310
  ...

  kmem_cache of name 'zs_handle' already exists
  WARNING: at mm/slab_common.c:108 do_kmem_cache_create_usercopy+0xb5/0x310
  ...

We provide zram device name when init its zsmalloc pool, so we can use
that same name for zsmalloc caches and, hence, create unique names that
can easily be linked to zram device that has created them.

So instead of having this

cat /proc/slabinfo
slabinfo - version: 2.1
zspage                46     46    ...
zs_handle            128    128    ...
zspage             34270  34270    ...
zs_handle          34816  34816    ...
zspage                 0      0    ...
zs_handle              0      0    ...

We now have this

cat /proc/slabinfo
slabinfo - version: 2.1
zspage-zram2          46     46    ...
zs_handle-zram2      128    128    ...
zspage-zram0       34270  34270    ...
zs_handle-zram0    34816  34816    ...
zspage-zram1           0      0    ...
zs_handle-zram1        0      0    ...

Link: https://lkml.kernel.org/r/20240906035103.2435557-1-senozhatsky@chromium.org
Fixes: 2e40e163a2 ("zsmalloc: decouple handle and object")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 00:58:04 -07:00
Liam R. Howlett
fb497d6db7 mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock
Traversing VMAs of a given maple tree should be protected by rcu read
lock.  However, __damon_va_three_regions() is not doing the protection. 
Hold the lock.

Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org
Fixes: d0cf3dd47f ("damon: convert __damon_va_three_regions to use the VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:54 -07:00
Chris Li
0885ef4705 mm: vmscan.c: fix OOM on swap stress test
I found a regression on mm-unstable during my swap stress test, using
tmpfs to compile linux.  The test OOM very soon after the make spawns many
cc processes.

It bisects down to this change: 33dfe9204f
(mm/gup: clear the LRU flag of a page before adding to LRU batch)

Yu Zhao propose the fix: "I think this is one of the potential side
effects -- Huge mentioned earlier about isolate_lru_folios():"

I test that with it the swap stress test no longer OOM.

Link: https://lore.kernel.org/r/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoXmw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20240905-lru-flag-v2-1-8a2d9046c594@kernel.org
Fixes: 33dfe9204f ("mm/gup: clear the LRU flag of a page before adding to LRU batch")
Signed-off-by: Chris Li <chrisl@kernel.org>
Suggested-by: Yu Zhao <yuzhao@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Closes: https://lore.kernel.org/all/CAF8kJuNP5iTj2p07QgHSGOJsiUfYpJ2f4R1Q5-3BN9JiD9W_KA@mail.gmail.com/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:54 -07:00
Joseph Qi
35fccce29f ocfs2: cancel dqi_sync_work before freeing oinfo
ocfs2_global_read_info() will initialize and schedule dqi_sync_work at the
end, if error occurs after successfully reading global quota, it will
trigger the following warning with CONFIG_DEBUG_OBJECTS_* enabled:

ODEBUG: free active (active state 0) object: 00000000d8b0ce28 object type: timer_list hint: qsync_work_fn+0x0/0x16c

This reports that there is an active delayed work when freeing oinfo in
error handling, so cancel dqi_sync_work first.  BTW, return status instead
of -1 when .read_file_info fails.

Link: https://syzkaller.appspot.com/bug?extid=f7af59df5d6b25f0febd
Link: https://lkml.kernel.org/r/20240904071004.2067695-1-joseph.qi@linux.alibaba.com
Fixes: 171bf93ce1 ("ocfs2: Periodic quota syncing")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Tested-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:54 -07:00
Lizhi Xu
33b525cef4 ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
When doing cleanup, if flags without OCFS2_BH_READAHEAD, it may trigger
NULL pointer dereference in the following ocfs2_set_buffer_uptodate() if
bh is NULL.

Link: https://lkml.kernel.org/r/20240902023636.1843422-3-joseph.qi@linux.alibaba.com
Fixes: cf76c78595 ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Heming Zhao <heming.zhao@suse.com>
Suggested-by: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>	[4.20+]
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:54 -07:00
Lizhi Xu
c03a82b4a0 ocfs2: remove unreasonable unlock in ocfs2_read_blocks
Patch series "Misc fixes for ocfs2_read_blocks", v5.

This series contains 2 fixes for ocfs2_read_blocks().  The first patch fix
the issue reported by syzbot, which detects bad unlock balance in
ocfs2_read_blocks().  The second patch fixes an issue reported by Heming
Zhao when reviewing above fix.


This patch (of 2):

There was a lock release before exiting, so remove the unreasonable unlock.

Link: https://lkml.kernel.org/r/20240902023636.1843422-1-joseph.qi@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240902023636.1843422-2-joseph.qi@linux.alibaba.com
Fixes: cf76c78595 ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ab134185af9ef88dfed5
Tested-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[4.20+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:53 -07:00
Julian Sun
5784d9fcfd ocfs2: fix null-ptr-deref when journal load failed.
During the mounting process, if journal_reset() fails because of too short
journal, then lead to jbd2_journal_load() fails with NULL j_sb_buffer. 
Subsequently, ocfs2_journal_shutdown() calls
jbd2_journal_flush()->jbd2_cleanup_journal_tail()->
__jbd2_update_log_tail()->jbd2_journal_update_sb_log_tail()
->lock_buffer(journal->j_sb_buffer), resulting in a null-pointer
dereference error.

To resolve this issue, we should check the JBD2_LOADED flag to ensure the
journal was properly loaded.  Additionally, use journal instead of
osb->journal directly to simplify the code.

Link: https://syzkaller.appspot.com/bug?extid=05b9b39d8bdfe1a0861f
Link: https://lkml.kernel.org/r/20240902030844.422725-1-sunjunchao2870@gmail.com
Fixes: f6f50e28f0 ("jbd2: Fail to load a journal if it is too short")
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reported-by: syzbot+05b9b39d8bdfe1a0861f@syzkaller.appspotmail.com
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 15:15:53 -07:00
Suren Baghdasaryan
052a45c1cb alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
codetag_module_init() is used to initialize sections containing allocation
tags.  This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL.  This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections.  When CONFIG_MODULES=n, this function is a
NOP, which is wrong.  This leads to /proc/allocinfo reported as empty. 
Fix this by making it independent of CONFIG_MODULES.

Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com
Fixes: 916cc5167c ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:03 -07:00
Adrian Huang
409faf8c97 mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
When running the vmalloc stress on a 448-core system, observe the average
latency of purge_vmap_node() is about 2 seconds by using the eBPF/bcc
'funclatency.py' tool [1].

  # /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1

     usecs             : count    distribution
        0 -> 1         : 0       |                                        |
        2 -> 3         : 29      |                                        |
        4 -> 7         : 19      |                                        |
        8 -> 15        : 56      |                                        |
       16 -> 31        : 483     |****                                    |
       32 -> 63        : 1548    |************                            |
       64 -> 127       : 2634    |*********************                   |
      128 -> 255       : 2535    |*********************                   |
      256 -> 511       : 1776    |**************                          |
      512 -> 1023      : 1015    |********                                |
     1024 -> 2047      : 573     |****                                    |
     2048 -> 4095      : 488     |****                                    |
     4096 -> 8191      : 1091    |*********                               |
     8192 -> 16383     : 3078    |*************************               |
    16384 -> 32767     : 4821    |****************************************|
    32768 -> 65535     : 3318    |***************************             |
    65536 -> 131071    : 1718    |**************                          |
   131072 -> 262143    : 2220    |******************                      |
   262144 -> 524287    : 1147    |*********                               |
   524288 -> 1048575   : 1179    |*********                               |
  1048576 -> 2097151   : 822     |******                                  |
  2097152 -> 4194303   : 906     |*******                                 |
  4194304 -> 8388607   : 2148    |*****************                       |
  8388608 -> 16777215  : 4497    |*************************************   |
 16777216 -> 33554431  : 289     |**                                      |

  avg = 2041714 usecs, total: 78381401772 usecs, count: 38390

  The worst case is over 16-33 seconds, so soft lockup is triggered [2].

[Root Cause]
1) Each purge_list has the long list. The following shows the number of
   vmap_area is purged.

   crash> p vmap_nodes
   vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
   crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
     nr_purged = 663070
     ...
     nr_purged = 821670
     nr_purged = 692214
     nr_purged = 726808
     ...

2) atomic_long_sub() employs the 'lock' prefix to ensure the atomic
   operation when purging each vmap_area. However, the iteration is over
   600000 vmap_area (See 'nr_purged' above).

   Here is objdump output:

     $ objdump -D vmlinux
     ffffffff813e8c80 <purge_vmap_node>:
     ...
     ffffffff813e8d70:  f0 48 29 2d 68 0c bb  lock sub %rbp,0x2bb0c68(%rip)
     ...

   Quote from "Instruction tables" pdf file [3]:
     Instructions with a LOCK prefix have a long latency that depends on
     cache organization and possibly RAM speed. If there are multiple
     processors or cores or direct memory access (DMA) devices, then all
     locked instructions will lock a cache line for exclusive access,
     which may involve RAM access. A LOCK prefix typically costs more
     than a hundred clock cycles, even on single-processor systems.

   That's why the latency of purge_vmap_node() dramatically increases
   on a many-core system: One core is busy on purging each vmap_area of
   the *long* purge_list and executing atomic_long_sub() for each
   vmap_area, while other cores free vmalloc allocations and execute
   atomic_long_add_return() in free_vmap_area_noflush().

[Solution]
Employ a local variable to record the total purged pages, and execute
atomic_long_sub() after the traversal of the purge_list is done. The
experiment result shows the latency improvement is 99%.

[Experiment Result]
1) System Configuration: Three servers (with HT-enabled) are tested.
     * 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
     * 192-core server: 5th Gen Intel Xeon Scalable Processor*2
     * 448-core server: AMD Zen 4 Processor*2

2) Kernel Config
     * CONFIG_KASAN is disabled

3) The data in column "w/o patch" and "w/ patch"
     * Unit: micro seconds (us)
     * Each data is the average of 3-time measurements

         System        w/o patch (us)   w/ patch (us)    Improvement (%)
     ---------------   --------------   -------------    -------------
     72-core server          2194              14            99.36%
     192-core server       143799            1139            99.21%
     448-core server      1992122            6883            99.65%

[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf

Link: https://lkml.kernel.org/r/20240829130633.2184-1-ahuang12@lenovo.com
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Jan Kuliga
4f295229b2 mailmap: update entry for Jan Kuliga
Soon I won't be able to use my current email address.

Link: https://lkml.kernel.org/r/20240830095658.1203198-1-jankul@alatek.krakow.pl
Signed-off-by: Jan Kuliga <jankul@alatek.krakow.pl>
Cc: David S. Miller <davem@davemloft.net>
Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Hao Ge
5e9784e997 codetag: debug: mark codetags for poisoned page as empty
When PG_hwpoison pages are freed they are treated differently in
free_pages_prepare() and instead of being released they are isolated.

Page allocation tag counters are decremented at this point since the page
is considered not in use.  Later on when such pages are released by
unpoison_memory(), the allocation tag counters will be decremented again
and the following warning gets reported:

[  113.930443][ T3282] ------------[ cut here ]------------
[  113.931105][ T3282] alloc_tag was not set
[  113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164
[  113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4
[  113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G        W          6.11.0-rc4-dirty #18
[  113.943003][ T3282] Tainted: [W]=WARN
[  113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[  113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946706][ T3282] sp : ffff800087093a10
[  113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0
[  113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000
[  113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
[  113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff
[  113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210
[  113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339
[  113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0
[  113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff
[  113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001
[  113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000
[  113.957962][ T3282] Call trace:
[  113.958350][ T3282]  pgalloc_tag_sub.part.66+0x154/0x164
[  113.959000][ T3282]  pgalloc_tag_sub+0x14/0x1c
[  113.959539][ T3282]  free_unref_page+0xf4/0x4b8
[  113.960096][ T3282]  __folio_put+0xd4/0x120
[  113.960614][ T3282]  folio_put+0x24/0x50
[  113.961103][ T3282]  unpoison_memory+0x4f0/0x5b0
[  113.961678][ T3282]  hwpoison_unpoison+0x30/0x48 [hwpoison_inject]
[  113.962436][ T3282]  simple_attr_write_xsigned.isra.34+0xec/0x1cc
[  113.963183][ T3282]  simple_attr_write+0x38/0x48
[  113.963750][ T3282]  debugfs_attr_write+0x54/0x80
[  113.964330][ T3282]  full_proxy_write+0x68/0x98
[  113.964880][ T3282]  vfs_write+0xdc/0x4d0
[  113.965372][ T3282]  ksys_write+0x78/0x100
[  113.965875][ T3282]  __arm64_sys_write+0x24/0x30
[  113.966440][ T3282]  invoke_syscall+0x7c/0x104
[  113.966984][ T3282]  el0_svc_common.constprop.1+0x88/0x104
[  113.967652][ T3282]  do_el0_svc+0x2c/0x38
[  113.968893][ T3282]  el0_svc+0x3c/0x1b8
[  113.969379][ T3282]  el0t_64_sync_handler+0x98/0xbc
[  113.969980][ T3282]  el0t_64_sync+0x19c/0x1a0
[  113.970511][ T3282] ---[ end trace 0000000000000000 ]---

To fix this, clear the page tag reference after the page got isolated
and accounted for.

Link: https://lkml.kernel.org/r/20240825163649.33294-1-hao.ge@linux.dev
Fixes: d224eb0287 ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Ge <gehao@kylinos.cn>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>	[6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Mike Yuan
e399257349 mm/memcontrol: respect zswap.writeback setting from parent cg too
Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
cgroups.  This surfaced when people tried to globally disable zswap
writeback, i.e.  reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups.  It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value.  Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Marc Zyngier
a3f6a89c83 scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
Richard reports that since 772dd03427 ("mm: enumerate all gfp flags"),
gfp-translate is broken, as the bit numbers are implicit, leaving the
shell script unable to extract them.  Even more, some bits are now at a
variable location, making it double extra hard to parse using a simple
shell script.

Use a brute-force approach to the problem by generating a small C stub
that will use the enum to dump the interesting bits.

As an added bonus, we are now able to identify invalid bits for a given
configuration.  As an added drawback, we cannot parse include files that
predate this change anymore.  Tough luck.

Link: https://lkml.kernel.org/r/20240823163850.3791201-1-maz@kernel.org
Fixes: 772dd03427 ("mm: enumerate all gfp flags")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: Petr Tesařík <petr@tesarici.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:01 -07:00
Usama Arif
bfe0857c20 Revert "mm: skip CMA pages when they are not available"
This reverts commit 5da226dbfc ("mm: skip CMA pages when they are not
available") and b7108d6631 ("Multi-gen LRU: skip CMA pages when they are
not eligible").

lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios.  If the lru has a large number of CMA folios
consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
isolate_lru_folios can hold the lock for a very long time while it skips
those.  For FIO workload, ~150million order=0 folios were skipped to
isolate a few ZONE_DMA folios [1].  This can cause lockups [1] and high
memory pressure for extended periods of time [2].

Remove skipping CMA for MGLRU as well, as it was introduced in sort_folio
for the same resaon as 5da226dbfc.

[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
[2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/

[usamaarif642@gmail.com: also revert b7108d6631, per Johannes]
  Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
  Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Fixes: 5da226dbfc ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:01 -07:00
Liam R. Howlett
f806de88d8 maple_tree: remove rcu_read_lock() from mt_validate()
The write lock should be held when validating the tree to avoid updates
racing with checks.  Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.

Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reported-by: syzbot+036af2f0c7338a33b0cd@syzkaller.appspotmail.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:01 -07:00
Petr Tesarik
6dacd79d28 kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.

The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array.  Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded.  Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.

Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
Fixes: f7cc804a9f ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Eric DeVolder <eric_devolder@yahoo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:01 -07:00
Hao Ge
ab7ca09520 mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
When enable CONFIG_MEMCG & CONFIG_KFENCE & CONFIG_KMEMLEAK, the following
warning always occurs,This is because the following call stack occurred:
mem_pool_alloc
    kmem_cache_alloc_noprof
        slab_alloc_node
            kfence_alloc

Once the kfence allocation is successful,slab->obj_exts will not be empty,
because it has already been assigned a value in kfence_init_pool.

Since in the prepare_slab_obj_exts_hook function,we perform a check for
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE),the alloc_tag_add function
will not be called as a result.Therefore,ref->ct remains NULL.

However,when we call mem_pool_free,since obj_ext is not empty, it
eventually leads to the alloc_tag_sub scenario being invoked.  This is
where the warning occurs.

So we should add corresponding checks in the alloc_tagging_slab_free_hook.
For __GFP_NO_OBJ_EXT case,I didn't see the specific case where it's using
kfence,so I won't add the corresponding check in
alloc_tagging_slab_free_hook for now.

[    3.734349] ------------[ cut here ]------------
[    3.734807] alloc_tag was not set
[    3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[    3.735866] Modules linked in: autofs4
[    3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G        W          6.11.0-rc3-dirty #1
[    3.736969] Tainted: [W]=WARN
[    3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[    3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    3.738501] pc : kmem_cache_free+0x444/0x574
[    3.738951] lr : kmem_cache_free+0x444/0x574
[    3.739361] sp : ffff80008357bb60
[    3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[    3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[    3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[    3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[    3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[    3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[    3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[    3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[    3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[    3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[    3.746255] Call trace:
[    3.746530]  kmem_cache_free+0x444/0x574
[    3.746931]  mem_pool_free+0x44/0xf4
[    3.747306]  free_object_rcu+0xc8/0xdc
[    3.747693]  rcu_do_batch+0x234/0x8a4
[    3.748075]  rcu_core+0x230/0x3e4
[    3.748424]  rcu_core_si+0x14/0x1c
[    3.748780]  handle_softirqs+0x134/0x378
[    3.749189]  run_ksoftirqd+0x70/0x9c
[    3.749560]  smpboot_thread_fn+0x148/0x22c
[    3.749978]  kthread+0x10c/0x118
[    3.750323]  ret_from_fork+0x10/0x20
[    3.750696] ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
Fixes: 4b87369646 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:01 -07:00
Ryusuke Konishi
6576dd6695 nilfs2: fix state management in error path of log writing function
After commit a694291a62 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.

First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared.  This causes page cache operations to
hang waiting for the writeback flag.  For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.

Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared. 
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.

Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.

Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a62 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:00 -07:00
Ryusuke Konishi
5787fcaab9 nilfs2: fix missing cleanup on rollforward recovery error
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.

It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.

Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.

Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:00 -07:00
Ryusuke Konishi
6834082589 nilfs2: protect references to superblock parameters exposed in sysfs
The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues.  So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".

Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.

Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
Fixes: da7141fb78 ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:00 -07:00
Jann Horn
4828d207dc userfaultfd: don't BUG_ON() if khugepaged yanks our page table
Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.

We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.

Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@google.com
Fixes: 1d65b771bc ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:00 -07:00
Jann Horn
71c186efc1 userfaultfd: fix checks for huge PMDs
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.

The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:

1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
   the right two race windows) - I've tested this in a kernel build with
   some extra mdelay() calls. See the commit message for a description
   of the race scenario.
   On older kernels (before 6.5), I think the same bug can even
   theoretically lead to accessing transhuge page contents as a page table
   if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
   detecting PMDs that don't point to page tables.
   On older kernels (before 6.5), you'd just have to win a single fairly
   wide race to hit this.
   I've tested this on 6.1 stable by racing migration (with a mdelay()
   patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
   VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
   to yank page tables out from under us (though I haven't tested that),
   so I think the BUG_ON() checks in mfill_atomic() are just wrong.

I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.


This patch (of 2):

This fixes two issues.

I discovered that the following race can occur:

  mfill_atomic                other thread
  ============                ============
                              <zap PMD>
  pmdp_get_lockless() [reads none pmd]
  <bail if trans_huge>
  <if none:>
                              <pagefault creates transhuge zeropage>
    __pte_alloc [no-op]
                              <zap PMD>
  <bail if pmd_trans_huge(*dst_pmd)>
  BUG_ON(pmd_none(*dst_pmd))

I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.

On kernels newer than commit 0d940a9b27 ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.

The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).

On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.

Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).

If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.

As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.

Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.

Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com
Fixes: c1a4de99fa ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:58:59 -07:00
Will Deacon
3e3de7947c mm: vmalloc: ensure vmap_block is initialised before adding to queue
Commit 8c61291fd8 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.

When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised.  If another CPU is
concurrently walking the xarray (e.g.  via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.

This has been observed as UBSAN errors in Android:

 | Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
 |
 | Call trace:
 |  purge_fragmented_block+0x204/0x21c
 |  _vm_unmap_aliases+0x170/0x378
 |  vm_unmap_aliases+0x1c/0x28
 |  change_memory_common+0x1dc/0x26c
 |  set_memory_ro+0x18/0x24
 |  module_enable_ro+0x98/0x238
 |  do_init_module+0x1b0/0x310

Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.

Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd8 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:58:59 -07:00
Muhammad Usama Anjum
b808f62921 selftests: mm: fix build errors on armhf
The __NR_mmap isn't found on armhf.  The mmap() is commonly available
system call and its wrapper is present on all architectures.  So it should
be used directly.  It solves problem for armhf and doesn't create problem
for other architectures.

Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap().  There is no need to set errno = 0 manually as glibc
always resets it.

For reference errors are as following:

  CC       seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
   39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
   90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

Link: https://lkml.kernel.org/r/20240809082511.497266-1-usama.anjum@collabora.com
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:58:59 -07:00
Linus Torvalds
431c1646e1 Linux 6.11-rc6 v6.11-rc6 2024-09-01 19:46:02 +12:00
Linus Torvalds
6b9ffc4595 Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - copy_file_range fix

 - two read fixes including read past end of file rc fix and read retry
   crediting fix

 - falloc zero range fix

* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
  cifs: Fix copy offload to flush destination region
  netfs, cifs: Fix handling of short DIO read
  cifs: Fix lack of credit renegotiation on read retry
2024-09-01 15:49:26 +12:00
Linus Torvalds
a4c763129f Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs
Push bcachefs fixes from Kent Overstreet:
 "The data corruption in the buffered write path is troubling; inode
  lock should not have been able to cause that...

   - Fix a rare data corruption in the rebalance path, caught as a nonce
     inconsistency on encrypted filesystems

   - Revert lockless buffered write path

   - Mark more errors as autofix"

* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
  bcachefs: Mark more errors as autofix
  bcachefs: Revert lockless buffered IO path
  bcachefs: Fix bch2_extents_match() false positive
  bcachefs: Fix failure to return error in data_update_index_update()
2024-09-01 15:23:20 +12:00
Kent Overstreet
3d3020c461 bcachefs: Mark more errors as autofix
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.

note that errors are still logged in the superblock, so we'll still know
that they happened.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-31 19:27:01 -04:00
Kent Overstreet
e3e6940940 bcachefs: Revert lockless buffered IO path
We had a report of data corruption on nixos when building installer
images.

https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334

It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.

Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.

It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.

Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.

My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.

Fixes: 7e64c86cdc ("bcachefs: Buffered write path now can avoid the inode lock")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-31 19:26:08 -04:00
Linus Torvalds
6cd90e5ea7 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull misc fixes from Guenter Roeck.

These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  apparmor: fix policy_unpack_test on big endian systems
  Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
  microblaze: don't treat zero reserved memory regions as error
2024-09-01 09:18:48 +12:00
Linus Torvalds
8463be8448 Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
 "A follow-up fix for the power sequencing subsystem. It turned out the
  previous fix for this driver was incomplete and broke the WLAN support
  on some platforms. This addresses the issue.

   - set the direction of the wlan-enable GPIO to output after
     requesting it as-is"

* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
2024-09-01 09:07:44 +12:00
Bartosz Golaszewski
d8b762070c power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
Commit a9aaf1ff88 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.

Fixes: a9aaf1ff88 ("power: sequencing: request the WLAN enable GPIO as-is")
Link: https://lore.kernel.org/r/20240823115500.37280-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-08-31 21:32:19 +02:00
Linus Torvalds
e8784b0aef Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 6.11-rc6.  Included in here are:

   - dwc3 driver fixes for reported issues

   - MAINTAINER file update, marking a driver as unsupported :(

   - cdnsp driver fixes

   - USB gadget driver fix

   - USB sysfs fix

   - other tiny fixes

   - new device ids for usb serial driver

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add MeiG Smart SRM825L
  usb: cdnsp: fix for Link TRB with TC
  usb: dwc3: st: add missing depopulate in probe error path
  usb: dwc3: st: fix probed platform device ref count on probe error path
  usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
  usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
  usb: typec: fsa4480: Relax CHIP_ID check
  usb: dwc3: xilinx: add missing depopulate in probe error path
  usb: dwc3: omap: add missing depopulate in probe error path
  dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
  usb: gadget: uvc: queue pump work in uvcg_video_enable()
  cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
  usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
  usb: dwc3: core: Prevent USB core invalid event buffer address access
  MAINTAINERS: Mark UVC gadget driver as orphan
2024-09-01 07:06:28 +12:00
Linus Torvalds
770b0ffe28 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Minor fixes only.

  The sd.c one ignores a sync cache request if format is in progress
  which can happen if formatting a drive across suspend/resume"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
  scsi: aacraid: Fix double-free on probe failure
  scsi: lpfc: Fix overflow build issue
2024-09-01 07:00:38 +12:00
Linus Torvalds
6a2fcc51a7 Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:

 - One more write delegation fix

* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
2024-09-01 06:55:47 +12:00
Linus Torvalds
0efdc09796 Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:

 - Do not call out v1 inodes with non-zero di_nlink field as being
   corrupt

 - Change xfs_finobt_count_blocks() to count "free inode btree" blocks
   rather than "inode btree" blocks

 - Don't report the number of trimmed bytes via FITRIM because the
   underlying storage isn't required to do anything and failed discard
   IOs aren't reported to the caller anyway

 - Fix incorrect setting of rm_owner field in an rmap query

 - Report missing disk offset range in an fsmap query

 - Obtain m_growlock when extending realtime section of the filesystem

 - Reset rootdir extent size hint after extending realtime section of
   the filesystem

* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reset rootdir extent size hint after growfsrt
  xfs: take m_growlock when running growfsrt
  xfs: Fix missing interval for missing_owner in xfs fsmap
  xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
  xfs: Fix the owner setting issue for rmap query in xfs fsmap
  xfs: don't bother reporting blocks trimmed via FITRIM
  xfs: xfs_finobt_count_blocks() walks the wrong btree
  xfs: fix folio dirtying for XFILE_ALLOC callers
  xfs: fix di_onlink checking for V1/V2 inodes
2024-09-01 06:48:37 +12:00
Linus Torvalds
35667a2969 Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "There is a fairly large number of bug fixes for Qualcomm platforms,
  most of them addressing issues with the devicetree files for the newly
  added Snapdragon X1 based laptops to make them more reliable.

  The Qualcomm driver changes address a few build-time issues as well as
  runtime problems in the tzmem and scm firmware, the USB Type-C driver,
  and the cmd-db and pmic_glink soc drivers.

  The NXP i.MX usually gets a bunch of devicetree fixes that is
  proportional to the number of supported machines. This includes both
  warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
  layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
  board.

  The other changes are the usual minor changes, including an update to
  the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
  (risc-v)"

* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
  firmware: microchip: fix incorrect error report of programming:timeout on success
  soc: qcom: pd-mapper: Fix singleton refcount
  firmware: qcom: tzmem: disable sdm670 platform
  soc: qcom: pmic_glink: Actually communicate when remote goes down
  usb: typec: ucsi: Move unregister out of atomic section
  soc: qcom: pmic_glink: Fix race during initialization
  firmware: qcom: qseecom: remove unused functions
  firmware: qcom: tzmem: fix virtual-to-physical address conversion
  firmware: qcom: scm: Mark get_wq_ctx() as atomic call
  arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
  arm64: dts: qcom: disable GPU on x1e80100 by default
  arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
  arm64: dts: imx95: correct L3Cache cache-sets
  arm64: dts: imx95: correct a55 power-domains
  arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
  arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
  ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
  arm64: dts: imx93: update default value for snps,clk-csr
  arm64: dts: freescale: tqma9352: Fix watchdog reset
  arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
  ...
2024-09-01 06:42:13 +12:00
Linus Torvalds
1934261d89 Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:

 - a fix for Cypress PS/2 touchpad for regression introduced in 6.11
   merge window where a timeout condition is incorrectly reported for
   all extended Cypress commands

* tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cypress_ps2 - fix waiting for command response
2024-08-31 15:32:38 +12:00
Linus Torvalds
8101b2766d Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:

 - Add Manivannan Sadhasivam as PCI native host bridge and endpoint
   driver reviewer (Manivannan Sadhasivam)

 - Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
   work around hardware erratum that causes a constant stream of
   interrupts (Manivannan Sadhasivam)

 - Don't try to fall back to qcom Operating Performance Points (OPP)
   support unless the platform actually supports OPP (Manivannan
   Sadhasivam)

 - Add imx@lists.linux.dev mailing list to MAINTAINERS for NXP
   layerscape and imx6 PCI controller drivers (Frank Li)

* tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: PCI: Add NXP PCI controller mailing list imx@lists.linux.dev
  PCI: qcom: Use OPP only if the platform supports it
  PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
  MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers
2024-08-31 14:54:11 +12:00
Linus Torvalds
216d163165 Merge tag 'block-6.11-20240830' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
 "Fix for a single regression for WRITE_SAME introduced in the 6.11
  merge window"

* tag 'block-6.11-20240830' of git://git.kernel.dk/linux:
  block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes
2024-08-31 13:54:05 +12:00
Linus Torvalds
ad246d9f04 Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:

 - A fix for a regression that happened in 6.11 merge window, where the
   copying of iovecs for compat mode applications got broken for certain
   cases.

 - Fix for a bug introduced in 6.10, where if using recv/send bundles
   with classic provided buffers, the recv/send would fail to set the
   right iovec count. This caused 0 byte send/recv results. Found via
   code coverage testing and writing a test case to exercise it.

* tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
  io_uring/kbuf: return correct iovec count from classic buffer peek
  io_uring/rsrc: ensure compat iovecs are copied correctly
2024-08-31 13:51:27 +12:00
Arnd Bergmann
9cc7b17051 Merge tag 'at91-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes
Microchip AT91 fixes for v6.11

It contains:
- DTS directory update to match all entries not only those starting with
  at91 or sama
2024-08-30 19:52:28 +00:00
Linus Torvalds
fb24560f31 Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
 "One small patch to correct a NFS permissions problem with SELinux and
  Smack"

* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  selinux,smack: don't bypass permissions check in inode_setsecctx hook
2024-08-31 06:33:59 +12:00
Linus Torvalds
fb1a804535 Merge tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix three issues in the amd-pstate cpufreq driver.

  Specifics:

   - Remove checks for highest performance match on preferred cores when
     updating preferred core ranking in amd-pstate (Mario Limonciello)

   - Make amd-pstate call topology_logical_package_id() instead of
     logical_die_id() to get a socked ID for a CPU (Gautham Shenoy)

   - Fix uninitialized variable in amd_pstate_cpu_boost_update() (Dan
     Carpenter)"

* tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore
  cpufreq/amd-pstate: Use topology_logical_package_id() instead of logical_die_id()
  cpufreq: amd-pstate: Fix uninitialized variable in amd_pstate_cpu_boost_update()
2024-08-31 06:25:34 +12:00
Linus Torvalds
922842a3bf Merge tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:

 - A bunch of dw driver changes to fix the src/dst addr width config

 - Omap driver fix for sglen initialization

 - stm32-dma3 driver lli_size init fix

 - dw edma driver fixes for watermark interrupts and unmasking STOP and
   ABORT interrupts

* tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: dw-edma: Do not enable watermark interrupts for HDMA
  dmaengine: dw-edma: Fix unmasking STOP and ABORT interrupts for HDMA
  dmaengine: stm32-dma3: Set lli_size after allocation
  dmaengine: ti: omap-dma: Initialize sglen after allocation
  dmaengine: dw: Unify ret-val local variables naming
  dmaengine: dw: Simplify max-burst calculation procedure
  dmaengine: dw: Define encode_maxburst() above prepare_ctllo() callbacks
  dmaengine: dw: Simplify prepare CTL_LO methods
  dmaengine: dw: Add memory bus width verification
  dmaengine: dw: Add peripheral bus width verification
2024-08-31 06:20:48 +12:00
Linus Torvalds
32fafaf2ab Merge tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:

 - Qualcomm QMP X1E80100 PCIe Gen4 PHY initialisation fix

 - Freescale imx8mq tuning parameter name fix

 - Samsung exynos5 fir for error code in probe()

 - Xilinx Zynqmp SGMII linkup failure fix

* tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: xilinx: phy-zynqmp: Fix SGMII linkup failure on resume
  phy: exynos5-usbdrd: fix error code in probe()
  phy: fsl-imx8mq-usb: fix tuning parameter name
  phy: qcom: qmp-pcie: Fix X1E80100 PCIe Gen4 PHY initialisation
2024-08-31 06:18:07 +12:00
Linus Torvalds
8d80c9903e Merge tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:

 - Single fix for non-continous port map programming

* tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: stream: fix programming slave ports for non-continous port maps
2024-08-31 06:15:02 +12:00
Linus Torvalds
13c6bba601 Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:

 - Fix a device-stall problem in bad io-page-fault setups (faults
   received from devices with no supporting domain attached).

 - Context flush fix for Intel VT-d.

 - Do not allow non-read+non-write mapping through iommufd as most
   implementations can not handle that.

 - Fix a possible infinite-loop issue in map_pages() path.

 - Add Jean-Philippe as reviewer for SMMUv3 SVA support

* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
  iommu: Do not return 0 from map_pages if it doesn't do anything
  iommufd: Do not allow creating areas without READ or WRITE
  iommu/vt-d: Fix incorrect domain ID in context flush helper
  iommu: Handle iommu faults for a bad iopf setup
2024-08-31 06:11:34 +12:00
Frank Li
150b572a7c MAINTAINERS: PCI: Add NXP PCI controller mailing list imx@lists.linux.dev
Add imx mailing list imx@lists.linux.dev for PCI controller of NXP chips
(Layerscape and iMX).

Link: https://lore.kernel.org/r/20240826202740.970015-1-Frank.Li@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
2024-08-30 13:07:21 -05:00