Commit Graph

1337705 Commits

Author SHA1 Message Date
Linus Torvalds
0ec0d4ecdd Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs iomap updates from Christian Brauner:

 - Allow the filesystem to submit the writeback bios.

    - Allow the filsystem to track completions on a per-bio bases
      instead of the entire I/O.

    - Change writeback_ops so that ->submit_bio can be done by the
      filesystem.

    - A new ANON_WRITE flag for writes that don't have a block number
      assigned to them at the iomap level leaving the filesystem to do
      that work in the submission handler.

 - Incremental iterator advance

   The folio_batch support for zero range where the filesystem provides
   a batch of folios to process that might not be logically continguous
   requires more flexibility than the current offset based iteration
   currently offers.

   Update all iomap operations to advance the iterator within the
   operation and thus remove the need to advance from the core iomap
   iterator.

 - Make buffered writes work with RWF_DONTCACHE

   If RWF_DONTCACHE is set for a write, mark the folios being written as
   uncached. On writeback completion the pages will be dropped.

 - Introduce infrastructure for large atomic writes

   This will eventually be used by xfs and ext4.

* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  iomap: rework IOMAP atomic flags
  iomap: comment on atomic write checks in iomap_dio_bio_iter()
  iomap: inline iomap_dio_bio_opflags()
  iomap: fix inline data on buffered read
  iomap: Lift blocksize restriction on atomic writes
  iomap: Support SW-based atomic writes
  iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
  xfs: flag as supporting FOP_DONTCACHE
  iomap: make buffered writes work with RWF_DONTCACHE
  iomap: introduce a full map advance helper
  iomap: rename iomap_iter processed field to status
  iomap: remove unnecessary advance from iomap_iter()
  dax: advance the iomap_iter on pte and pmd faults
  dax: advance the iomap_iter on dedupe range
  dax: advance the iomap_iter on unshare range
  dax: advance the iomap_iter on zero range
  dax: push advance down into dax_iomap_iter() for read and write
  dax: advance the iomap_iter in the read/write path
  iomap: convert misc simple ops to incremental advance
  iomap: advance the iter on direct I/O
  ...
2025-03-24 10:19:31 -07:00
Linus Torvalds
df00ded23a Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Linus Torvalds
71ee2fde57 Merge tag 'vfs-6.15-rc1.pipe' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pipe updates from Christian Brauner:

 - Introduce struct file_operations pipeanon_fops

 - Don't update {a,c,m}time for anonymous pipes to avoid the performance
   costs associated with it

 - Change pipe_write() to never add a zero-sized buffer

 - Limit the slots in pipe_resize_ring()

 - Use pipe_buf() to retrieve the pipe buffer everywhere

 - Drop an always true check in anon_pipe_write()

 - Cache 2 pages instead of 1

 - Avoid spurious calls to prepare_to_wait_event() in ___wait_event()

* tag 'vfs-6.15-rc1.pipe' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs/splice: Use pipe_buf() helper to retrieve pipe buffer
  fs/pipe: Use pipe_buf() helper to retrieve pipe buffer
  kernel/watch_queue: Use pipe_buf() to retrieve the pipe buffer
  fs/pipe: Limit the slots in pipe_resize_ring()
  wait: avoid spurious calls to prepare_to_wait_event() in ___wait_event()
  pipe: cache 2 pages instead of 1
  pipe: drop an always true check in anon_pipe_write()
  pipe: change pipe_write() to never add a zero-sized buffer
  pipe: don't update {a,c,m}time for anonymous pipes
  pipe: introduce struct file_operations pipeanon_fops
2025-03-24 09:52:37 -07:00
Linus Torvalds
fd101da676 Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:

 - Mount notifications

   The day has come where we finally provide a new api to listen for
   mount topology changes outside of /proc/<pid>/mountinfo. A mount
   namespace file descriptor can be supplied and registered with
   fanotify to listen for mount topology changes.

   Currently notifications for mount, umount and moving mounts are
   generated. The generated notification record contains the unique
   mount id of the mount.

   The listmount() and statmount() api can be used to query detailed
   information about the mount using the received unique mount id.

   This allows userspace to figure out exactly how the mount topology
   changed without having to generating diffs of /proc/<pid>/mountinfo
   in userspace.

 - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
   api

 - Support detached mounts in overlayfs

   Since last cycle we support specifying overlayfs layers via file
   descriptors. However, we don't allow detached mounts which means
   userspace cannot user file descriptors received via
   open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
   attach them to a mount namespace via move_mount() first.

   This is cumbersome and means they have to undo mounts via umount().
   Allow them to directly use detached mounts.

 - Allow to retrieve idmappings with statmount

   Currently it isn't possible to figure out what idmapping has been
   attached to an idmapped mount. Add an extension to statmount() which
   allows to read the idmapping from the mount.

 - Allow creating idmapped mounts from mounts that are already idmapped

   So far it isn't possible to allow the creation of idmapped mounts
   from already idmapped mounts as this has significant lifetime
   implications. Make the creation of idmapped mounts atomic by allow to
   pass struct mount_attr together with the open_tree_attr() system call
   allowing to solve these issues without complicating VFS lookup in any
   way.

   The system call has in general the benefit that creating a detached
   mount and applying mount attributes to it becomes an atomic operation
   for userspace.

 - Add a way to query statmount() for supported options

   Allow userspace to query which mount information can be retrieved
   through statmount().

 - Allow superblock owners to force unmount

* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  umount: Allow superblock owners to force umount
  selftests: add tests for mount notification
  selinux: add FILE__WATCH_MOUNTNS
  samples/vfs: fix printf format string for size_t
  fs: allow changing idmappings
  fs: add kflags member to struct mount_kattr
  fs: add open_tree_attr()
  fs: add copy_mount_setattr() helper
  fs: add vfs_open_tree() helper
  statmount: add a new supported_mask field
  samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
  selftests: add tests for using detached mount with overlayfs
  samples/vfs: check whether flag was raised
  statmount: allow to retrieve idmappings
  uidgid: add map_id_range_up()
  fs: allow detached mounts in clone_private_mount()
  selftests/overlayfs: test specifying layers as O_PATH file descriptors
  fs: support O_PATH fds with FSCONFIG_SET_FD
  vfs: add notifications for mount attach and detach
  fanotify: notify on mount attach and detach
  ...
2025-03-24 09:34:10 -07:00
Linus Torvalds
a79a09a025 Merge tag 'vfs-6.15-rc1.eventpoll' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs eventpoll updates from Christian Brauner:
 "This contains a few preparatory changes to eventpoll to allow io_uring
  to support epoll"

* tag 'vfs-6.15-rc1.eventpoll' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  eventpoll: add epoll_sendevents() helper
  eventpoll: abstract out ep_try_send_events() helper
  eventpoll: abstract out parameter sanity checking
2025-03-24 09:31:24 -07:00
Linus Torvalds
99c21beaab Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastucture:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy ->remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loopER by removing unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the ->i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around ->get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around ->get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
2025-03-24 09:13:50 -07:00
Linus Torvalds
c4cff1ea37 Merge tag 'vfs-6.15-rc1.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount API updates from Christian Brauner:
 "This converts the remaining pseudo filesystems to the new mount api.

  The sysv conversion is a bit gratuitous because we remove sysv in
  another pull request. But if we have to revert the removal we at least
  will have it converted to the new mount api already"

* tag 'vfs-6.15-rc1.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  sysv: convert sysv to use the new mount api
  vfs: remove some unused old mount api code
  devtmpfs: replace ->mount with ->get_tree in public instance
  vfs: Convert devpts to use the new mount API
  pstore: convert to the new mount API
2025-03-24 08:49:48 -07:00
Darrick J. Wong
3514818522 MAINTAINERS: remove myself as reviewer
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-24 08:44:22 -07:00
Linus Torvalds
38fec10eb6 Linux 6.14 v6.14 2025-03-24 07:02:41 -07:00
Linus Torvalds
586de92313 Merge tag 'i2c-for-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "Fix double free of irq in amd-mp2 driver"

* tag 'i2c-for-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: amd-mp2: drop free_irq() of devm_request_irq() allocated irq
2025-03-22 17:33:38 -07:00
Linus Torvalds
183601b78a Merge tag 'perf-urgent-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf events fix from Ingo Molnar:
 "Fix an information leak regression in the AMD IBS PMU code"

* tag 'perf-urgent-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/amd/ibs: Prevent leaking sensitive data to userspace
2025-03-22 14:40:27 -07:00
Linus Torvalds
fcea541800 Merge tag 'keys-next-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys fix from Jarkko Sakkinen:
 "Fix potential use-after-free in key_put()"

* tag 'keys-next-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  keys: Fix UAF in key_put()
2025-03-22 14:10:07 -07:00
Linus Torvalds
bb18645ac1 Merge tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "Just a single fix for the commit that went into your tree yesterday,
  which exposed an issue with not always clearing notifications. That
  could cause them to be used more than once"

* tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux:
  io_uring/net: fix sendzc double notif flush
2025-03-22 10:45:44 -07:00
Pavel Begunkov
67c007d6c1 io_uring/net: fix sendzc double notif flush
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 5823 at lib/refcount.c:28 refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
Call Trace:
 <TASK>
 io_notif_flush io_uring/notif.h:40 [inline]
 io_send_zc_cleanup+0x121/0x170 io_uring/net.c:1222
 io_clean_op+0x58c/0x9a0 io_uring/io_uring.c:406
 io_free_batch_list io_uring/io_uring.c:1429 [inline]
 __io_submit_flush_completions+0xc16/0xd20 io_uring/io_uring.c:1470
 io_submit_flush_completions io_uring/io_uring.h:159 [inline]

Before the blamed commit, sendzc relied on io_req_msg_cleanup() to clear
REQ_F_NEED_CLEANUP, so after the following snippet the request will
never hit the core io_uring cleanup path.

io_notif_flush();
io_req_msg_cleanup();

The easiest fix is to null the notification. io_send_zc_cleanup() can
still be called after, but it's tolerated.

Reported-by: syzbot+cf285a028ffba71b2ef5@syzkaller.appspotmail.com
Tested-by: syzbot+cf285a028ffba71b2ef5@syzkaller.appspotmail.com
Fixes: cc34d8330e ("io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1306007458b8891c88c4f20c966a17595f766b0.1742643795.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:14:36 -06:00
David Howells
75845c6c1a keys: Fix UAF in key_put()
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point.  The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.

However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.

Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.

Fixes: 9578e327b2 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-03-22 15:36:49 +02:00
Namhyung Kim
50a53b60e1 perf/amd/ibs: Prevent leaking sensitive data to userspace
Although IBS "swfilt" can prevent leaking samples with kernel RIP to the
userspace, there are few subtle cases where a 'data' address and/or a
'branch target' address can fall under kernel address range although RIP
is from userspace. Prevent leaking kernel 'data' addresses by discarding
such samples when {exclude_kernel=1,swfilt=1}.

IBS can now be invoked by unprivileged user with the introduction of
"swfilt". However, this creates a loophole in the interface where an
unprivileged user can get physical address of the userspace virtual
addresses through IBS register raw dump (PERF_SAMPLE_RAW). Prevent this
as well.

This upstream commit fixed the most obvious leak:

  65a99264f5 perf/x86: Check data address for IBS software filter

Follow that up with a more complete fix.

Fixes: d29e744c71 ("perf/x86: Relax privilege filter restriction on AMD IBS")
Suggested-by: Matteo Rizzo <matteorizzo@google.com>
Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250321161251.1033-1-ravi.bangoria@amd.com
2025-03-22 08:18:24 +01:00
Linus Torvalds
88d324e69e Merge tag 'spi-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
 "This is a straightforward fix for a reference count leak in the rarely
  used SPI device mode functionality"

* tag 'spi-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Fix reference count leak in slave_show()
2025-03-21 14:07:40 -07:00
Linus Torvalds
21d1ccf0e9 Merge tag 'regulator-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "More fixes than I'd like at this point, some of which is due to me
  cooking things in -next for a bit and resetting that cooking time as
  more fixes came in.

   - Christian Eggers fixed some race conditions with the dummy
     regulator not being available very early in boot due to the use of
     asynchronous probing, both the provider side (ensuring that it's
     availalbe) and consumer side (handling things if that goes wrong)
     are fixed

   - Ludvig Pärsson fixed some lockdep issues with the debugfs
     registration for regulators holding more locks than it really needs
     causing issues later when looking at the resulting debugfs.boot

   - Some device specific fixes for incorrect descriptions of the
     RTQ2208 from ChiYuan Huang"

* tag 'regulator-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: rtq2208: Fix the LDO DVS capability
  regulator: rtq2208: Fix incorrect buck converter phase mapping
  regulator: check that dummy regulator has been probed before using it
  regulator: dummy: force synchronous probing
  regulator: core: Fix deadlock in create_regulator()
2025-03-21 13:42:55 -07:00
Linus Torvalds
3e49db00df Merge tag 'pinctrl-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:

 - A single patch for Spacemit K1 fixing up the Kconfig to not default
   to "y"

* tag 'pinctrl-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: spacemit: PINCTRL_SPACEMIT_K1 should not default to y unconditionally
2025-03-21 13:02:28 -07:00
Linus Torvalds
d07de43e3f Merge tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "Single fix heading to stable, fixing an issue with io_req_msg_cleanup()
  sometimes too eagerly clearing cleanup flags"

* tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux:
  io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally
2025-03-21 10:30:15 -07:00
Linus Torvalds
5c7474b544 Merge tag 'perf-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf events fixes from Ingo Molnar:
 "Two fixes: an RAPL PMU driver error handling fix, and an AMD IBS
  software filter fix"

* tag 'perf-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Fix error handling in init_rapl_pmus()
  perf/x86: Check data address for IBS software filter
2025-03-21 08:52:31 -07:00
Linus Torvalds
cb90c8df91 Merge tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Revert a scheduler performance optimization that regressed other
  workloads"

* tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "sched/core: Reduce cost of sched_move_task when config autogroup"
2025-03-21 08:48:40 -07:00
Wolfram Sang
807d47a6dc Merge tag 'i2c-host-fixes-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.14-rc8

amd-mp2: fix double free of irq.
2025-03-21 16:18:59 +01:00
Linus Torvalds
b3ee1e4609 Merge tag 'drm-fixes-2025-03-21' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Just the usual spread of a bunch for amdgpu, and small changes to
  others.

  scheduler:
   - fix fence reference leak

  xe:
   - Fix for an error if exporting a dma-buf multiple time

  amdgpu:
   - Fix video caps limits on several asics
   - SMU 14.x fixes
   - GC 12 fixes
   - eDP fixes
   - DMUB fix

  amdkfd:
   - GC 12 trap handler fix
   - GC 7/8 queue validation fix

  radeon:
   - VCE IB parsing fix

  v3d:
   - fix job error handling bugs

  qaic:
   - fix two integer overflows

  host1x:
   - fix NULL domain handling"

* tag 'drm-fixes-2025-03-21' of https://gitlab.freedesktop.org/drm/kernel: (21 commits)
  drm/xe: Fix exporting xe buffers multiple times
  gpu: host1x: Do not assume that a NULL domain means no DMA IOMMU
  drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
  drm/amd/display: Fix incorrect fw_state address in dmub_srv
  drm/amd/display: Use HW lock mgr for PSR1 when only one eDP
  drm/amd/display: Fix message for support_edp0_on_dp1
  drm/amdkfd: Fix user queue validation on Gfx7/8
  drm/amdgpu: Restore uncached behaviour on GFX12
  drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
  drm/amdkfd: Fix instruction hazard in gfx12 trap handler
  drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2
  drm/amd/pm: add unique_id for gfx12
  drm/amdgpu: Remove JPEG from vega and carrizo video caps
  drm/amdgpu: Fix JPEG video caps max size for navi1x and raven
  drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
  drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()
  accel/qaic: Fix integer overflow in qaic_validate_req()
  accel/qaic: Fix possible data corruption in BOs > 2G
  drm/v3d: Set job pointer to NULL when the job's fence has an error
  drm/v3d: Don't run jobs that have errors flagged in its fence
  ...
2025-03-20 21:29:58 -07:00
Linus Torvalds
a7ea35b61e Merge tag 'v6.14-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
 "smb3 client reconnect fix"

* tag 'v6.14-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: don't retry IO on failed negprotos with soft mounts
2025-03-20 20:50:45 -07:00
Dave Airlie
41e09ef6c2 Merge tag 'amd-drm-fixes-6.14-2025-03-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.14-2025-03-20:

amdgpu:
- Fix video caps limits on several asics
- SMU 14.x fixes
- GC 12 fixes
- eDP fixes
- DMUB fix

amdkfd:
- GC 12 trap handler fix
- GC 7/8 queue validation fix

radeon:
- VCE IB parsing fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250320210800.1358992-1-alexander.deucher@amd.com
2025-03-21 11:59:49 +10:00
Dave Airlie
5854df5017 Merge tag 'drm-xe-fixes-2025-03-20' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix for an error if exporting a dma-buf multiple time (Tomasz)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z9xalLaCWsNbh0P0@fedora
2025-03-21 11:31:40 +10:00
Dave Airlie
d2738724e4 Merge tag 'drm-misc-fixes-2025-03-20' of ssh://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A sched fence reference leak fix, two fence fixes for v3d, two overflow
fixes for quaic, and a iommu handling fix for host1x.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250320-valiant-outstanding-nightingale-e9acae@houat
2025-03-21 10:41:51 +10:00
Linus Torvalds
a1cffe8cc8 Merge tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fix from Marek Szyprowski:

 - fix missing clear bdr in check_ram_in_range_map() (Baochen Qiang)

* tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: fix missing clear bdr in check_ram_in_range_map()
2025-03-20 16:55:24 -07:00
Linus Torvalds
b5329d5a35 Merge tag 'vfs-6.14-final.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "A final set of fixes for this cycle:

  VFS:

   - Ensure that the stable offset api doesn't return duplicate
     directory entries when userspace has to perform the getdents call
     multiple times on large directories

  afs:

   - Prevent invalid pointer dereference during get_link RCU pathwalk

  fuse:

   - Fix deadlock caused by uninitialized rings when using io_uring with
     fuse

   - Handle race condition when using io_uring with fuse to prevent NULL
     dereference

  libnetfs:

   - Ensure that invalidate_cache is only called if implemented

   - Fix collection of results during pause when collection is
     offloaded

   - Ensure rolling_buffer_load_from_ra() doesn't clear mark bits

   - Make netfs_unbuffered_read() return ssize_t rather than int"

* tag 'vfs-6.14-final.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  libfs: Fix duplicate directory entry in offset_dir_lookup
  fuse: fix possible deadlock if rings are never initialized
  netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int
  netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
  netfs: Call `invalidate_cache` only if implemented
  netfs: Fix collection of results during pause when collection offloaded
  fuse: fix uring race condition for null dereference of fc
  afs: Fix afs_atcell_get_link() to check if ws_cell is unset first
2025-03-20 14:13:50 -07:00
Dhananjay Ugwekar
7e512f5ad2 perf/x86/rapl: Fix error handling in init_rapl_pmus()
If init_rapl_pmu() fails while allocating memory for "rapl_pmu" objects,
we miss freeing the "rapl_pmus" object in the error path. Fix that.

Fixes: 9b99d65c0b ("perf/x86/rapl: Move the pmu allocation out of CPU hotplug")
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250320100617.4480-1-dhananjay.ugwekar@amd.com
2025-03-20 21:03:55 +01:00
Linus Torvalds
f45f8f0ed4 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
 "A lone fix for a s390 regression. An earlier 6.14 commit stopped
  taking the pte lock for pages that are being converted to secure, but
  it was needed to avoid races.

  The patch was in development for a while and is finally ready, but I
  wish it was split into 3-4 commits at least"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: pv: fix race when making a page secure
2025-03-20 11:34:30 -07:00
Jens Axboe
cc34d8330e io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally
io_req_msg_cleanup() relies on the fact that io_netmsg_recycle() will
always fully recycle, but that may not be the case if the msg cache
was already full. To ensure that normal cleanup always gets run,
let io_netmsg_recycle() deal with clearing the relevant cleanup flags,
as it knows exactly when that should be done.

Cc: stable@vger.kernel.org
Reported-by: David Wei <dw@davidwei.uk>
Fixes: 7519134178 ("io_uring/net: add iovec recycling")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 12:27:27 -06:00
Tomasz Rusinowicz
50af7cab75 drm/xe: Fix exporting xe buffers multiple times
The `struct ttm_resource->placement` contains TTM_PL_FLAG_* flags, but
it was incorrectly tested for XE_PL_* flags.
This caused xe_dma_buf_pin() to always fail when invoked for
the second time. Fix this by checking the `mem_type` field instead.

Fixes: 7764222d54 ("drm/xe: Disallow pinning dma-bufs in VRAM")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218100353.2137964-1-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
(cherry picked from commit b96dabdba9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-03-20 17:59:49 +01:00
Linus Torvalds
5fc3193608 Merge tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from can, bluetooth and ipsec.

  This contains a last minute revert of a recent GRE patch, mostly to
  allow me stating there are no known regressions outstanding.

  Current release - regressions:

   - revert "gre: Fix IPv6 link-local address generation."

   - eth: ti: am65-cpsw: fix NAPI registration sequence

  Previous releases - regressions:

   - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().

   - mptcp: fix data stream corruption in the address announcement

   - bluetooth: fix connection regression between LE and non-LE adapters

   - can:
       - flexcan: only change CAN state when link up in system PM
       - ucan: fix out of bound read in strscpy() source

  Previous releases - always broken:

   - lwtunnel: fix reentry loops

   - ipv6: fix TCP GSO segmentation with NAT

   - xfrm: force software GSO only in tunnel mode

   - eth: ti: icssg-prueth: add lock to stats

  Misc:

   - add Andrea Mayer as a maintainer of SRv6"

* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
  MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
  Revert "gre: Fix IPv6 link-local address generation."
  Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
  net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
  tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
  mptcp: Fix data stream corruption in the address announcement
  selftests: net: test for lwtunnel dst ref loops
  net: ipv6: ioam6: fix lwtunnel_output() loop
  net: lwtunnel: fix recursion loops
  net: ti: icssg-prueth: Add lock to stats
  net: atm: fix use after free in lec_send()
  xsk: fix an integer overflow in xp_create_and_assign_umem()
  net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
  selftests: drv-net: use defer in the ping test
  phy: fix xa_alloc_cyclic() error handling
  dpll: fix xa_alloc_cyclic() error handling
  devlink: fix xa_alloc_cyclic() error handling
  ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
  ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
  net: ipv6: fix TCP GSO segmentation with NAT
  ...
2025-03-20 09:39:15 -07:00
Linus Torvalds
80c4c25460 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "Collected driver fixes from the last few weeks, I was surprised how
  significant many of them seemed to be.

   - Fix rdma-core test failures due to wrong startup ordering in rxe

   - Don't crash in bnxt_re if the FW supports more than 64k QPs

   - Fix wrong QP table indexing math in bnxt_re

   - Calculate the max SRQs for userspace properly in bnxt_re

   - Don't try to do math on errno for mlx5's rate calculation

   - Properly allow userspace to control the VLAN in the QP state during
     INIT->RTR for bnxt_re

   - 6 bug fixes for HNS:
      - Soft lockup when processing huge MRs, add a cond_resched()
      - Fix missed error unwind for doorbell allocation
      - Prevent bad send queue parameters from userspace
      - Wrong error unwind in qp creation
      - Missed xa_destroy during driver shutdown
      - Fix reporting to userspace of max_sge_rd, hns doesn't have a
        read/write difference"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Fix wrong value of max_sge_rd
  RDMA/hns: Fix missing xa_destroy()
  RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common()
  RDMA/hns: Fix invalid sq params not being blocked
  RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db()
  RDMA/hns: Fix soft lockup during bt pages loop
  RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path
  RDMA/mlx5: Handle errors returned from mlx5r_ib_rate()
  RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chips
  RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx
  RDMA/bnxt_re: Fix allocation of QP table
  RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests
2025-03-20 09:25:25 -07:00
Linus Torvalds
05b880b1dc Merge tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:

 - sdhci-brcmstb: Fix CQE suspend/resume support

 - atmel-mci: Add a missing clk_disable_unprepare() in ->probe()

* tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-brcmstb: add cqhci suspend/resume to PM ops
  mmc: atmel-mci: Add missing clk_disable_unprepare()
2025-03-20 09:22:11 -07:00
Linus Torvalds
a4f586a9fc Merge tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
 "Here's a final batch of EFI fixes for v6.14.

  The efivarfs ones are fixes for changes that were made this cycle.
  James's fix is somewhat of a band-aid, but it was blessed by the VFS
  folks, who are working with James to come up with something better for
  the next cycle.

   - Avoid physical address 0x0 for random page allocations

   - Add correct lockdep annotation when traversing efivarfs on resume

   - Avoid NULL mount in kernel_file_open() when traversing efivarfs on
     resume"

* tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: fix NULL dereference on resume
  efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume
  efi/libstub: Avoid physical address 0x0 when doing random allocation
2025-03-20 09:18:38 -07:00
David Ahern
feaee98c6c MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
Andrea has made significant contributions to SRv6 support in Linux.
Acknowledge the work and on-going interest in Srv6 support with a
maintainers entry for these files so hopefully he is included
on patches going forward.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250312092212.46299-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:49:00 +01:00
Paolo Abeni
8417db0be5 Merge branch 'gre-revert-ipv6-link-local-address-fix'
Guillaume Nault says:

====================
gre: Revert IPv6 link-local address fix.

Following Paolo's suggestion, let's revert the IPv6 link-local address
generation fix for GRE devices. The patch introduced regressions in the
upstream CI, which are still under investigation.

Start by reverting the kselftest that depend on that fix (patch 1), then
revert the kernel code itself (patch 2).
====================

Link: https://patch.msgid.link/cover.1742418408.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:46:20 +01:00
Guillaume Nault
fc486c2d06 Revert "gre: Fix IPv6 link-local address generation."
This reverts commit 183185a18f.

This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.

Fixes: 183185a18f ("gre: Fix IPv6 link-local address generation.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:46:16 +01:00
Guillaume Nault
355d940f4d Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
This reverts commit 6f50175cca.

Commit 183185a18f ("gre: Fix IPv6 link-local address generation.") is
going to be reverted. So let's revert the corresponding kselftest
first.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:46:16 +01:00
Paolo Abeni
84761651dd Merge tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2025-03-19

1) Fix tunnel mode TX datapath in packet offload mode
   by directly putting it to the xmit path.
   From Alexandre Cassen.

2) Force software GSO only in tunnel mode in favor
   of potential HW GSO. From Cosmin Ratiu.

ipsec-2025-03-19

* tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm_output: Force software GSO only in tunnel mode
  xfrm: fix tunnel mode TX datapath in packet offload mode
====================

Link: https://patch.msgid.link/20250319065513.987135-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:39:05 +01:00
Christian Brauner
d40dc30c7b Merge patch series "pidfs: handle multi-threaded exec and premature thread-group leader exit"
Christian Brauner <brauner@kernel.org> says:

This is another attempt at trying to make pidfd polling for
multi-threaded exec and premature thread-group leader exit consistent.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., non-thread-group
    leader thread, all other threads in the thread-group including the
    thread-group leader are killed and the struct pid of the
    thread-group leader will be taken over by the subthread that called
    exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
    leader exited before all of the other subthreads in the thread-group
    have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor
depending on when poll is performed. If the poll is performed before the
exec of the subthread has concluded an exit notification is generated
for the old thread-group leader. If the poll is performed after the exec
of the subthread has concluded no exit notification is generated for the
old thread-group leader.

The correct behavior would be to simply not generate an exit
notification on the struct pid of a subhthread exec because the struct
pid is taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group may exit
premature as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.

This tiny series tries to address this problem. If that works correctly
then no exit notifications are generated for a PIDFD_THREAD pidfd for a
thread-group leader until all subthreads have been reaped. If a
subthread should exec before no exit notification will be generated
until that task exits or it creates subthreads and repeates the cycle.

* patches from https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org:
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:51 +01:00
Christian Brauner
4d5483a42c selftests/pidfd: third test for multi-threaded exec polling
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Christian Brauner
9b6f723db5 selftests/pidfd: second test for multi-threaded exec polling
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Christian Brauner
db7ce91e22 selftests/pidfd: first test for multi-threaded exec polling
Add first test for premature thread-group leader exit.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Christian Brauner
0fb482728b pidfs: improve multi-threaded exec and premature thread-group leader exit polling
This is another attempt trying to make pidfd polling for multi-threaded
exec and premature thread-group leader exit consistent.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., non-thread-group
    leader thread, all other threads in the thread-group including the
    thread-group leader are killed and the struct pid of the
    thread-group leader will be taken over by the subthread that called
    exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
    leader exited before all of the other subthreads in the thread-group
    have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor
depending on when poll is performed. If the poll is performed before the
exec of the subthread has concluded an exit notification is generated
for the old thread-group leader. If the poll is performed after the exec
of the subthread has concluded no exit notification is generated for the
old thread-group leader.

The correct behavior would be to simply not generate an exit
notification on the struct pid of a subhthread exec because the struct
pid is taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group may exit
prematurely as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.

So far there was no way to distinguish between (1) and (2) internally.
This tiny series tries to address this problem by discarding
PIDFD_THREAD notification on premature thread-group leader exit.

If that works correctly then no exit notifications are generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads have
been reaped. If a subthread should exec aftewards no exit notification
will be generated until that task exits or it creates subthreads and
repeates the cycle.

Co-Developed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-1-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Paolo Abeni
2fdf0880ca Merge tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
Here is batman-adv bugfix:

- Ignore own maximum aggregation size during RX, Sven Eckelmann

* tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge:
  batman-adv: Ignore own maximum aggregation size during RX
====================

Link: https://patch.msgid.link/20250318150035.35356-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:30:00 +01:00
Lin Ma
90a7138619 net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
Previous commit 8b5c171bb3 ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.

Fixes: 8b5c171bb3 ("neigh: new unresolved queue limits")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:23:29 +01:00