Commit Graph

15721 Commits

Author SHA1 Message Date
Linus Torvalds
d10a88ce16 Merge tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2
Pull nilfs2 updates from Viacheslav Dubeyko:

 - Fix potential block overflow that cause system hang

   When executing the FITRIM command, an underflow can occur in the
   calculation of nblocks. This ultimately leads to the block layer
   function __blkdev_issue_discard() taking an excessively long time
   to process the bio chain, and the ns_segctor_sem lock remains held
   for a long period.

   This prevents other tasks from acquiring the ns_segctor_sem lock,
   resulting in a hang reported by syzbot (Edward Adam Davis)

 - Fix missing struct keywords in nilfs2_api.h kernel-doc (Ryusuke
   Konishi)

 - Convert nilfs_super_block to kernel-doc

   Eliminate 40+ kernel-doc warnings in nilfs2_ondisk.h by converting
   all of the struct member comments to kernel-doc comments (Randy
   Dunlap)

* tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
  nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
  nilfs2: convert nilfs_super_block to kernel-doc
  nilfs2: Fix potential block overflow that cause system hang
2026-02-09 15:55:41 -08:00
Linus Torvalds
8912c2fd58 Merge tag 'for-6.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "User visible changes, feature updates:

   - when using block size > page size, enable direct IO

   - fallback to buffered IO if the data profile has duplication,
     workaround to avoid checksum mismatches on block group profiles
     with redundancy, real direct IO is possible on single or RAID0

   - redo export of zoned statistics, moved from sysfs to
     /proc/pid/mountstats due to size limitations of the former

  Experimental features:

   - remove offload checksum tunable, intended to find best way to do it
     but since we've switched to offload to thread for everything we
     don't need it anymore

   - initial support for remap-tree feature, a translation layer of
     logical block addresses that allow changes without moving/rewriting
     blocks to do eg. relocation, or other changes that require COW

  Notable fixes:

   - automatic removal of accidentally leftover chunks when
     free-space-tree is enabled since mkfs.btrfs v6.16.1

   - zoned mode:
      - do not try to append to conventional zones when RAID is mixing
        zoned and conventional drives
      - fixup write pointers when mixing zoned and conventional on
        DUP/RAID* profiles

   - when using squota, relax deletion rules for qgroups with 0 members
     to allow easier recovery from accounting bugs, also add more checks
     to detect bad accounting

   - fix periodic reclaim scanning, properly check boundary conditions
     not to trigger it unexpectedly or miss the time to run it

   - trim:
      - continue after first error
      - change reporting to the first detected error
      - add more cancellation points
      - reduce contention of big device lock that can block other
        operations when there's lots of trimmed space

   - when chunk allocation is forced (needs experimental build) fix
     transaction abort when unexpected space layout is detected

  Core:

   - switch to crypto library API for checksumming, removed module
     dependencies, pointer indirections, etc.

   - error handling improvements

   - adjust how and where transaction commit or abort are done and are
     maybe not necessary

   - minor compression optimization to skip single block ranges

   - improve how compression folios are handled

   - new and updated selftests

   - cleanups, refactoring:
      - auto-freeing and other automatic variable cleanup conversion
      - structure size optimizations
      - condition annotations"

* tag 'for-6.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
  btrfs: get rid of compressed_bio::compressed_folios[]
  btrfs: get rid of compressed_folios[] usage for encoded writes
  btrfs: get rid of compressed_folios[] usage for compressed read
  btrfs: remove the old btrfs_compress_folios() infrastructure
  btrfs: switch to btrfs_compress_bio() interface for compressed writes
  btrfs: introduce btrfs_compress_bio() helper
  btrfs: zlib: introduce zlib_compress_bio() helper
  btrfs: zstd: introduce zstd_compress_bio() helper
  btrfs: lzo: introduce lzo_compress_bio() helper
  btrfs: zoned: factor out the zone loading part into a testable function
  btrfs: add cleanup function for btrfs_free_chunk_map
  btrfs: tests: add cleanup functions for test specific functions
  btrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmap
  btrfs: tests: add unit tests for pending extent walking functions
  btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation
  btrfs: fix transaction commit blocking during trim of unallocated space
  btrfs: handle user interrupt properly in btrfs_trim_fs()
  btrfs: preserve first error in btrfs_trim_fs()
  btrfs: continue trimming remaining devices on failure
  btrfs: do not BUG_ON() in btrfs_remove_block_group()
  ...
2026-02-09 15:45:21 -08:00
Linus Torvalds
157d3d6efd Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:

 - statmount: accept fd as a parameter

   Extend struct mnt_id_req with a file descriptor field and a new
   STATMOUNT_BY_FD flag. When set, statmount() returns mount information
   for the mount the fd resides on — including detached mounts
   (unmounted via umount2(MNT_DETACH)).

   For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID
   mask bits are cleared since neither is meaningful. The capability
   check is skipped for STATMOUNT_BY_FD since holding an fd already
   implies prior access to the mount and equivalent information is
   available through fstatfs() and /proc/pid/mountinfo without
   privilege. Includes comprehensive selftests covering both attached
   and detached mount cases.

 - fs: Remove internal old mount API code (1 patch)

   Now that every in-tree filesystem has been converted to the new
   mount API, remove all the legacy shim code in fs_context.c that
   handled unconverted filesystems. This deletes ~280 lines including
   legacy_init_fs_context(), the legacy_fs_context struct, and
   associated wrappers. The mount(2) syscall path for userspace remains
   untouched. Documentation references to the legacy callbacks are
   cleaned up.

 - mount: add OPEN_TREE_NAMESPACE to open_tree()

   Container runtimes currently use CLONE_NEWNS to copy the caller's
   entire mount namespace — only to then pivot_root() and recursively
   unmount everything they just copied. With large mount tables and
   thousands of parallel container launches this creates significant
   contention on the namespace semaphore.

   OPEN_TREE_NAMESPACE copies only the specified mount tree (like
   OPEN_TREE_CLONE) but returns a mount namespace fd instead of a
   detached mount fd. The new namespace contains the copied tree mounted
   on top of a clone of the real rootfs.

   This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a
   single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER)
   followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by
   the new user namespace. Mount namespace file mounts are excluded from
   the copy to prevent cycles. Includes ~1000 lines of selftests"

* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/open_tree: add OPEN_TREE_NAMESPACE tests
  mount: add OPEN_TREE_NAMESPACE
  fs: Remove internal old mount API code
  selftests: statmount: tests for STATMOUNT_BY_FD
  statmount: accept fd as a parameter
  statmount: permission check should return EPERM
2026-02-09 14:43:47 -08:00
Linus Torvalds
c84bb79f70 Merge tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nullfs update from Christian Brauner:
 "Add a completely catatonic minimal pseudo filesystem called "nullfs"
  and make pivot_root() work in the initramfs.

  Currently pivot_root() does not work on the real rootfs because it
  cannot be unmounted. Userspace has to recursively delete initramfs
  contents manually before continuing boot, using the fragile
  switch_root sequence (overmount + chroot).

  Add nullfs, a minimal immutable filesystem that serves as the true
  root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is
  mounted on top of it. This allows userspace to simply:

      chdir(new_root);
      pivot_root(".", ".");
      umount2(".", MNT_DETACH);

  without the traditional switch_root workarounds. systemd already
  handles this correctly. It tries pivot_root() first and falls back
  to MS_MOVE only when that fails.

  This also means rootfs mounts in unprivileged namespaces no longer
  need MNT_LOCKED, since the immutable nullfs guarantees nothing can be
  revealed by unmounting the covering mount.

  nullfs is a single-instance filesystem (get_tree_single()) marked
  SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root
  directory. This means sooner or later it can be used to overmount
  other directories to hide their contents without any additional
  protection needed.

  We enable it unconditionally. If we see any real regression we'll
  hide it behind a boot option.

  nullfs has extensions beyond this in the future. It will serve as a
  concept to support the creation of completely empty mount namespaces -
  which is work coming up in the next cycle"

* tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: use nullfs unconditionally as the real rootfs
  docs: mention nullfs
  fs: add immutable rootfs
  fs: add init_pivot_root()
  fs: ensure that internal tmpfs mount gets mount id zero
2026-02-09 13:41:34 -08:00
Linus Torvalds
dd466ea002 Merge tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs error reporting updates from Christian Brauner:
 "This contains the changes to support generic I/O error reporting.

  Filesystems currently have no standard mechanism for reporting
  metadata corruption and file I/O errors to userspace via fsnotify.
  Each filesystem (xfs, ext4, erofs, f2fs, etc.) privately defines
  EFSCORRUPTED, and error reporting to fanotify is inconsistent or
  absent entirely.

  This introduces a generic fserror infrastructure built around struct
  super_block that gives filesystems a standard way to queue metadata
  and file I/O error reports for delivery to fsnotify.

  Errors are queued via mempools and queue_work to avoid holding
  filesystem locks in the notification path; unmount waits for pending
  events to drain. A new super_operations::report_error callback lets
  filesystem drivers respond to file I/O errors themselves (to be used
  by an upcoming XFS self-healing patchset).

  On the uapi side, EFSCORRUPTED and EUCLEAN are promoted from private
  per-filesystem definitions to canonical errno.h values across all
  architectures"

* tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ext4: convert to new fserror helpers
  xfs: translate fsdax media errors into file "data lost" errors when convenient
  xfs: report fs metadata errors via fsnotify
  iomap: report file I/O errors to the VFS
  fs: report filesystem and file I/O errors to fsnotify
  uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
2026-02-09 12:21:37 -08:00
Linus Torvalds
996812c453 Merge tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initrd removal from Christian Brauner:
 "Remove the deprecated linuxrc-based initrd code path and related dead
  code. The linuxrc initrd path was deprecated in 2020 and this series
  completes its removal. If we see real-life regressions we'll revert.

  The core change removes handle_initrd() and init_linuxrc() — the
  entire flow that ran /linuxrc from an initrd, pivoted roots, and
  handed off to the real root filesystem. With that gone, initrd_load()
  becomes void (no longer short-circuits prepare_namespace()),
  rd_load_image() is simplified to always load /initrd.image instead of
  taking a path, and rd_load_disk() is deleted.

  The /proc/sys/kernel/real-root-dev sysctl and its backing variable are
  removed since they only existed for linuxrc to communicate the real
  root device back to the kernel.

  The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped,
  and noinitrd and ramdisk_start= gain deprecation warnings.

  Initramfs is entirely unaffected. The non-linuxrc initrd path
  (root=/dev/ram0) is preserved but now carries a deprecation warning
  targeting January 2027 removal"

* tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  init: remove /proc/sys/kernel/real-root-dev
  initrd: remove deprecated code path (linuxrc)
  init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
2026-02-09 11:03:25 -08:00
Mark Harmstone
8620da16fb btrfs: allow mounting filesystems with remap-tree incompat flag
If we encounter a filesystem with the remap-tree incompat flag set,
validate its compatibility with the other flags, and load the remap tree
using the values that have been added to the superblock.

The remap-tree feature depends on the free-space-tree, but no-holes and
block-group-tree have been made dependencies to reduce the testing
matrix. Similarly I'm not aware of any reason why mixed-bg and zoned would be
incompatible with remap-tree, but this is blocked for the time being
until it can be fully tested.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03 07:54:35 +01:00
Mark Harmstone
7977011460 btrfs: add extended version of struct block_group_item
Add a struct btrfs_block_group_item_v2, which is used in the block group
tree if the remap-tree incompat flag is set.

This adds two new fields to the block group item: `remap_bytes` and
`identity_remap_count`.

`remap_bytes` records the amount of data that's physically within this
block group, but nominally in another, remapped block group. This is
necessary because this data will need to be moved first if this block
group is itself relocated. If `remap_bytes` > 0, this is an indicator to
the relocation thread that it will need to search the remap-tree for
backrefs. A block group must also have `remap_bytes` == 0 before it can
be dropped.

`identity_remap_count` records how many identity remap items are located
in the remap tree for this block group. When relocation is begun for
this block group, this is set to the number of holes in the free-space
tree for this range. As identity remaps are converted into actual remaps
by the relocation process, this number is decreased. Once it reaches 0,
either because of relocation or because extents have been deleted, the
block group has been fully remapped and its chunk's device extents are
removed.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03 07:54:34 +01:00
Mark Harmstone
0b4d29fa98 btrfs: add METADATA_REMAP chunk type
Add a new METADATA_REMAP chunk type, which is a metadata chunk that holds the
remap tree.

This is needed for bootstrapping purposes: the remap tree can't itself
be remapped, and must be relocated the existing way, by COWing every
leaf. The remap tree can't go in the SYSTEM chunk as space there is
limited, because a copy of the chunk item gets placed in the superblock.

The changes in fs/btrfs/volumes.h are because we're adding a new block
group type bit after the profile bits, and so can no longer rely on the
const_ilog2 trick.

The sizing to 32MB per chunk, matching the SYSTEM chunk, is an estimate
here, we can adjust it later if it proves to be too big or too small.
This works out to be ~500,000 remap items, which for a 4KB block size
covers ~2GB of remapped data in the worst case and ~500TB in the best case.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03 07:54:27 +01:00
Mark Harmstone
ef6a31d035 btrfs: add definitions and constants for remap-tree
Add an incompat flag for the new remap-tree feature, and the constants
and definitions needed to support it.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03 07:54:02 +01:00
Linus Torvalds
0a6dce0a5c Merge tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/iio driver fixes from Greg KH:
 "Here are some small char/misc/iio and some other minor driver
  subsystem fixes for 6.19-rc7. Nothing huge here, just some fixes for
  reported issues including:

   - lots of little iio driver fixes

   - comedi driver fixes

   - mux driver fix

   - w1 driver fixes

   - uio driver fix

   - slimbus driver fixes

   - hwtracing bugfix

   - other tiny bugfixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (36 commits)
  comedi: dmm32at: serialize use of paged registers
  mei: trace: treat reg parameter as string
  uio: pci_sva: correct '-ENODEV' check logic
  uacce: ensure safe queue release with state management
  uacce: implement mremap in uacce_vm_ops to return -EPERM
  uacce: fix isolate sysfs check condition
  uacce: fix cdev handling in the cleanup path
  slimbus: core: clean up of_slim_get_device()
  slimbus: core: fix of_slim_get_device() kernel doc
  slimbus: core: amend slim_get_device() kernel doc
  slimbus: core: fix device reference leak on report present
  slimbus: core: fix runtime PM imbalance on report present
  slimbus: core: fix OF node leak on registration failure
  intel_th: rename error label
  intel_th: fix device leak on output open()
  comedi: Fix getting range information for subdevices 16 to 255
  mux: mmio: Fix IS_ERR() vs NULL check in probe()
  interconnect: debugfs: initialize src_node and dst_node to empty strings
  iio: dac: ad3552r-hs: fix out-of-bound write in ad3552r_hs_write_data_source
  iio: accel: iis328dq: fix gain values
  ...
2026-01-25 09:57:31 -08:00
Linus Torvalds
00d20db21e Merge tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - A set of selftest fixes for ublk

 - Fix for a pid mismatch in ublk, comparing PIDs in different
   namespaces if run inside a namespace

 - Fix for a regression added in this release with polling, where the
   nvme tcp connect code would spin forever

 - Zoned device error path fix

 - Tweak the blkzoned uapi additions from this kernel release, making
   them more easily discoverable

 - Fix for a regression in bcache with bio endio handling added in this
   release

* tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  bcache: use bio cloning for detached device requests
  blk-mq: use BLK_POLL_ONESHOT for synchronous poll completion
  selftests/ublk: fix garbage output in foreground mode
  selftests/ublk: fix error handling for starting device
  selftests/ublk: fix IO thread idle check
  block: make the new blkzoned UAPI constants discoverable
  ublk: fix ublksrv pid handling for pid namespaces
  block: Fix an error path in disk_update_zone_resources()
2026-01-23 12:53:56 -08:00
Linus Torvalds
0a80e38d0f Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN and wireless.

  Pretty big, but hard to make up any cohesive story that would explain
  it, a random collection of fixes. The two reverts of bad patches from
  this release here feel like stuff that'd normally show up by rc5 or
  rc6. Perhaps obvious thing to say, given the holiday timing.

  That said, no active investigations / regressions. Let's see what the
  next week brings.

  Current release - fix to a fix:

   - can: alloc_candev_mqs(): add missing default CAN capabilities

  Current release - regressions:

   - usbnet: fix crash due to missing BQL accounting after resume

   - Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...

  Previous releases - regressions:

   - Revert "nfc/nci: Add the inconsistency check between the input ...

  Previous releases - always broken:

   - number of driver fixes for incorrect use of seqlocks on stats

   - rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
     when MSG_PEEK was set

   - ipvlan: make the addrs_lock be per port avoid races in the port
     hash table

   - sched: enforce that teql can only be used as root qdisc

   - virtio: coalesce only linear skb

   - wifi: ath12k: fix dead lock while flushing management frames

   - eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"

* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
  Octeontx2-af: Add proper checks for fwdata
  dpll: Prevent duplicate registrations
  net/sched: act_ife: avoid possible NULL deref
  hinic3: Fix netif_queue_set_napi queue_index input parameter error
  vsock/test: add stream TX credit bounds test
  vsock/virtio: cap TX credit to local buffer size
  vsock/test: fix seqpacket message bounds test
  vsock/virtio: fix potential underflow in virtio_transport_get_credit()
  net: fec: account for VLAN header in frame length calculations
  net: openvswitch: fix data race in ovs_vport_get_upcall_stats
  octeontx2-af: Fix error handling
  net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
  rxrpc: Fix data-race warning and potential load/store tearing
  net: dsa: fix off-by-one in maximum bridge ID determination
  net: bcmasp: Fix network filter wake for asp-3.0
  bonding: provide a net pointer to __skb_flow_dissect()
  selftests: net: amt: wait longer for connection before sending packets
  be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
  Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
  netrom: fix double-free in nr_route_frame()
  ...
2026-01-22 09:32:11 -08:00
Jakub Kicinski
9146fe2829 Merge tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
Another set of updates:
 - various small fixes for ath10k/ath12k/mwifiex/rsi
 - cfg80211 fix for HE bitrate overflow
 - mac80211 fixes
   - S1G beacon handling in scan
   - skb tailroom handling for HW encryption
   - CSA fix for multi-link
   - handling of disabled links during association

* tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: ignore link disabled flag from userspace
  wifi: mac80211: apply advertised TTLM from association response
  wifi: mac80211: parse all TTLM entries
  wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice
  wifi: mac80211: don't perform DA check on S1G beacon
  wifi: ath12k: Fix wrong P2P device link id issue
  wifi: ath12k: fix dead lock while flushing management frames
  wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel
  wifi: ath12k: cancel scan only on active scan vdev
  wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize()
  wifi: mac80211: correctly check if CSA is active
  wifi: cfg80211: Fix bitrate calculation overflow for HE rates
  wifi: rsi: Fix memory corruption due to not set vif driver data size
  wifi: ath12k: don't force radio frequency check in freq_to_idx()
  wifi: ath12k: fix dma_free_coherent() pointer
  wifi: ath10k: fix dma_free_coherent() pointer
====================

Link: https://patch.msgid.link/20260122110248.15450-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-22 07:54:31 -08:00
Christoph Hellwig
f5f2bad67a block: make the new blkzoned UAPI constants discoverable
The Linux 6.19 merge window added the new BLKREPORTZONESV2 ioctl, and
with it the new BLK_ZONE_REP_CACHED and BLK_ZONE_COND_ACTIVE constants.

The two constants are defined as part of enums, which makes it very
painful for userspace to discover if they are present in the installed
system headers.

Use the #define to the same name trick to make them trivially
discoverable using CPP directives.

Fixes: 0bf0e2e466 ("block: track zone conditions")
Fixes: b30ffcdc0c ("block: introduce BLKREPORTZONESV2 ioctl")
Reported-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-21 07:47:44 -07:00
Benjamin Berg
50b359896f wifi: cfg80211: ignore link disabled flag from userspace
When the AP has an advertised TID to Link Mapping (TTLM) it shall
include the element in the association response. As such, when this
element is present it needs to be used for the currently dormant links.
See Draft P802.11REVmf_D1.0 section 35.3.7.2.3 ("Negotiation of TTLM")
for the details. The flag is also not usable in case userspace wants to
specify a negotiated TTLM during association.

Note that for the link reconfiguration case, mac80211 did not use the
information. Draft P802.11REVmf_D1.0 states in section 35.3.6.4 ("Link
reconfiguration to the setup links) that we "shall operate with all the
TIDs mapped to the newly added links ..."

All this means that the flag is not needed. The implementation should
parse the information from the association response.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260118093904.754e057896a5.Ifd06f5ef839a93bfd54d0593dc932870f95f3242@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-20 10:02:01 +01:00
Linus Torvalds
90a855e75a Merge tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
 "This fixes TCP handling, tests, documentation, non-audit elided code,
  and minor cosmetic changes"

* tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Clarify documentation for the IOCTL access right
  selftests/landlock: Properly close a file descriptor
  landlock: Improve the comment for domain_is_scoped
  selftests/landlock: Use scoped_base_variants.h for ptrace_test
  selftests/landlock: Fix missing semicolon
  selftests/landlock: Fix typo in fs_test
  landlock: Optimize stack usage when !CONFIG_AUDIT
  landlock: Fix spelling
  landlock: Clean up hook_ptrace_access_check()
  landlock: Improve erratum documentation
  landlock: Remove useless include
  landlock: Fix wrong type usage
  selftests/landlock: NULL-terminate unix pathname addresses
  selftests/landlock: Remove invalid unix socket bind()
  selftests/landlock: Add missing connect(minimal AF_UNSPEC) test
  selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
  landlock: Fix TCP handling of short AF_UNSPEC addresses
  landlock: Fix formatting
2026-01-18 15:15:47 -08:00
Linus Torvalds
f8907398a6 Merge tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:

 - Fix an inconsistency in structure size on 32-bit platforms caused by
   padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls

 - Fix a buffer leak on the error path when dropping the refcount an
   xattr value stored in an inode

 - Fix missing locking on the error path for the file defragmentation
   ioctl leading to a BUG

* tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref
  ext4: add missing down_write_data_sem in mext_move_extent().
  ext4: fix ext4_tune_sb_params padding
2026-01-18 14:01:20 -08:00
Arnd Bergmann
cd16edba1c ext4: fix ext4_tune_sb_params padding
The padding at the end of struct ext4_tune_sb_params is architecture
specific and in particular is different between x86-32 and x86-64,
since the __u64 member only enforces struct alignment on the latter.

This shows up as a new warning when test-building the headers with
-Wpadded:

include/linux/ext4.h:144:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]

All members inside the structure are naturally aligned, so the only
difference here is the amount of padding at the end. Make the padding
explicit, to have a consistent sizeof(struct ext4_tune_sb_params) of
232 on all architectures and avoid adding compat ioctl handling for
EXT4_IOC_GET_TUNE_SB_PARAM/EXT4_IOC_SET_TUNE_SB_PARAM.

This is an ABI break on x86-32 but hopefully this can go into 6.18.y early
enough as a fixup so no actual users will be affected.  Alternatively, the
kernel could handle the ioctl commands for both sizes (232 and 228 bytes)
on all architectures.

Fixes: 04a91570ac ("ext4: implemet new ioctls to set and get superblock parameters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20251204101914.1037148-1-arnd@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-01-18 11:22:53 -05:00
Linus Torvalds
b62ce2547f Merge tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix an error path memory leak in the energy model management
  code, fix a kerneldoc comment in it, and fix and revamp the energy
  model YNL specification added recently along with the new energy model
  management netlink interface (that received feedback after being
  added):

   - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)

   - Fix stale description of the cost field in struct em_perf_state to
     reflect the current code (Yaxiong Tian)

   - Fix and revamp the energy model YNL specification added recently
     along with the energy model netlink interface (Changwoo Min)"

* tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: EM: Add dump to get-perf-domains in the EM YNL spec
  PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
  PM: EM: Rename em.yaml to dev-energymodel.yaml
  PM: EM: Fix yamllint warnings in the EM YNL spec
  PM: EM: Fix memory leak in em_create_pd() error path
  PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16 12:08:19 -08:00
Christian Brauner
9b8a0ba682 mount: add OPEN_TREE_NAMESPACE
When creating containers the setup usually involves using CLONE_NEWNS
via clone3() or unshare(). This copies the caller's complete mount
namespace. The runtime will also assemble a new rootfs and then use
pivot_root() to switch the old mount tree with the new rootfs. Afterward
it will recursively umount the old mount tree thereby getting rid of all
mounts.

On a basic system here where the mount table isn't particularly large
this still copies about 30 mounts. Copying all of these mounts only to
get rid of them later is pretty wasteful.

This is exacerbated if intermediary mount namespaces are used that only
exist for a very short amount of time and are immediately destroyed
again causing a ton of mounts to be copied and destroyed needlessly.

With a large mount table and a system where thousands or ten-thousands
of containers are spawned in parallel this quickly becomes a bottleneck
increasing contention on the semaphore.

Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to
OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of
returning a file descriptor referring to that mount tree
OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor
to a new mount namespace. In that new mount namespace the copied mount
tree has been mounted on top of a copy of the real rootfs.

The caller can setns() into that mount namespace and perform any
additionally required setup such as move_mount() detached mounts in
there.

This allows OPEN_TREE_NAMESPACE to function as a combined
unshare(CLONE_NEWNS) and pivot_root().

A caller may for example choose to create an extremely minimal rootfs:

fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

This will create a mount namespace where "wootwoot" has become the
rootfs mounted on top of the real rootfs. The caller can now setns()
into this new mount namespace and assemble additional mounts.

This also works with user namespaces:

unshare(CLONE_NEWUSER);
fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

which creates a new mount namespace owned by the earlier created user
namespace with "wootwoot" as the rootfs mounted on top of the real
rootfs.

Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Christian Brauner <brauner@kernel.org>
Suggested-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:21:40 +01:00
Ian Abbott
10d28cffb3 comedi: Fix getting range information for subdevices 16 to 255
The `COMEDI_RANGEINFO` ioctl does not work properly for subdevice
indices above 15.  Currently, the only in-tree COMEDI drivers that
support more than 16 subdevices are the "8255" driver and the
"comedi_bond" driver.  Making the ioctl work for subdevice indices up to
255 is achievable.  It needs minor changes to the handling of the
`COMEDI_RANGEINFO` and `COMEDI_CHANINFO` ioctls that should be mostly
harmless to user-space, apart from making them less broken.  Details
follow...

The `COMEDI_RANGEINFO` ioctl command gets the list of supported ranges
(usually with units of volts or milliamps) for a COMEDI subdevice or
channel.  (Only some subdevices have per-channel range tables, indicated
by the `SDF_RANGETYPE` flag in the subdevice information.)  It uses a
`range_type` value and a user-space pointer, both supplied by
user-space, but the `range_type` value should match what was obtained
using the `COMEDI_CHANINFO` ioctl (if the subdevice has per-channel
range tables)  or `COMEDI_SUBDINFO` ioctl (if the subdevice uses a
single range table for all channels).  Bits 15 to 0 of the `range_type`
value contain the length of the range table, which is the only part that
user-space should care about (so it can use a suitably sized buffer to
fetch the range table).  Bits 23 to 16 store the channel index, which is
assumed to be no more than 255 if the subdevice has per-channel range
tables, and is set to 0 if the subdevice has a single range table.  For
`range_type` values produced by the `COMEDI_SUBDINFO` ioctl, bits 31 to
24 contain the subdevice index, which is assumed to be no more than 255.
But for `range_type` values produced by the `COMEDI_CHANINFO` ioctl,
bits 27 to 24 contain the subdevice index, which is assumed to be no
more than 15, and bits 31 to 28 contain the COMEDI device's minor device
number for some unknown reason lost in the mists of time.  The
`COMEDI_RANGEINFO` ioctl extract the length from bits 15 to 0 of the
user-supplied `range_type` value, extracts the channel index from bits
23 to 16 (only used if the subdevice has per-channel range tables),
extracts the subdevice index from bits 27 to 24, and ignores bits 31 to
28.  So for subdevice indices 16 to 255, the `COMEDI_SUBDINFO` or
`COMEDI_CHANINFO` ioctl will report a `range_type` value that doesn't
work with the `COMEDI_RANGEINFO` ioctl.  It will either get the range
table for the subdevice index modulo 16, or will fail with `-EINVAL`.

To fix this, always use bits 31 to 24 of the `range_type` value to hold
the subdevice index (assumed to be no more than 255).  This affects the
`COMEDI_CHANINFO` and `COMEDI_RANGEINFO` ioctls.  There should not be
anything in user-space that depends on the old, broken usage, although
it may now see different values in bits 31 to 28 of the `range_type`
values reported by the `COMEDI_CHANINFO` ioctl for subdevices that have
per-channel subdevices.  User-space should not be trying to decode bits
31 to 16 of the `range_type` values anyway.

Fixes: ed9eccbe89 ("Staging: add comedi core")
Cc: stable@vger.kernel.org #5.17+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://patch.msgid.link/20251203162438.176841-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-16 16:42:15 +01:00
Rafael J. Wysocki
d51e68b700 Merge branch 'pm-em'
Merge fixes related to the energy model management for 6.19-rc6:

 - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)

 - Fix stale description of the cost field in struct em_perf_state to
   reflect the current code (Yaxiong Tian)

 - Fix and revamp the energy model YNL specification added recently
   along with the energy model netlink interface (Changwoo Min)

* pm-em:
  PM: EM: Add dump to get-perf-domains in the EM YNL spec
  PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
  PM: EM: Rename em.yaml to dev-energymodel.yaml
  PM: EM: Fix yamllint warnings in the EM YNL spec
  PM: EM: Fix memory leak in em_create_pd() error path
  PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16 16:16:24 +01:00
Linus Torvalds
d19954ee63 Merge tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:

 - ov02c10: some fixes related to preserving bayer pattern and
   horizontal control

 - ipu-bridge: Add quirks for some Dell XPS laptops with inverted
   sensors

 - mali-c55: Fix version identifier logic

 - rzg2l-cru: csi-2: fix RZ/V2H input sizes on some variants

* tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ov02c10: Remove unnecessary hflip and vflip pointers
  media: ipu-bridge: Add DMI quirk for Dell XPS laptops with upside down sensors
  media: ov02c10: Fix the horizontal flip control
  media: ov02c10: Adjust x-win/y-win when changing flipping to preserve bayer-pattern
  media: ov02c10: Fix bayer-pattern change after default vflip change
  media: rzg2l-cru: csi-2: Support RZ/V2H input sizes
  media: uapi: mali-c55-config: Remove version identifier
  media: mali-c55: Remove duplicated version check
  media: Documentation: mali-c55: Use v4l2-isp version identifier
2026-01-14 08:18:01 -08:00
Darrick J. Wong
6025447737 uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
Stop definining these privately and instead move them to the uapi
errno.h so that they become canonical instead of copy pasta.

Cc: linux-api@vger.kernel.org
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/176826402587.3490369.17659117524205214600.stgit@frogsfrogsfrogs
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13 09:58:01 +01:00
Askar Safin
e6ce36ccc8 init: remove /proc/sys/kernel/real-root-dev
It is not used anymore.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251119222407.3333257-4-safinaskar@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 17:22:27 +01:00
Günther Noack
6abbb8703a landlock: Clarify documentation for the IOCTL access right
Move the description of the LANDLOCK_ACCESS_FS_IOCTL_DEV access right
together with the file access rights.

This group of access rights applies to files (in this case device
files), and they can be added to file or directory inodes using
landlock_add_rule(2).  The check for that works the same for all file
access rights, including LANDLOCK_ACCESS_FS_IOCTL_DEV.

Invoking ioctl(2) on directory FDs can not currently be restricted
with Landlock.  Having it grouped separately in the documentation is a
remnant from earlier revisions of the LANDLOCK_ACCESS_FS_IOCTL_DEV
patch set.

Link: https://lore.kernel.org/all/20260108.Thaex5ruach2@digikod.net/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260111175203.6545-2-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-12 17:07:21 +01:00
Christian Brauner
576ee5dfd4 fs: add immutable rootfs
Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.

Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add a immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.

The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:

  chdir(rootfs);
  pivot_root(".", ".");
  umount2(".", MNT_DETACH);

and be done with it. (Ofc, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)

Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.

In the future this will also allow us to create completely empty mount
namespaces without risking to leak anything.

systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.

This goes back to various discussion in previous years and a LPC 2024
presentation about this very topic.

Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 16:52:09 +01:00
Thomas Gleixner
2e4b28c48f treewide: Update email address
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-11 06:09:11 -10:00
Changwoo Min
380ff27af2 PM: EM: Add dump to get-perf-domains in the EM YNL spec
Add dump to get-perf-domains, so that a user can fetch either information
about a specific performance domain with do or information about all
performance domains with dump. Share the reply format of do and dump using
perf-domain-attrs, so remove perf-domains. The YNL spec, autogenerated
files, and the do implementation are updated, and the dump implementation
is added.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-5-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Changwoo Min
caa07a815d PM: EM: Rename em.yaml to dev-energymodel.yaml
The EM YNL specification used many acronyms, including ‘em’, ‘pd’,
‘ps’, etc. While the acronyms are short and convenient, they could be
confusing. So, let’s spell them out to be more specific. The following
changes were made in the spec. Note that the protocol name cannot exceed
GENL_NAMSIZ (16).

  em           -> dev-energymodel
  pds          -> perf-domains
  pd           -> perf-domain
  pd-id        -> perf-domain-id
  pd-table     -> perf-table
  ps           -> perf-state
  get-pds      -> get-perf-domains
  get-pd-table -> get-perf-table
  pd-created   -> perf-domain-created
  pd-updated   -> perf-domain-updated
  pd-deleted   -> perf-domain-deleted

In addition. doc strings were added to the spec. based on the comments in
energy_model.h. Two flag attributes (perf-state-flags and
perf-domain-flags) were added for easily interpreting the bit flags.

Finally, the autogenerated files and em_netlink.c were updated accordingly
to reflect the name changes.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-3-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Linus Torvalds
2bfe3e0da6 Merge tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Remove incorrect __user annotation from struct xattr_args::value

 - Documentation fix: Add missing kernel-doc description for the @isnew
   parameter in ilookup5_nowait() to silence Sphinx warnings

 - Documentation fix: Fix kernel-doc comment for __start_dirop() - the
   function name in the comment was wrong and the @state parameter was
   undocumented

 - Replace dynamic folio_batch allocation with stack allocation in
   iomap_zero_range(). The dynamic allocation was problematic for
   ext4-on-iomap work (didn't handle allocation failure properly) and
   triggered lockdep complaints. Uses a flag instead to control batch
   usage

 - Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls.
   When a namespace type is disabled, ns->ops is NULL, causes crashes
   during inode eviction when closing the fd. The ifdefs were removed in
   a recent simplification but are still needed

 - Fixe a race where a folio could be unlocked before the trailing zeros
   (for EOF within the page) were written

 - Split out a dedicated lease_dispose_list() helper since lease code
   paths always know they're disposing of leases. Removes unnecessary
   runtime flag checks and prepares for upcoming lease_manager
   enhancements

 - Fix userland delegation requests succeeding despite conflicting
   opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict
   checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager
   operation so userland delegations get proper conflict checking while
   nfsd can continue its own conflict handling

 - Fix LOOKUP_CACHED path lookups incorrectly falling through to the
   slow path. After legitimize_links() calls were conditionally elided,
   the routine would always fail with LOOKUP_CACHED regardless of
   whether there were any links. Now the flag is checked at the two
   callsites before calling legitimize_links()

 - Fix bug in media fd allocation in media_request_alloc()

 - Fix mismatched API calls in ecryptfs_mknod(): was calling
   end_removing() instead of end_creating() after
   ecryptfs_start_creating_dentry()

 - Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the
   lower parent dir was added but never dput()'d, causing BUG during
   lower filesystem unmount due to the still-in-use dentry

* tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: protect PIDFD_GET_* ioctls() via ifdef
  ecryptfs: Release lower parent dentry after creating dir
  ecryptfs: Fix improper mknod pairing of start_creating()/end_removing()
  get rid of bogus __user in struct xattr_args::value
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
  netfs: Fix early read unlock of page with EOF in middle
  filelock: allow lease_managers to dictate what qualifies as a conflict
  filelock: add lease_dispose_list() helper
  iomap: replace folio_batch allocation with stack allocation
  media: mc: fix potential use-after-free in media_request_alloc()
2026-01-09 05:57:57 -10:00
Linus Torvalds
f0b9d8eb98 Merge tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "A set of NFSD fixes for stable that arrived after the merge window:

   - Remove an invalid NFS status code

   - Fix an fstests failure when using pNFS

   - Fix a UAF in v4_end_grace()

   - Fix the administrative interface used to revoke NFSv4 state

   - Fix a memory leak reported by syzbot"

* tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: net ref data still needs to be freed even if net hasn't startup
  nfsd: check that server is running in unlock_filesystem
  nfsd: use correct loop termination in nfsd4_revoke_states()
  nfsd: provide locking for v4_end_grace
  NFSD: Fix permission check for read access to executable-only files
  NFSD: Remove NFSERR_EAGAIN
2026-01-06 09:12:52 -08:00
Jacopo Mondi
22cd0db47f media: uapi: mali-c55-config: Remove version identifier
The Mali C55 driver uses the v4l2-isp framework, which defines its own
versioning number which does not need to be defined again in each
platform-specific header.

Remove the definition of mali_c55_param_buffer_version enumeration from
the Mali C55 uAPI header.

Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-06 10:14:13 +01:00
Linus Torvalds
6ce4d44fb0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:

 - Fix several syzkaller found bugs:
    - Poor parsing of the RDMA_NL_LS_OP_IP_RESOLVE netlink
    - GID entry refcount leaking when CM destruction races with
      multicast establishment
    - Missing refcount put in ib_del_sub_device_and_put()

 - Fixup recently introduced uABI padding for 32 bit consistency

 - Avoid user triggered math overflow in MANA and AFA

 - Reading invalid netdev data during an event

 - kdoc fixes

 - Fix never-working gid copying in ib_get_gids_from_rdma_hdr

 - Typo in bnxt when validating the BAR

 - bnxt mis-parsed IB_SEND_IP_CSUM so it didn't work always

 - bnxt out of bounds access in bnxt related to the counters on new
   devices

 - Allocate the bnxt PDE table with the right sizing

 - Use dma_free_coherent() correctly in bnxt

 - Allow rxe to be unloadable when CONFIG_PROVE_LOCKING by adjusting the
   tracking of the global sockets it uses

 - Missing unlocking on error path in rxe

 - Compute the right number of pages in a MR in rtrs

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: fix dma_free_coherent() pointer
  RDMA/rtrs: Fix clt_path::max_pages_per_mr calculation
  IB/rxe: Fix missing umem_odp->umem_mutex unlock on error path
  RDMA/bnxt_re: Fix to use correct page size for PDE table
  RDMA/bnxt_re: Fix OOB write in bnxt_re_copy_err_stats()
  RDMA/bnxt_re: Fix IB_SEND_IP_CSUM handling in post_send
  RDMA/core: always drop device refcount in ib_del_sub_device_and_put()
  RDMA/rxe: let rxe_reclassify_recv_socket() call sk_owner_put()
  RDMA/bnxt_re: Fix incorrect BAR check in bnxt_qplib_map_creq_db()
  RDMA/core: Fix logic error in ib_get_gids_from_rdma_hdr()
  RDMA/efa: Remove possible negative shift
  RTRS/rtrs: clean up rtrs headers kernel-doc
  RDMA/irdma: avoid invalid read in irdma_net_event
  RDMA/mana_ib: check cqe length for kernel CQs
  RDMA/irdma: Fix irdma_alloc_ucontext_resp padding
  RDMA/ucma: Fix rdma_ucm_query_ib_service_resp struct padding
  RDMA/cm: Fix leaking the multicast GID table reference
  RDMA/core: Check for the presence of LS_NLA_TYPE_DGID correctly
2026-01-02 12:25:47 -08:00
Chuck Lever
c6c209ceb8 NFSD: Remove NFSERR_EAGAIN
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".

Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.

As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b3933
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).

Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-02 13:43:41 -05:00
Al Viro
3dd57ddec9 get rid of bogus __user in struct xattr_args::value
The first member of struct xattr_args is declared as
	__aligned_u64 __user value;
which makes no sense whatsoever; __user is a qualifier and what that
declaration says is "all struct xattr_args instances have .value
_stored_ in user address space, no matter where the rest of the
structure happens to be".

	Something like "int __user *p" stands for "value of p is a pointer
to an instance of int that happens to live in user address space"; it
says nothing about location of p itself, just as const char *p declares a
pointer to unmodifiable char rather than an unmodifiable pointer to char.

	With xattr_args the intent clearly had been "the 64bit value
represents a _pointer_ to object in user address space", but __user has
nothing to do with that.  All it gets us is a couple of bogus warnings
in fs/xattr.c where (userland) instance of xattr_args is copied to local
variable of that type (in kernel address space), followed by access
to its members.  Since we've told sparse that args.value must somehow be
located in userland memory, we get warned that looking at that 64bit
unsigned integer (in a variable already on kernel stack) is not allowed.

	Note that sparse has no way to express "this integer shall never
be cast into a pointer to be dereferenced directly" and I don't see any
way to assign a sane semantics to that.  In any case, __user is not it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20251216081939.GQ1712166@ZenIV
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:52:50 +01:00
Ryusuke Konishi
6fd8a09f48 nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
Eliminate the following kernel-doc warnings in nilfs2_api.h:

Warning: include/uapi/linux/nilfs2_api.h:65 cannot understand function
 prototype: 'struct nilfs_suinfo'
Warning: include/uapi/linux/nilfs2_api.h:101 cannot understand function
 prototype: 'struct nilfs_suinfo_update'

This ensures that the documentation for nilfs_suinfo and
nilfs_suinfo_update is correctly parsed and generated by adding the
missing 'struct' keyword to their kernel-doc comments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-12-22 15:45:29 -08:00
Randy Dunlap
cb8fe62f87 nilfs2: convert nilfs_super_block to kernel-doc
Eliminate 40+ kernel-doc warnings in nilfs2_ondisk.h by converting
all of the struct member comments to kernel-doc comments.

Fix one misnamed struct member in nilfs_direct_node.

Object files before and after are the same size and content.

Examples of warnings:
Warning: include/uapi/linux/nilfs2_ondisk.h:202 struct member 's_rev_level'
 not described in 'nilfs_super_block'
Warning: include/uapi/linux/nilfs2_ondisk.h:202 struct member
 's_minor_rev_level' not described in 'nilfs_super_block'
Warning: include/uapi/linux/nilfs2_ondisk.h:202 struct member 's_magic'
 not described in 'nilfs_super_block'
Warning: include/uapi/linux/nilfs2_ondisk.h:202 struct member 's_bytes'
 not described in 'nilfs_super_block'
Warning: include/uapi/linux/nilfs2_ondisk.h:202 struct member 's_flags'
 not described in 'nilfs_super_block'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-12-22 15:45:29 -08:00
Thomas Weißschuh
b61104e7a6 regulator: uapi: Use UAPI integer type
Using libc types and headers from the UAPI headers is problematic as it
introduces a dependency on a full C toolchain.

Use the fixed-width integer type provided by the UAPI headers instead.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251222-uapi-regulator-v1-1-a71c66eb1a94@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-12-22 09:00:42 +00:00
Linus Torvalds
10a0e846d8 Merge tag 'input-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a quirk for i8042 to better handle another TUXEDO model

 - a quirk to atkbd to handle incorcet behavior of HONOR FMB-P internal
   keyboard

 - a definition for a new ABS_SND_PROFILE event

 - fixes to alps and lkkbd drivers to reliably shut down pending work on
   removal

 - a fix to apple_z2 driver tightening input report parsing

 - a fix for "off-by-one" error when validating config in ti_am335x_tsc
   driver

 - addition of CRKD Guitars device IDs to xpad driver.

* tag 'input-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ti_am335x_tsc - fix off-by-one error in wire_order validation
  Input: xpad - add support for CRKD Guitars
  Input: add ABS_SND_PROFILE
  Input: apple_z2 - fix reading incorrect reports after exiting sleep
  Input: alps - fix use-after-free bugs caused by dev3_register_work
  Input: i8042 - add TUXEDO InfinityBook Max Gen10 AMD to i8042 quirk table
  Input: atkbd - skip deactivate for HONOR FMB-P's internal keyboard
  Input: lkkbd - disable pending work before freeing device
2025-12-21 15:21:10 -08:00
Linus Torvalds
a0bdd554a8 Merge tag 'drm-fixes-2025-12-20' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "rc2 fixes for the week, mostly xe, with amdgpu as usual. Then a
  smattering of small fixes across the core/tests/panel and amdxdna.

  I expect things will be quiet for rc3/4 as teams take a break, and I'm
  travelling but will keep an eye on things.

  core:
   - fix gem handle leak on DRM_IOCTL_GEM_CHANGE_HANDLE

  tests:
   - add EDEADLK handling

  amdgpu:
   - Fix no_console_suspend handling
   - DCN 3.5.x seamless boot fixes
   - DP audio fix
   - Fix race in GPU recovery
   - SMU 14 OD fix

  amdkfd:
   - Event fix

  xe:
   - Limit num_syncs to prevent oversized kernel allocations
   - Disallow 0 OA property values
   - Disallow 0 EU stall property values
   - Fix kobject leak
   - Workaround
   - Loop variable reference fix
   - Fix a CONFIG corner-case incorrect number of argument
   - Skip reason prefix while emitting array
   - VF migration fix
   - Fix context in mei interrupt top half
   - Don't include the CCS metadata in the dma-buf sg-table
   - VF queueing recovery work fix
   - Increase TDF timeout
   - GT reset registers vs scheduler ordering fix
   - Adjust long-running workload timeslices
   - Always set OA_OAGLBCTXCTRL_COUNTER_RESUME
   - Fix a return value
   - Drop preempt-fences when destroying imported dma-bufs
   - Use usleep_range for accurate long-running workload timeslicing

  amdxdna:
   - don't load virtualized

  panel:
   - fix visionox-rm69299 Kconfig dependency
   - sony-td4353-jdi probing fix"

* tag 'drm-fixes-2025-12-20' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
  drm/xe: Use usleep_range for accurate long-running workload timeslicing
  drm/xe: Drop preempt-fences when destroying imported dma-bufs.
  drm/xe/eustall: Disallow 0 EU stall property values
  drm/xe/oa: Disallow 0 OA property values
  drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()
  drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUME
  drm/xe: Adjust long-running workload timeslices to reasonable values
  drm/xe/oa: Limit num_syncs to prevent oversized allocations
  drm/xe: Limit num_syncs to prevent oversized allocations
  drm/amdkfd: Fix improper NULL termination of queue restore SMI event string
  drm/amd/pm: restore SCLK settings after S0ix resume
  drm/amdgpu: fix a job->pasid access race in gpu recovery
  drm/amd/display: Fix DP no audio issue
  drm/amd/display: Fix scratch registers offsets for DCN351
  drm/amd/display: Fix scratch registers offsets for DCN35
  drm/amd: Resume the device in thaw() callback when console suspend is disabled
  drm/panel: visionox-rm69299: Depend on BACKLIGHT_CLASS_DEVICE
  accel/amdxdna: Block running under a hypervisor
  drm/panel: sony-td4353-jdi: Enable prepare_prev_first
  drm/xe: Restore engine registers before restarting schedulers after GT reset
  ...
2025-12-20 12:08:02 -08:00
Linus Torvalds
d8ba32c5a4 Merge tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - ublk selftests for missing coverage

 - two fixes for the block integrity code

 - fix for the newly added newly added PR read keys ioctl, limiting the
   memory that can be allocated

 - work around for a deadlock that can occur with ublk, where partition
   scanning ends up recursing back into file closure, which needs the
   same mutex grabbed. Not the prettiest thing in the world, but an
   acceptable work-around until we can eliminate the reliance on
   disk->open_mutex for this

 - fix for a race between enabling writeback throttling and new IO
   submissions

 - move a bit of bio flag handling code. No changes, but needed for a
   patchset for a future kernel

 - fix for an init time id leak failure in rnbd

 - loop/zloop state check fix

* tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block: validate interval_exp integrity limit
  block: validate pi_offset integrity limit
  block: rnbd-clt: Fix leaked ID in init_dev()
  ublk: fix deadlock when reading partition table
  block: add allocation size check in blkdev_pr_read_keys()
  Documentation: admin-guide: blockdev: replace zone_capacity with zone_capacity_mb when creating devices
  zloop: use READ_ONCE() to read lo->lo_state in queue_rq path
  loop: use READ_ONCE() to read lo->lo_state without locking
  block: fix race between wbt_enable_default and IO submission
  selftests: ublk: add user copy test cases
  selftests: ublk: add support for user copy to kublk
  selftests: ublk: forbid multiple data copy modes
  selftests: ublk: don't share backing files between ublk servers
  selftests: ublk: use auto_zc for PER_IO_DAEMON tests in stress_04
  selftests: ublk: fix fio arguments in run_io_and_recover()
  selftests: ublk: remove unused ios map in seq_io.bt
  selftests: ublk: correct last_rw map type in seq_io.bt
  selftests: ublk: fix overflow in ublk_queue_auto_zc_fallback()
  block: move around bio flagging helpers
2025-12-20 09:48:56 -08:00
Gergo Koteles
733a892422 Input: add ABS_SND_PROFILE
ABS_SND_PROFILE used to describe the state of a multi-value sound profile
switch. This will be used for the alert-slider on OnePlus phones or other
phones.

Profile values added as SND_PROFLE_(SILENT|VIBRATE|RING) identifiers
to input-event-codes.h so they can be used from DTS.

Signed-off-by: Gergo Koteles <soyer@irl.hu>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Tested-by: Guido Günther <agx@sigxcpu.org> # oneplus,fajita & oneplus,enchilada
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Link: https://patch.msgid.link/20251113-op6-tri-state-v8-1-54073f3874bc@ixit.cz
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-12-18 21:34:42 -08:00
Linus Torvalds
7b8e9264f5 Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and CAN.

  Current release - regressions:

   - netfilter: nf_conncount: fix leaked ct in error paths

   - sched: act_mirred: fix loop detection

   - sctp: fix potential deadlock in sctp_clone_sock()

   - can: fix build dependency

   - eth: mlx5e: do not update BQL of old txqs during channel
     reconfiguration

  Previous releases - regressions:

   - sched: ets: always remove class from active list before deleting it

   - inet: frags: flush pending skbs in fqdir_pre_exit()

   - netfilter: nf_nat: remove bogus direction check

   - mptcp:
      - schedule rtx timer only after pushing data
      - avoid deadlock on fallback while reinjecting

   - can: gs_usb: fix error handling

   - eth:
      - mlx5e:
         - avoid unregistering PSP twice
         - fix double unregister of HCA_PORTS component
      - bnxt_en: fix XDP_TX path
      - mlxsw: fix use-after-free when updating multicast route stats

  Previous releases - always broken:

   - ethtool: avoid overflowing userspace buffer on stats query

   - openvswitch: fix middle attribute validation in push_nsh() action

   - eth:
      - mlx5: fw_tracer, validate format string parameters
      - mlxsw: spectrum_router: fix neighbour use-after-free
      - ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()

  Misc:

   - Jozsef Kadlecsik retires from maintaining netfilter

   - tools: ynl: fix build on systems with old kernel headers"

* tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: hns3: add VLAN id validation before using
  net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
  net: hns3: using the num_tqps in the vf driver to apply for resources
  net: enetc: do not transmit redirected XDP frames when the link is down
  selftests/tc-testing: Test case exercising potential mirred redirect deadlock
  net/sched: act_mirred: fix loop detection
  sctp: Clear inet_opt in sctp_v6_copy_ip_options().
  sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().
  net/handshake: duplicate handshake cancellations leak socket
  net/mlx5e: Don't include PSP in the hard MTU calculations
  net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
  net/mlx5e: Trigger neighbor resolution for unresolved destinations
  net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
  net/mlx5: Serialize firmware reset with devlink
  net/mlx5: fw_tracer, Handle escaped percent properly
  net/mlx5: fw_tracer, Validate format string parameters
  net/mlx5: Drain firmware reset in shutdown callback
  net/mlx5: fw reset, clear reset requested on drain_fw_reset
  net: dsa: mxl-gsw1xx: manually clear RANEG bit
  net: dsa: mxl-gsw1xx: fix .shutdown driver operation
  ...
2025-12-19 07:55:35 +12:00
Shuicheng Lin
8e46130400 drm/xe: Limit num_syncs to prevent oversized allocations
The exec and vm_bind ioctl allow userspace to specify an arbitrary
num_syncs value. Without bounds checking, a very large num_syncs
can force an excessively large allocation, leading to kernel warnings
from the page allocator as below.

Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request
exceeding this limit.

"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124
...
Call Trace:
 <TASK>
 alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416
 ___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317
 __kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348
 __do_kmalloc_node mm/slub.c:4364 [inline]
 __kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388
 kmalloc_noprof include/linux/slab.h:909 [inline]
 kmalloc_array_noprof include/linux/slab.h:948 [inline]
 xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158
 drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797
 drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894
 xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:598 [inline]
 __se_sys_ioctl fs/ioctl.c:584 [inline]
 __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
"

v2: Add "Reported-by" and Cc stable kernels.
v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh)
v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt)
v5: Do the check at the top of the exec func. (Matt)

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450
Cc: <stable@vger.kernel.org> # v6.12+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Ivan Briano <ivan.briano@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
(cherry picked from commit b07bac9bd7)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18 18:10:34 +01:00
Deepanshu Kartikey
a58383fa45 block: add allocation size check in blkdev_pr_read_keys()
blkdev_pr_read_keys() takes num_keys from userspace and uses it to
calculate the allocation size for keys_info via struct_size(). While
there is a check for SIZE_MAX (integer overflow), there is no upper
bound validation on the allocation size itself.

A malicious or buggy userspace can pass a large num_keys value that
doesn't trigger overflow but still results in an excessive allocation
attempt, causing a warning in the page allocator when the order exceeds
MAX_PAGE_ORDER.

Fix this by introducing PR_KEYS_MAX to limit the number of keys to
a sane value. This makes the SIZE_MAX check redundant, so remove it.
Also switch to kvzalloc/kvfree to handle larger allocations gracefully.

Fixes: 22a1ffea5f ("block: add IOC_PR_READ_KEYS ioctl")
Tested-by: syzbot+660d079d90f8a1baf54d@syzkaller.appspotmail.com
Reported-by: syzbot+660d079d90f8a1baf54d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=660d079d90f8a1baf54d
Link: https://lore.kernel.org/all/20251212013510.3576091-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-17 07:35:22 -07:00
Arnd Bergmann
d95e99a74e RDMA/irdma: Fix irdma_alloc_ucontext_resp padding
A recent commit modified struct irdma_alloc_ucontext_resp by adding a
member with implicit padding in front of it, though this does not change
the offset of the data members other than m68k. Reported by
scripts/check-uapi.sh:

==== ABI differences detected in include/rdma/irdma-abi.h from 1dd7bde2e91c -> HEAD ====
    [C] 'struct irdma_alloc_ucontext_resp' changed:
      type size changed from 704 to 640 (in bits)
      1 data member deletion:
        '__u8 rsvd3[2]', at offset 640 (in bits) at irdma-abi.h:61:1
      1 data member insertion:
        '__u8 revd3[2]', at offset 592 (in bits) at irdma-abi.h:60:1

Change the size back to the previous version, and remove the implicit
padding by making it explicit and matching what x86-64 would do by placing
max_hw_srq_quanta member into a naturally aligned location.

Fixes: 563e1feb5f ("RDMA/irdma: Add SRQ support")
Link: https://patch.msgid.link/r/20251208133849.315451-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-16 21:38:45 -04:00
Arnd Bergmann
2dc675f614 RDMA/ucma: Fix rdma_ucm_query_ib_service_resp struct padding
On a few 32-bit architectures, the newly added ib_user_service_rec
structure is not 64-bit aligned the way it is on most regular ones.

Add explicit padding into the rdma_ucm_query_ib_service_resp and
rdma_ucm_resolve_ib_service structures that embed it, so that the layout
is compatible across all of them.

This is an ABI change on i386, aligning it with x86_64 and the other
64-bit architectures to avoid having to use a compat ioctl handler.

Fixes: 810f874eda ("RDMA/ucma: Support query resolved service records")
Link: https://patch.msgid.link/r/20251208133311.313977-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-16 21:37:02 -04:00
Bhavik Sachdev
0e5032237e statmount: accept fd as a parameter
Extend `struct mnt_id_req` to take in a fd and introduce STATMOUNT_BY_FD
flag. When a valid fd is provided and STATMOUNT_BY_FD is set, statmount
will return mountinfo about the mount the fd is on.

This even works for "unmounted" mounts (mounts that have been umounted
using umount2(mnt, MNT_DETACH)), if you have access to a file descriptor
on that mount. These "umounted" mounts will have no mountpoint and no
valid mount namespace. Hence, we unset the STATMOUNT_MNT_POINT and
STATMOUNT_MNT_NS_ID in statmount.mask for "unmounted" mounts.

In case of STATMOUNT_BY_FD, given that we already have access to an fd
on the mount, accessing mount information without a capability check
seems fine because of the following reasons:

- All fs related information is available via fstatfs() without any
  capability check.
- Mount information is also available via /proc/pid/mountinfo (without
  any capability check).
- Given that we have access to a fd on the mount which tells us that we
  had access to the mount at some point (or someone that had access gave
  us the fd). So, we should be able to access mount info.

Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Link: https://patch.msgid.link/20251129091455.757724-3-b.sachdev1904@gmail.com
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:13:14 +01:00