Commit Graph

1368207 Commits

Author SHA1 Message Date
Linus Torvalds
15d90a5e55 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
 "Cleanups for the kernel's CRC (cyclic redundancy check) code:

   - Use __ro_after_init where appropriate

   - Remove unnecessary static_key on s390

   - Rename some source code files

   - Rename the crc32 and crc32c crypto API modules

   - Use subsys_initcall instead of arch_initcall

   - Restore maintainers for crc_kunit.c

   - Fold crc16_byte() into crc16.c

   - Add some SPDX license identifiers"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crc32: add SPDX license identifier
  lib/crc16: unexport crc16_table and crc16_byte()
  w1: ds2406: use crc16() instead of crc16_byte() loop
  MAINTAINERS: add crc_kunit.c back to CRC LIBRARY
  lib/crc: make arch-optimized code use subsys_initcall
  crypto: crc32 - remove "generic" from file and module names
  x86/crc: drop "glue" from filenames
  sparc/crc: drop "glue" from filenames
  s390/crc: drop "glue" from filenames
  powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S
  powerpc/crc: drop "glue" from filenames
  arm64/crc: drop "glue" from filenames
  arm/crc: drop "glue" from filenames
  s390/crc32: Remove no-op module init and exit functions
  s390/crc32: Remove have_vxrs static key
  lib/crc: make the CPU feature static keys __ro_after_init
2025-05-26 13:32:06 -07:00
Baris Can Goral
5bccdc51f9 replace strncpy with strscpy_pad
The strncpy() function is actively dangerous to use since it may not
NULL-terminate the destination string, resulting in potential memory
content exposures, unbounded reads, or crashes.
Link: https://github.com/KSPP/linux/issues/90

In addition, strscpy_pad is more appropriate because it also zero-fills
any remaining space in the destination if the source is shorter than
the provided buffer size.

Signed-off-by: Baris Can Goral <goralbaris@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250521161036.14489-1-goralbaris@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 22:28:44 +02:00
Linus Torvalds
14f19dc644 Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt update from Eric Biggers:
 "Add support for 'hardware-wrapped inline encryption keys' to fscrypt.

  When enabled on supported platforms, this feature protects file
  contents keys from certain attacks, such as cold boot attacks.

  This feature uses the block layer support for wrapped keys which was
  merged in 6.15. Wrapped key support has existed out-of-tree in Android
  for a long time, and it's finally ready for upstream now that there is
  a platform on which it works end-to-end with upstream.

  Specifically, it works on the Qualcomm SM8650 HDK, using the Qualcomm
  ICE (Inline Crypto Engine) and HWKM (Hardware Key Manager). The
  corresponding driver support is included in the SCSI tree for 6.16.

  Validation for this feature includes two new tests that were already
  merged into xfstests (generic/368 and generic/369)"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: add support for hardware-wrapped keys
2025-05-26 13:27:40 -07:00
Paolo Bonzini
1f7c9d52b1 Merge tag 'kvm-riscv-6.16-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.16

- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
2025-05-26 16:27:00 -04:00
Paolo Bonzini
4d526b02df Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16

* New features:

  - Add large stage-2 mapping support for non-protected pKVM guests,
    clawing back some performance.

  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
    protected modes.

  - Enable nested virtualisation support on systems that support it
    (yes, it has been a long time coming), though it is disabled by
    default.

* Improvements, fixes and cleanups:

  - Large rework of the way KVM tracks architecture features and links
    them with the effects of control bits. This ensures correctness of
    emulation (the data is automatically extracted from the published
    JSON files), and helps dealing with the evolution of the
    architecture.

  - Significant changes to the way pKVM tracks ownership of pages,
    avoiding page table walks by storing the state in the hypervisor's
    vmemmap. This in turn enables the THP support described above.

  - New selftest checking the pKVM ownership transition rules

  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
    even if the host didn't have it.

  - Fixes for the address translation emulation, which happened to be
    rather buggy in some specific contexts.

  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
    from the number of counters exposed to a guest and addressing a
    number of issues in the process.

  - Add a new selftest for the SVE host state being corrupted by a
    guest.

  - Keep HCR_EL2.xMO set at all times for systems running with the
    kernel at EL2, ensuring that the window for interrupts is slightly
    bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
    from a pretty bad case of TLB corruption unless accesses to HCR_EL2
    are heavily synchronised.

  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
    tables in a human-friendly fashion.

  - and the usual random cleanups.
2025-05-26 16:19:46 -04:00
Paolo Bonzini
85502b2214 Merge tag 'loongarch-kvm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.16

1. Don't flush tlb if HW PTW supported.
2. Add LoongArch KVM selftests support.
2025-05-26 16:12:13 -04:00
Paolo Bonzini
2bb0e39885 Documentation: virt/kvm: remove unreferenced footnote
Replace it with just the URL.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-26 16:12:01 -04:00
Linus Torvalds
f83fcb87f8 Merge tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Carlos Maiolino:

 - Atomic writes for XFS

 - Remove experimental warnings for pNFS, scrub and parent pointers

* tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
  xfs: add inode to zone caching for data placement
  xfs: free the item in xfs_mru_cache_insert on failure
  xfs: remove the EXPERIMENTAL warning for pNFS
  xfs: remove some EXPERIMENTAL warnings
  xfs: Remove deprecated xfs_bufd sysctl parameters
  xfs: stop using set_blocksize
  xfs: allow sysadmins to specify a maximum atomic write limit at mount time
  xfs: update atomic write limits
  xfs: add xfs_calc_atomic_write_unit_max()
  xfs: add xfs_file_dio_write_atomic()
  xfs: commit CoW-based atomic writes atomically
  xfs: add large atomic writes checks in xfs_direct_write_iomap_begin()
  xfs: add xfs_atomic_write_cow_iomap_begin()
  xfs: refine atomic write size check in xfs_file_write_iter()
  xfs: refactor xfs_reflink_end_cow_extent()
  xfs: allow block allocator to take an alignment hint
  xfs: ignore HW which cannot atomic write a single block
  xfs: add helpers to compute transaction reservation for finishing intent items
  xfs: add helpers to compute log item overhead
  xfs: separate out setting buftarg atomic writes limits
  ...
2025-05-26 12:56:01 -07:00
Wentao Liang
f0b50730bd net/mlx5_core: Add error handling inmlx5_query_nic_vport_qkey_viol_cntr()
The function mlx5_query_nic_vport_qkey_viol_cntr() calls the function
mlx5_query_nic_vport_context() but does not check its return value. This
could lead to undefined behavior if the query fails. A proper
implementation can be found in mlx5_nic_vport_query_local_lb().

Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.

Fixes: 9efa752545 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250521133620.912-1-vulab@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 21:49:38 +02:00
Linus Torvalds
79b98edf91 Merge tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "In this cycle, Intel QAT hardware accelerators are supported to
  improve DEFLATE decompression performance. I've tested it with the
  enwik9 dataset of 1 MiB pclusters on our Intel Sapphire Rapids
  bare-metal server and a PL0 ESSD, and the sequential read performance
  even surpasses LZ4 software decompression on this setup.

  In addition, a `fsoffset` mount option is introduced for file-backed
  mounts to specify the filesystem offset in order to adapt customized
  container formats.

  And other improvements and minor cleanups. Summary:

   - Add a `fsoffset` mount option to specify the filesystem offset

   - Support Intel QAT accelerators to boost up the DEFLATE algorithm

   - Initialize per-CPU workers and CPU hotplug hooks lazily to avoid
     unnecessary overhead when EROFS is not mounted

   - Fix file handle encoding for 64-bit NIDs

   - Minor cleanups"

* tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: support DEFLATE decompression by using Intel QAT
  erofs: clean up erofs_{init,exit}_sysfs()
  erofs: add 'fsoffset' mount option to specify filesystem offset
  erofs: lazily initialize per-CPU workers and CPU hotplug hooks
  erofs: refine readahead tracepoint
  erofs: avoid using multiple devices with different type
  erofs: fix file handle encoding for 64-bit NIDs
2025-05-26 12:47:41 -07:00
Horatiu Vultur
57ee9584fd net: lan966x: Fix 1-step timestamping over ipv4 or ipv6
When enabling 1-step timestamping for ptp frames that are over udpv4 or
udpv6 then the inserted timestamp is added at the wrong offset in the
frame, meaning that will modify the frame at the wrong place, so the
frame will be malformed.
To fix this, the HW needs to know which kind of frame it is to know
where to insert the timestamp. For that there is a field in the IFH that
says the PDU_TYPE, which can be NONE  which is the default value,
IPV4 or IPV6. Therefore make sure to set the PDU_TYPE so the HW knows
where to insert the timestamp.
Like I mention before the issue is not seen with L2 frames because by
default the PDU_TYPE has a value of 0, which represents the L2 frames.

Fixes: 77eecf25bd ("net: lan966x: Update extraction/injection for timestamping")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250521124159.2713525-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 21:44:15 +02:00
Linus Torvalds
522544fc71 Merge tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:

 - Poisoned extents can now be moved: this lets us handle bitrotted data
   without deleting it. For now, reading from poisoned extents only
   returns -EIO: in the future we'll have an API for specifying "read
   this data even if there were bitflips".

 - Incompatible features may now be enabled at runtime, via
   "opts/version_upgrade" in sysfs. Toggle it to incompatible, and then
   toggle it back - option changes via the sysfs interface are
   persistent.

 - Various changes to support deployable disk images:

     - RO mounts now use less memory

     - Images may be stripped of alloc info, particularly useful for
       slimming them down if they will primarily be mounted RO. Alloc
       info will be automatically regenerated on first RW mount, and
       this is quite fast

     - Filesystem images generated with 'bcachefs image' will be
       automatically resized the first time they're mounted on a larger
       device

   The images 'bcachefs image' generates with compression enabled have
   been comparable in size to those generated by squashfs and erofs -
   but you get a full RW capable filesystem

 - Major error message improvements for btree node reads, data reads,
   and elsewhere. We now build up a single error message that lists all
   the errors encountered, actions taken to repair, and success/failure
   of the IO. This extends to other error paths that may kick off other
   actions, e.g. scheduling recovery passes: actions we took because of
   an error are included in that error message, with
   grouping/indentation so we can see what caused what.

 - New option, 'rebalance_on_ac_only'. Does exactly what the name
   suggests, quite handy with background compression.

 - Repair/self healing:

     - We can now kick off recovery passes and run them in the
       background if we detect errors. Currently, this is just used by
       code that walks backpointers. We now also check for missing
       backpointers at runtime and run check_extents_to_backpointers if
       required. The messy 6.14 upgrade left missing backpointers for
       some users, and this will correct that automatically instead of
       requiring a manual fsck - some users noticed this as copygc
       spinning and not making progress.

       In the future, as more recovery passes come online, we'll be able
       to repair and recover from nearly anything - except for
       unreadable btree nodes, and that's why you're using replication,
       of course - without shutting down the filesystem.

     - There's a new recovery pass, for checking the rebalance_work
       btree, which tracks extents that rebalance will process later.

 - Hardening:

     - Close the last known hole in btree iterator/btree locking
       assertions: path->should_be_locked paths must stay locked until
       the end of the transaction. This shook out a few bugs, including
       a performance issue that was causing unnecessary path_upgrade
       transaction restarts.

 - Performance:

     - Faster snapshot deletion: this is an incompatible feature, as it
       requires new sentinal values, for safety. Snapshot deletion no
       longer has to do a full metadata scan, it now just scans the
       inodes btree: if an extent/dirent/xattr is present for a given
       snapshot ID, we already require that an inode be present with
       that same snapshot ID.

       If/when users hit scalability limits again (ridiculously huge
       filesystems with lots of inodes, and many sparse snapshots), let
       me know - the next step will be to add an index from snapshot ID
       -> inode number, which won't be too hard.

     - Faster device removal: the "scan for pointers to this device" no
       longer does a full metadata scan, instead it walks backpointers.
       Like fast snapshot deletion this is another incompat feature: it
       also requires a new sentinal value, because we don't want to
       reuse these device IDs until after a fsck.

     - We're now coalescing redundant accounting updates prior to
       transaction commit, taking some pressure off the journal. Shortly
       we'll also be doing multiple extent updates in a transaction in
       the main write path, which combined with the previous should
       drastically cut down on the amount of metadata updates we have to
       journal.

 - Stack usage improvements: All allocator state has been moved off the
   stack

 - Debug improvements:

     - enumerated refcounts: The debug code previously used for
       filesystem write refs is now a small library, and used for other
       heavily used refcounts. Different users of a refcount are
       enumerated, making it much easier to debug refcount issues.

     - Async object debugging: There's a new kconfig option that makes
       various async objects (different types of bios, data updates,
       write ops, etc.) visible in debugfs, and it should be fast enough
       to leave on in production.

     - Various sets of assertions no longer require
       CONFIG_BCACHEFS_DEBUG, instead they're controlled by module
       parameters and static keys, meaning users won't need to compile
       custom kernels as often to help debug issues.

     - bch2_trans_kmalloc() calls can be tracked (there's a new kconfig
       option). With it on you can check the btree_transaction_stats in
       debugfs to see the bch2_trans_kmalloc() calls a transaction did
       when it used the most memory.

* tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefs: (218 commits)
  bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGE
  bcachefs: Fix btree_iter_next_node() for new locking asserts
  bcachefs: Ensure we don't use a blacklisted journal seq
  bcachefs: Small check_fix_ptr fixes
  bcachefs: Fix opts.recovery_pass_last
  bcachefs: Fix allocate -> self healing path
  bcachefs: Fix endianness in casefold check/repair
  bcachefs: Path must be locked if trans->locked && should_be_locked
  bcachefs: Simplify bch2_path_put()
  bcachefs: Plumb btree_trans for more locking asserts
  bcachefs: Clear trans->locked before unlock
  bcachefs: Clear should_be_locked before unlock in key_cache_drop()
  bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked
  bcachefs: Give out new path if upgrade fails
  bcachefs: Fix btree_path_get_locks when not doing trans restart
  bcachefs: btree_node_locked_type_nowrite()
  bcachefs: Kill bch2_path_put_nokeep()
  bcachefs: bch2_journal_write_checksum()
  bcachefs: Reduce stack usage in data_update_index_update()
  bcachefs: bch2_trans_log_str()
  ...
2025-05-26 12:43:30 -07:00
Linus Torvalds
8fdabcd9c0 Merge tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:

 - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP
   is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb
   will never change. This is the counterpart of commit 9e888998ea
   ("writeback: fix false warning in inode_to_wb()")

 - Fix a hang introduced by commit 8d391972ae ("gfs2: Remove
   __gfs2_writepage()"): prevent gfs2_logd from creating transactions
   for jdata pages while trying to flush the log

 - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by
   deallocating partially created inodes on the gfs2_create_inode()
   error path

 - Fix a bug in the journal head lookup code that could cause mount to
   fail after successful recovery

 - Various smaller fixes and cleanups from various people

* tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
  gfs2: No more gfs2_find_jhead caching
  gfs2: Get rid of duplicate log head lookup
  gfs2: Simplify clean_journal
  gfs2: Simplify gfs2_log_pointers_init
  gfs2: Move gfs2_log_pointers_init
  gfs2: Minor comments fix
  gfs2: Don't start unnecessary transactions during log flush
  gfs2: Move gfs2_trans_add_databufs
  gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio
  gfs2: avoid inefficient use of crc32_le_shift()
  gfs2: Do not call iomap_zero_range beyond eof
  gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch
  gfs2: Fix usage of bio->bi_status in gfs2_end_log_write
  gfs2: deallocate inodes in gfs2_create_inode
  gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc
  gfs2: Move gfs2_dinode_dealloc
  gfs2: Don't reread inodes unnecessarily
  gfs2: gfs2_create_inode error handling fix
  gfs2: Remove unnecessary NULL check before free_percpu()
  gfs2: check sb_min_blocksize return value
  ...
2025-05-26 12:35:08 -07:00
Linus Torvalds
a56d3133bd Merge tag 'configfs-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux
Pull configfs updates from Andreas Hindborg:

 - Allow creation of rw files with custom permissions. This allows
   drivers to better protect secrets written through configfs

 - Fix a bug where an error condition did not cause an early return
   while populating attributes

 - Report ENOMEM rather than EFAULT when kvasprintf() fails in
   config_item_set_name()

 - Add a Rust API for configfs. This allows Rust drivers to use configfs
   through a memory safe interface

* tag 'configfs-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux:
  MAINTAINERS: add configfs Rust abstractions
  rust: configfs: add a sample demonstrating configfs usage
  rust: configfs: introduce rust support for configfs
  configfs: Correct error value returned by API config_item_set_name()
  configfs: Do not override creating attribute file failure in populate_attrs()
  configfs: Delete semicolon from macro type_print() definition
  configfs: Add CONFIGFS_ATTR_PERM helper
2025-05-26 12:28:55 -07:00
Linus Torvalds
5e82ed5ca4 Merge tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "Apart from numerous cleanups, there are some performance improvements
  and one minor mount option update. There's one more radix-tree
  conversion (one remaining), and continued work towards enabling large
  folios (almost finished).

  Performance:

   - extent buffer conversion to xarray gains throughput and runtime
     improvements on metadata heavy operations doing writeback (sample
     test shows +50% throughput, -33% runtime)

   - extent io tree cleanups lead to performance improvements by
     avoiding unnecessary searches or repeated searches

   - more efficient extent unpinning when committing transaction
     (estimated run time improvement 3-5%)

  User visible changes:

   - remove standalone mount option 'nologreplay', deprecated in 5.9,
     replacement is 'rescue=nologreplay'

   - in scrub, update reporting, add back device stats message after
     detected errors (accidentally removed during recent refactoring)

  Core:

   - convert extent buffer radix tree to xarray

   - in subpage mode, move block perfect compression out of experimental
     build

   - in zoned mode, introduce sub block groups to allow managing special
     block groups, like the one for relocation or tree-log, to handle
     some corner cases of ENOSPC

   - in scrub, simplify bitmaps for block tracking status

   - continued preparations for large folios:
       - remove assertions for folio order 0
       - add support where missing: compression, buffered write, defrag,
         hole punching, subpage, send

   - fix fsync of files with no hard links not persisting deletion

   - reject tree blocks which are not nodesize aligned, a precaution
     from 4.9 times

   - move transaction abort calls closer to the error sites

   - remove usage of some struct bio_vec internals

   - simplifications in extent map

   - extent IO cleanups and optimizations

   - error handling improvements

   - enhanced ASSERT() macro with optional format strings

   - cleanups:
       - remove unused code
       - naming unifications, dropped __, added prefix
       - merge similar functions
       - use common helpers for various data structures"

* tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (198 commits)
  btrfs: move misplaced comment of btrfs_path::keep_locks
  btrfs: remove standalone "nologreplay" mount option
  btrfs: use a single variable to track return value at btrfs_page_mkwrite()
  btrfs: don't return VM_FAULT_SIGBUS on failure to set delalloc for mmap write
  btrfs: simplify early error checking in btrfs_page_mkwrite()
  btrfs: pass true to btrfs_delalloc_release_space() at btrfs_page_mkwrite()
  btrfs: fix wrong start offset for delalloc space release during mmap write
  btrfs: fix harmless race getting delayed ref head count when running delayed refs
  btrfs: log error codes during failures when writing super blocks
  btrfs: simplify error return logic when getting folio at prepare_one_folio()
  btrfs: return real error from __filemap_get_folio() calls
  btrfs: remove superfluous return value check at btrfs_dio_iomap_begin()
  btrfs: fix invalid data space release when truncating block in NOCOW mode
  btrfs: update Kconfig option descriptions
  btrfs: update list of features built under experimental config
  btrfs: send: remove btrfs_debug() calls
  btrfs: use boolean for delalloc argument to btrfs_free_reserved_extent()
  btrfs: use boolean for delalloc argument to btrfs_free_reserved_bytes()
  btrfs: fold error checks when allocating ordered extent and update comments
  btrfs: check we grabbed inode reference when allocating an ordered extent
  ...
2025-05-26 12:24:43 -07:00
Rafael J. Wysocki
3e0c509fbd Merge branch 'pm-tools'
Merge a cpupower utility update for 6.16-rc1 that adds a systemd service
to run cpupower and changes binding's Makefile to use -lcpupower (John B.
Wyatt IV, Francesco Poli).

* pm-tools:
  cpupower: do not install files to /etc/default/
  cpupower: do not call systemctl at install time
  cpupower: do not write DESTDIR to cpupower.service
  cpupower: change binding's makefile to use -lcpupower
  cpupower: add a systemd service to run cpupower
2025-05-26 21:24:27 +02:00
Rafael J. Wysocki
76524ffd10 Merge branches 'pm-runtime' and 'pm-sleep'
Merge updates related to system sleep handling and runtime PM for 6.16-rc1:

 - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
   Kalla).

 - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).

 - Add new devm_ functions for enabling runtime PM and runtime PM
   reference counting (Bence Csókás).

 - Remove size arguments from strscpy() calls in the hibernation core
   code (Thorsten Blum).

 - Adjust the handling of devices with asynchronous suspend enabled
   during system suspend and resume to start resuming them immediately
   after resuming their parents and to start suspending such a device
   immediately after suspending its first child (Rafael Wysocki).

 - Adjust messages printed during tasks freezing to avoid using
   pr_cont() (Andrew Sayers, Paul Menzel).

 - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
   Zhang).

 - Add missing wakeup source attribute relax_count to sysfs and
   remove the space character at the end ofi the string produced by
   pm_show_wakelocks() (Zijun Hu).

 - Add configurable pm_test delay for hibernation (Zihuan Zhang).

 - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
   cypd4226 device on Tegra boards from suspending prematurely (Jon
   Hunter).

 - Unbreak printing PM debug messages during hibernation and clean up
   some related code (Rafael Wysocki).

* pm-runtime:
  PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
  PM: sysfs: Move debug runtime PM attributes to runtime_attrs[]
  PM: runtime: Add new devm functions

* pm-sleep:
  PM: freezer: Rewrite restarting tasks log to remove stray *done.*
  PM: sleep: Introduce pm_sleep_transition_in_progress()
  PM: sleep: Introduce pm_suspend_in_progress()
  PM: sleep: Print PM debug messages during hibernation
  ucsi_ccg: Disable async suspend in ucsi_ccg_probe()
  PM: hibernate: add configurable delay for pm_test
  PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()
  PM: wakeup: Add missing wakeup source attribute relax_count
  PM: sleep: Remove unnecessary !!
  PM: sleep: Use two lines for "Restarting..." / "done" messages
  PM: sleep: Make suspend of devices more asynchronous
  PM: sleep: Suspend async parents after suspending children
  PM: sleep: Resume children after resuming the parent
  PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26 21:21:58 +02:00
Rafael J. Wysocki
af86d7e88e Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc1:

 - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the
   menu cpuidle governor (Zhongqiu Han).

 - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla).

 - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
   Bityutskiy).

 - Fix typos in two comments in the teo cpuidle governor (Atul Kumar
   Pant).

* pm-cpuidle:
  cpuidle: psci: Avoid initializing faux device if no DT idle states are present
  Documentation: ABI: testing: document the new cpuidle sysfs file
  Documentation: admin-guide: pm: Document intel_idle C1 demotion
  intel_idle: Add C1 demotion on/off sysfs knob
  cpuidle: psci: Transition to the faux device interface
  cpuidle: menu: Optimize bucket assignment when next_timer_ns equals KTIME_MAX
  cpuidle: teo: Fix typos in two comments
2025-05-26 21:18:34 +02:00
Linus Torvalds
49fffac983 Merge tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Avoid indirect function calls in io-wq for executing and freeing
   work.

   The design of io-wq is such that it can be a generic mechanism, but
   as it's just used by io_uring now, may as well avoid these indirect
   calls

 - Clean up registered buffers for networking

 - Add support for IORING_OP_PIPE. Pretty straight forward, allows
   creating pipes with io_uring, particularly useful for having these be
   instantiated as direct descriptors

 - Clean up the coalescing support fore registered buffers

 - Add support for multiple interface queues for zero-copy rx
   networking. As this feature was merged for 6.15 it supported just a
   single ifq per ring

 - Clean up the eventfd support

 - Add dma-buf support to zero-copy rx

 - Clean up and improving the request draining support

 - Clean up provided buffer support, most notably with an eye toward
   making the legacy support less intrusive

 - Minor fdinfo cleanups, dropping support for dumping what credentials
   are registered

 - Improve support for overflow CQE handling, getting rid of GFP_ATOMIC
   for allocating overflow entries where possible

 - Improve detection of cases where io-wq doesn't need to spawn a new
   worker unnecessarily

 - Various little cleanups

* tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux: (59 commits)
  io_uring/cmd: warn on reg buf imports by ineligible cmds
  io_uring/io-wq: only create a new worker if it can make progress
  io_uring/io-wq: ignore non-busy worker going to sleep
  io_uring/io-wq: move hash helpers to the top
  trace/io_uring: fix io_uring_local_work_run ctx documentation
  io_uring: finish IOU_OK -> IOU_COMPLETE transition
  io_uring: add new helpers for posting overflows
  io_uring: pass in struct io_big_cqe to io_alloc_ocqe()
  io_uring: make io_alloc_ocqe() take a struct io_cqe pointer
  io_uring: split alloc and add of overflow
  io_uring: open code io_req_cqe_overflow()
  io_uring/fdinfo: get rid of dumping credentials
  io_uring/fdinfo: only compile if CONFIG_PROC_FS is set
  io_uring/kbuf: unify legacy buf provision and removal
  io_uring/kbuf: refactor __io_remove_buffers
  io_uring/kbuf: don't compute size twice on prep
  io_uring/kbuf: drop extra vars in io_register_pbuf_ring
  io_uring/kbuf: use mem_is_zero()
  io_uring/kbuf: account ring io_buffer_list memory
  io_uring: drain based on allocates reqs
  ...
2025-05-26 12:13:22 -07:00
Linus Torvalds
6f59de9bc0 Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezeing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Jack Morgenstein
92a251c3df RDMA/cma: Fix hang when cma_netevent_callback fails to queue_work
The cited commit fixed a crash when cma_netevent_callback was called for
a cma_id while work on that id from a previous call had not yet started.
The work item was re-initialized in the second call, which corrupted the
work item currently in the work queue.

However, it left a problem when queue_work fails (because the item is
still pending in the work queue from a previous call). In this case,
cma_id_put (which is called in the work handler) is therefore not
called. This results in a userspace process hang (zombie process).

Fix this by calling cma_id_put() if queue_work fails.

Fixes: 45f5dcdd04 ("RDMA/cma: Fix workqueue crash in cma_netevent_work_handler")
Link: https://patch.msgid.link/r/4f3640b501e48d0166f312a64fdadf72b059bd04.1747827103.git.leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@nvidia.com>
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-05-26 15:36:46 -03:00
Jason Gunthorpe
ef2233850e Merge tag 'v6.15' into rdma.git for-next
Following patches need the RDMA rc branch since we are past the RC cycle
now.

Merge conflicts resolved based on Linux-next:

- For RXE odp changes keep for-next version and fixup new places that
  need to call is_odp_mr()
  https://lore.kernel.org/r/20250422143019.500201bd@canb.auug.org.au
  https://lore.kernel.org/r/20250514122455.3593b083@canb.auug.org.au

- irdma is keeping the while/kfree bugfix from -rc and the pf/cdev_info
  change from for-next
  https://lore.kernel.org/r/20250513130630.280ee6c5@canb.auug.org.au

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-05-26 15:33:52 -03:00
Linus Torvalds
3e406741b1 Merge tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs selftests updates from Christian Brauner:
 "This contains various cleanups, fixes, and extensions for out
  filesystem selftests"

* tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/fs/mount-notify: add a test variant running inside userns
  selftests/filesystems: create setup_userns() helper
  selftests/filesystems: create get_unique_mnt_id() helper
  selftests/fs/mount-notify: build with tools include dir
  selftests/mount_settattr: remove duplicate syscall definitions
  selftests/pidfd: move syscall definitions into wrappers.h
  selftests/fs/statmount: build with tools include dir
  selftests/filesystems: move wrapper.h out of overlayfs subdir
  selftests/mount_settattr: ensure that ext4 filesystem can be created
  selftests/mount_settattr: add missing STATX_MNT_ID_UNIQUE define
  selftests/mount_settattr: don't define sys_open_tree() twice
2025-05-26 11:32:28 -07:00
Linus Torvalds
a2e43397e5 Merge tag 'vfs-6.16-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:

 - More fallout and preparatory work associated with the folio batch
   prototype posted a while back.

   Mainly this just cleans up some of the helpers and pushes some
   pos/len trimming further down in the write begin path.

 - Add missing flag descriptions to the iomap documentation

* tag 'vfs-6.16-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: rework iomap_write_begin() to return folio offset and length
  iomap: push non-large folio check into get folio path
  iomap: helper to trim pos/bytes to within folio
  iomap: drop pos param from __iomap_[get|put]_folio()
  iomap: drop unnecessary pos param from iomap_write_[begin|end]
  iomap: resample iter->pos after iomap_write_begin() calls
  iomap: trace: Add missing flags to [IOMAP_|IOMAP_F_]FLAGS_STRINGS
  Documentation: iomap: Add missing flags description
2025-05-26 11:28:42 -07:00
Rafael J. Wysocki
f34dc28343 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.16-rc1:

 - Refactor cpufreq_online(), add and use cpufreq policy locking guards,
   use __free() in policy reference counting, and clean up core cpufreq
   code on top of that (Rafael Wysocki).

 - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
   Kumar).

 - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
   Ugwekar).

 - Add offline, online and suspend callbacks to the amd-pstate driver,
   rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
   Ugwekar).

 - Add support for the "Requested CPU Min frequency" BIOS option to the
   amd-pstate driver (Dhananjay Ugwekar).

 - Reset amd-pstate driver mode after running selftests (Swapnil
   Sapkal).

 - Add helper for governor checks to the schedutil cpufreq governor and
   move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).

 - Populate the cpu_capacity sysfs entries from the intel_pstate driver
   after registering asym capacity support (Ricardo Neri).

 - Add support for enabling Energy-aware scheduling (EAS) to the
   intel_pstate driver when operating in the passive mode on a hybrid
   platform (Rafael Wysocki).

 - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
   Chancellor).

 - Drop redundant cpus_read_lock() from store_local_boost() in the
   cpufreq core (Seyediman Seyedarab).

 - Replace sscanf() with kstrtouint() in the cpufreq code and use a
   symbol instead of a raw number in it (Bowen Yu).

 - Add support for autonomous CPU performance state selection to the
   CPPC cpufreq driver (Lifeng Zheng).

* pm-cpufreq: (31 commits)
  cpufreq: CPPC: Add support for autonomous selection
  cpufreq: Update sscanf() to kstrtouint()
  cpufreq: Replace magic number
  cpufreq: drop redundant cpus_read_lock() from store_local_boost()
  cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
  cpufreq: intel_pstate: Document hybrid processor support
  cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
  cpufreq: intel_pstate: EAS support for hybrid platforms
  cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
  cpufreq: intel_pstate: Populate the cpu_capacity sysfs entries
  arch_topology: Relocate cpu_scale to topology.[h|c]
  cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq
  cpufreq/sched: schedutil: Add helper for governor checks
  amd-pstate-ut: Reset amd-pstate driver mode after running selftests
  cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option
  cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver
  cpufreq: Force sync policy boost with global boost on sysfs update
  cpufreq: Preserve policy's boost state after resume
  cpufreq: Introduce policy_set_boost()
  cpufreq: Don't unnecessarily call set_boost()
  ...
2025-05-26 20:19:40 +02:00
Linus Torvalds
c5bfc48d54 Merge tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
 "This adds support for sending coredumps over an AF_UNIX socket. It
  also makes (implicit) use of the new SO_PEERPIDFD ability to hand out
  pidfds for reaped peer tasks

  The new coredump socket will allow userspace to not have to rely on
  usermode helpers for processing coredumps and provides a saf way to
  handle them instead of relying on super privileged coredumping helpers

  This will also be significantly more lightweight since the kernel
  doens't have to do a fork()+exec() for each crashing process to spawn
  a usermodehelper. Instead the kernel just connects to the AF_UNIX
  socket and userspace can process it concurrently however it sees fit.
  Support for userspace is incoming starting with systemd-coredump

  There's more work coming in that direction next cycle. The rest below
  goes into some details and background

  Coredumping currently supports two modes:

   (1) Dumping directly into a file somewhere on the filesystem.

   (2) Dumping into a pipe connected to a usermode helper process
       spawned as a child of the system_unbound_wq or kthreadd

  For simplicity I'm mostly ignoring (1). There's probably still some
  users of (1) out there but processing coredumps in this way can be
  considered adventurous especially in the face of set*id binaries

  The most common option should be (2) by now. It works by allowing
  userspace to put a string into /proc/sys/kernel/core_pattern like:

          |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

  The "|" at the beginning indicates to the kernel that a pipe must be
  used. The path following the pipe indicator is a path to a binary that
  will be spawned as a usermode helper process. Any additional
  parameters pass information about the task that is generating the
  coredump to the binary that processes the coredump

  In the example the core_pattern shown causes the kernel to spawn
  systemd-coredump as a usermode helper. There's various conceptual
  consequences of this (non-exhaustive list):

   - systemd-coredump is spawned with file descriptor number 0 (stdin)
     connected to the read-end of the pipe. All other file descriptors
     are closed. That specifically includes 1 (stdout) and 2 (stderr).

     This has already caused bugs because userspace assumed that this
     cannot happen (Whether or not this is a sane assumption is
     irrelevant)

   - systemd-coredump will be spawned as a child of system_unbound_wq.
     So it is not a child of any userspace process and specifically not
     a child of PID 1. It cannot be waited upon and is in a weird hybrid
     upcall which are difficult for userspace to control correctly

   - systemd-coredump is spawned with full kernel privileges. This
     necessitates all kinds of weird privilege dropping excercises in
     userspace to make this safe

   - A new usermode helper has to be spawned for each crashing process

  This adds a new mode:

   (3) Dumping into an AF_UNIX socket

  Userspace can set /proc/sys/kernel/core_pattern to:

          @/path/to/coredump.socket

  The "@" at the beginning indicates to the kernel that an AF_UNIX
  coredump socket will be used to process coredumps

  The coredump socket must be located in the initial mount namespace.
  When a task coredumps it opens a client socket in the initial network
  namespace and connects to the coredump socket:

   - The coredump server uses SO_PEERPIDFD to get a stable handle on the
     connected crashing task. The retrieved pidfd will provide a stable
     reference even if the crashing task gets SIGKILLed while generating
     the coredump. That is a huge attack vector right now

   - By setting core_pipe_limit non-zero userspace can guarantee that
     the crashing task cannot be reaped behind it's back and thus
     process all necessary information in /proc/<pid>. The SO_PEERPIDFD
     can be used to detect whether /proc/<pid> still refers to the same
     process

     The core_pipe_limit isn't used to rate-limit connections to the
     socket. This can simply be done via AF_UNIX socket directly

   - The pidfd for the crashing task will contain information how the
     task coredumps. The PIDFD_GET_INFO ioctl gained a new flag
     PIDFD_INFO_COREDUMP which can be used to retreive the coredump
     information

     If the coredump gets a new coredump client connection the kernel
     guarantees that PIDFD_INFO_COREDUMP information is available.

     Currently the following information is provided in the new
     @coredump_mask extension to struct pidfd_info:

      * PIDFD_COREDUMPED is raised if the task did actually coredump

      * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping
        (e.g., undumpable)

      * PIDFD_COREDUMP_USER is raised if this is a regular coredump and
        doesn't need special care by the coredump server

      * PIDFD_COREDUMP_ROOT is raised if the generated coredump should
        be treated as sensitive and the coredump server should restrict
        access to the generated coredump to sufficiently privileged
        users"

* tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  mips, net: ensure that SOCK_COREDUMP is defined
  selftests/coredump: add tests for AF_UNIX coredumps
  selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
  coredump: validate socket name as it is written
  coredump: show supported coredump modes
  pidfs, coredump: add PIDFD_INFO_COREDUMP
  coredump: add coredump socket
  coredump: reflow dump helpers a little
  coredump: massage do_coredump()
  coredump: massage format_corename()
2025-05-26 11:17:01 -07:00
Rafael J. Wysocki
e481e10ab5 Merge branch 'pm-em'
Merge energy model management code updates for 6.16-rc1:

 - Fix potential division-by-zero error in em_compute_costs() (Yaxiong
   Tian).

 - Fix typos in energy model documentation and example driver code (Moon
   Hee Lee, Atul Kumar Pant).

 - Rearrange the energy model management code and add a new function for
   adjusting a CPU energy model after adjusting the capacity of the
   given CPU to it (Rafael Wysocki).

* pm-em:
  PM: EM: Introduce em_adjust_cpu_capacity()
  PM: EM: Move CPU capacity check to em_adjust_new_capacity()
  PM: EM: Documentation: Fix typos in example driver code
  PM: EM: Documentation: fix typo in energy-model.rst
  PM: EM: Fix potential division-by-zero error in em_compute_costs()
2025-05-26 20:14:58 +02:00
Rafael J. Wysocki
db0e4d5429 Merge branches 'acpi-resource', 'acpi-pm', 'acpi-platform-profile' and 'acpi-docs'
Merge an ACPI resources management update, ACPI power management
updates, an ACPI platform profile driver fix and an ACPI documentation
update related to device properties for 6.16-rc1:

 - Fix a typo for MECHREVO in irq1_edge_low_force_override[] (Mingcong
   Bai).

 - Add an LPS0 check() callback to the AMD pinctrl driver and fix up
   config symbol dependencies in it (Mario Limonciello, Rafael Wysocki).

 - Avoid initializing the ACPI platform profile driver on non-ACPI
   platforms (Alexandre Ghiti).

 - Document that references to ACPI data (non-device) nodes should use
   string-only references in hierarchical data node packages (Sakari
   Ailus).

* acpi-resource:
  ACPI: resource: fix a typo for MECHREVO in irq1_edge_low_force_override[]

* acpi-pm:
  pinctrl: amd: Fix hibernation support with CONFIG_SUSPEND unset
  pinctrl: amd: Fix use of undeclared identifier 'pinctrl_amd_s2idle_dev_ops'
  pinctrl: amd: Add an LPS0 check() callback
  ACPI: Add missing prototype for non CONFIG_SUSPEND/CONFIG_X86 case

* acpi-platform-profile:
  ACPI: platform_profile: Avoid initializing on non-ACPI platforms

* acpi-docs:
  Documentation: ACPI: Use all-string data node references
2025-05-26 19:42:29 +02:00
Rafael J. Wysocki
f5b4df96ee Merge branches 'acpi-pci', 'acpi-battery', 'acpi-ec' and 'acpi-apei'
Merge an ACPI PCI root driver update, ACPI battery driver updates, an
ACPI EC driver update and APEI updates for 6.16-rc1:

 - Turn the acpi_pci_root_remap_iospace() fwnode_handle parameter into a
   const pointer (Pei Xiao).

 - Round battery capacity percengate in the ACPI battery driver to the
   closest integer to avoid user confusion (shitao).

 - Make the ACPI battery driver report the current as a negative number
   to the power supply framework when the battery is discharging as
   documented (Peter Marheine).

 - Add TUXEDO InfinityBook Pro AMD Gen9 to the acpi_ec_no_wakeup[] list
   to prevent spurious wakeups from suspend-to-idle (Werner Sembach).

 - Convert the APEI EINJ driver to a faux device one (Sudeep Holla, Jon
   Hunter).

 - Remove redundant calls to einj_get_available_error_type() from the
   APEI EINJ driver (Zaid Alali).

* acpi-pci:
  ACPI: PCI: Constify fwnode_handle in acpi_pci_root_remap_iospace()

* acpi-battery:
  ACPI: battery: negate current when discharging
  ACPI: battery: Round capacity percengate to closest integer

* acpi-ec:
  ACPI: EC: Add device to acpi_ec_no_wakeup[] qurik list

* acpi-apei:
  ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type()
  ACPI: APEI: EINJ: Fix probe error message
  ACPI: APEI: EINJ: Transition to the faux device interface
2025-05-26 19:31:50 +02:00
Linus Torvalds
7d7a103d29 Merge tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
 "Features:

   - Allow handing out pidfds for reaped tasks for AF_UNIX SO_PEERPIDFD
     socket option

     SO_PEERPIDFD is a socket option that allows to retrieve a pidfd for
     the process that called connect() or listen(). This is heavily used
     to safely authenticate clients in userspace avoiding security bugs
     due to pid recycling races (dbus, polkit, systemd, etc.)

     SO_PEERPIDFD currently doesn't support handing out pidfds if the
     sk->sk_peer_pid thread-group leader has already been reaped. In
     this case it currently returns EINVAL. Userspace still wants to get
     a pidfd for a reaped process to have a stable handle it can pass
     on. This is especially useful now that it is possible to retrieve
     exit information through a pidfd via the PIDFD_GET_INFO ioctl()'s
     PIDFD_INFO_EXIT flag

     Another summary has been provided by David Rheinsberg:

      > A pidfd can outlive the task it refers to, and thus user-space
      > must already be prepared that the task underlying a pidfd is
      > gone at the time they get their hands on the pidfd. For
      > instance, resolving the pidfd to a PID via the fdinfo must be
      > prepared to read `-1`.
      >
      > Despite user-space knowing that a pidfd might be stale, several
      > kernel APIs currently add another layer that checks for this. In
      > particular, SO_PEERPIDFD returns `EINVAL` if the peer-task was
      > already reaped, but returns a stale pidfd if the task is reaped
      > immediately after the respective alive-check.
      >
      > This has the unfortunate effect that user-space now has two ways
      > to check for the exact same scenario: A syscall might return
      > EINVAL/ESRCH/... *or* the pidfd might be stale, even though
      > there is no particular reason to distinguish both cases. This
      > also propagates through user-space APIs, which pass on pidfds.
      > They must be prepared to pass on `-1` *or* the pidfd, because
      > there is no guaranteed way to get a stale pidfd from the kernel.
      >
      > Userspace must already deal with a pidfd referring to a reaped
      > task as the task may exit and get reaped at any time will there
      > are still many pidfds referring to it

     In order to allow handing out reaped pidfd SO_PEERPIDFD needs to
     ensure that PIDFD_INFO_EXIT information is available whenever a
     pidfd for a reaped task is created by PIDFD_INFO_EXIT. The uapi
     promises that reaped pidfds are only handed out if it is guaranteed
     that the caller sees the exit information:

     TEST_F(pidfd_info, success_reaped)
     {
             struct pidfd_info info = {
                     .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT,
             };

             /*
              * Process has already been reaped and PIDFD_INFO_EXIT been set.
              * Verify that we can retrieve the exit status of the process.
              */
             ASSERT_EQ(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0);
             ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
             ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
             ASSERT_TRUE(WIFEXITED(info.exit_code));
             ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
     }

     To hand out pidfds for reaped processes we thus allocate a pidfs
     entry for the relevant sk->sk_peer_pid at the time the
     sk->sk_peer_pid is stashed and drop it when the socket is
     destroyed. This guarantees that exit information will always be
     recorded for the sk->sk_peer_pid task and we can hand out pidfds
     for reaped processes

   - Hand a pidfd to the coredump usermode helper process

     Give userspace a way to instruct the kernel to install a pidfd for
     the crashing process into the process started as a usermode helper.
     There's still tricky race-windows that cannot be easily or
     sometimes not closed at all by userspace. There's various ways like
     looking at the start time of a process to make sure that the
     usermode helper process is started after the crashing process but
     it's all very very brittle and fraught with peril

     The crashed-but-not-reaped process can be killed by userspace
     before coredump processing programs like systemd-coredump have had
     time to manually open a PIDFD from the PID the kernel provides
     them, which means they can be tricked into reading from an
     arbitrary process, and they run with full privileges as they are
     usermode helper processes

     Even if that specific race-window wouldn't exist it's still the
     safest and cleanest way to let the kernel provide the pidfd
     directly instead of requiring userspace to do it manually. In
     parallel with this commit we already have systemd adding support
     for this in [1]

     When the usermode helper process is forked we install a pidfd file
     descriptor three into the usermode helper's file descriptor table
     so it's available to the exec'd program

     Since usermode helpers are either children of the system_unbound_wq
     workqueue or kthreadd we know that the file descriptor table is
     empty and can thus always use three as the file descriptor number

     Note, that we'll install a pidfd for the thread-group leader even
     if a subthread is calling do_coredump(). We know that task linkage
     hasn't been removed yet and even if this @current isn't the actual
     thread-group leader we know that the thread-group leader cannot be
     reaped until
     @current has exited

   - Allow telling when a task has not been found from finding the wrong
     task when creating a pidfd

     We currently report EINVAL whenever a struct pid has no tasked
     attached anymore thereby conflating two concepts:

      (1) The task has already been reaped

      (2) The caller requested a pidfd for a thread-group leader but the
          pid actually references a struct pid that isn't used as a
          thread-group leader

     This is causing issues for non-threaded workloads as in where they
     expect ESRCH to be reported, not EINVAL

     So allow userspace to reliably distinguish between (1) and (2)

   - Make it possible to detect when a pidfs entry would outlive the
     struct pid it pinned

   - Add a range of new selftests

  Cleanups:

   - Remove unneeded NULL check from pidfd_prepare() for passed struct
     pid

   - Avoid pointless reference count bump during release_task()

  Fixes:

   - Various fixes to the pidfd and coredump selftests

   - Fix error handling for replace_fd() when spawning coredump usermode
     helper"

* tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: detect refcount bugs
  coredump: hand a pidfd to the usermode coredump helper
  coredump: fix error handling for replace_fd()
  pidfs: move O_RDWR into pidfs_alloc_file()
  selftests: coredump: Raise timeout to 2 minutes
  selftests: coredump: Fix test failure for slow machines
  selftests: coredump: Properly initialize pointer
  net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid
  pidfs: get rid of __pidfd_prepare()
  net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid
  pidfs: register pid in pidfs
  net, pidfd: report EINVAL for ESRCH
  release_task: kill the no longer needed get/put_pid(thread_pid)
  pidfs: ensure consistent ENOENT/ESRCH reporting
  exit: move wake_up_all() pidfd waiters into __unhash_process()
  selftest/pidfd: add test for thread-group leader pidfd open for thread
  pidfd: improve uapi when task isn't found
  pidfd: remove unneeded NULL check from pidfd_prepare()
  selftests/pidfd: adapt to recent changes
2025-05-26 10:30:02 -07:00
Stefano Garzarella
45ca7e9f07 vsock/virtio: fix rx_bytes accounting for stream sockets
In `struct virtio_vsock_sock`, we maintain two counters:
- `rx_bytes`: used internally to track how many bytes have been read.
  This supports mechanisms like .stream_has_data() and sock_rcvlowat().
- `fwd_cnt`: used for the credit mechanism to inform available receive
  buffer space to the remote peer.

These counters are updated via virtio_transport_inc_rx_pkt() and
virtio_transport_dec_rx_pkt().

Since the beginning with commit 06a8fc7836 ("VSOCK: Introduce
virtio_vsock_common.ko"), we call virtio_transport_dec_rx_pkt() in
virtio_transport_stream_do_dequeue() only when we consume the entire
packet, so partial reads, do not update `rx_bytes` and `fwd_cnt`.

This is fine for `fwd_cnt`, because we still have space used for the
entire packet, and we don't want to update the credit for the other
peer until we free the space of the entire packet. However, this
causes `rx_bytes` to be stale on partial reads.

Previously, this didn’t cause issues because `rx_bytes` was used only by
.stream_has_data(), and any unread portion of a packet implied data was
still available. However, since commit 93b8088766
("virtio/vsock: fix logic which reduces credit update messages"), we now
rely on `rx_bytes` to determine if a credit update should be sent when
the data in the RX queue drops below SO_RCVLOWAT value.

This patch fixes the accounting by updating `rx_bytes` with the number
of bytes actually read, even on partial reads, while leaving `fwd_cnt`
untouched until the packet is fully consumed. Also introduce a new
`buf_used` counter to check that the remote peer is honoring the given
credit; this was previously done via `rx_bytes`.

Fixes: 93b8088766 ("virtio/vsock: fix logic which reduces credit update messages")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250521121705.196379-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 19:04:21 +02:00
Linus Torvalds
2ca3534623 Merge tag 'vfs-6.16-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
 "This contains minor mount updates for this cycle:

   - mnt->mnt_devname can never be NULL so simplify the code handling
     that case

   - Add a comment about concurrent changes during statmount() and
     listmount()

   - Update the STATMOUNT_SUPPORTED macro

   - Convert mount flags to an enum"

* tag 'vfs-6.16-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  statmount: update STATMOUNT_SUPPORTED macro
  fs: convert mount flags to enum
  ->mnt_devname is never NULL
  mount: add a comment about concurrent changes with statmount()/listmount()
2025-05-26 09:55:56 -07:00
Paolo Abeni
f5b60d6a57 Merge tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next,
specifically 26 patches: 5 patches adding/updating selftests,
4 fixes, 3 PREEMPT_RT fixes, and 14 patches to enhance nf_tables):

1) Improve selftest coverage for pipapo 4 bit group format, from
   Florian Westphal.

2) Fix incorrect dependencies when compiling a kernel without
   legacy ip{6}tables support, also from Florian.

3) Two patches to fix nft_fib vrf issues, including selftest updates
   to improve coverage, also from Florian Westphal.

4) Fix incorrect nesting in nft_tunnel's GENEVE support, from
   Fernando F. Mancera.

5) Three patches to fix PREEMPT_RT issues with nf_dup infrastructure
   and nft_inner to match in inner headers, from Sebastian Andrzej Siewior.

6) Integrate conntrack information into nft trace infrastructure,
   from Florian Westphal.

7) A series of 13 patches to allow to specify wildcard netdevice in
   netdev basechain and flowtables, eg.

   table netdev filter {
       chain ingress {
           type filter hook ingress devices = { eth0, eth1, vlan* } priority 0; policy accept;
       }
   }

   This also allows for runtime hook registration on NETDEV_{UN}REGISTER
   event, from Phil Sutter.

netfilter pull request 25-05-23

* tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: (26 commits)
  selftests: netfilter: Torture nftables netdev hooks
  netfilter: nf_tables: Add notifications for hook changes
  netfilter: nf_tables: Support wildcard netdev hook specs
  netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()
  netfilter: nf_tables: Handle NETDEV_CHANGENAME events
  netfilter: nf_tables: Wrap netdev notifiers
  netfilter: nf_tables: Respect NETDEV_REGISTER events
  netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events
  netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook
  netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()
  netfilter: nf_tables: Introduce nft_register_flowtable_ops()
  netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()
  netfilter: nf_tables: Introduce functions freeing nft_hook objects
  netfilter: nf_tables: add packets conntrack state to debug trace info
  netfilter: conntrack: make nf_conntrack_id callable without a module dependency
  netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit
  netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx
  netfilter: nf_dup{4, 6}: Move duplication check to task_struct
  netfilter: nft_tunnel: fix geneve_opt dump
  selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs
  ...
====================

Link: https://patch.msgid.link/20250523132712.458507-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:53:41 +02:00
Rafael J. Wysocki
0a17adc6be Merge branches 'acpi-processor' and 'acpi-cppc'
Merge ACPI processor driver updates and ACPI CPPC library updates for
6.16-rc1:

 - Clean up the initialization of CPU data structures in the ACPI
   processor driver (Zhang Rui).

 - Remove an obsolete comment regarding the C-states handling in the
   ACPI processor driver (Giovanni Gherdovich).

 - Simplify PCC shared memory region handling (Sudeep Holla).

 - Rework and extend functions for reading CPPC register values and for
   updating CPPC registers (Lifeng Zheng).

 - Add three functions related to autonomous CPU performance state
   selection to the CPPC library (Lifeng Zheng).

* acpi-processor:
  ACPI: processor: idle: Remove redundant pr->power.count assignment
  ACPI: processor: idle: Set pr->flags.power unconditionally
  ACPI: processor: idle: Remove obsolete comment

* acpi-cppc:
  ACPI: CPPC: Add three functions related to autonomous selection
  ACPI: CPPC: Modify cppc_get_auto_sel_caps() to cppc_get_auto_sel()
  ACPI: CPPC: Refactor register value get and set ABIs
  ACPI: CPPC: Add cppc_set_reg_val()
  ACPI: CPPC: Extract cppc_get_reg_val_in_pcc()
  ACPI: CPPC: Rename cppc_get_perf() to cppc_get_reg_val()
  ACPI: CPPC: Optimize cppc_get_perf()
  ACPI: CPPC: Add IS_OPTIONAL_CPC_REG macro to judge if a cpc_reg is optional
  ACPI: CPPC: Simplify PCC shared memory region handling
  ACPI: PCC: Simplify PCC shared memory region handling
2025-05-26 18:37:38 +02:00
Rafael J. Wysocki
5349b0051b Merge branch 'acpi-tables'
Merge updates related to the handling of static (data-only) ACPI tables
for 6.16-rc1:

 - Add __nonstring annotations for unterminated strings in the static
   ACPI tables parsing code (Kees Cook).

 - Add support for parsing the MRRM ACPI table and sysfs files to
   describe memory regions listed in it (Tony Luck, Anil Keshavamurthy).

 - Remove an (explicitly) unused header file include from the VIOT ACPI
   table parser file (Andy Shevchenko).

 - Improve logging around acpi_initialize_tables() (Bartosz Szczepanek).

* acpi-tables:
  ACPI: MRRM: Fix default max memory region
  ACPI: tables: Improve logging around acpi_initialize_tables()
  ACPI: VIOT: Remove (explicitly) unused header
  ACPI: Add documentation for exposing MRRM data
  ACPI: MRRM: Add /sys files to describe memory ranges
  ACPI: MRRM: Minimal parse of ACPI MRRM table
  ACPI: tables: Add __nonstring annotations for unterminated strings
2025-05-26 18:36:39 +02:00
Rafael J. Wysocki
57356d98c0 Merge branch 'acpica'
Merge ACPICA updates, including two upstream releases 20241212 and
20250404, for 6.16-rc1:

 - Fix two ACPICA SLAB cache leaks (Seunghun Han).

 - Add EINJv2 get error type action and define Error Injection Actions
   in hex values to avoid inconsistencies between the specification and
   the code (Zaid Alali).

 - Fix typo in comments for SRAT structures (Adam Lackorzynski).

 - Prevent possible loss of data in ACPICA because of u32 to u8
   conversions (Saket Dumbre).

 - Fix reading FFixedHW operation regions in ACPICA (Daniil Tatianin).

 - Add support for printing AML arguments when the ACPICA debug level is
   ACPI_LV_TRACE_POINT (Mario Limonciello).

 - Drop a stale comment about the file content from actbl2.h (Sudeep
   Holla).

 - Apply pack(1) to union aml_resource (Tamir Duberstein).

 - Fix overflow check in the ACPICA version of vsnprintf() (gldrk).

 - Interpret SIDP structures in DMAR added revision 3.4 of the VT-d
   specification (Alexey Neyman).

 - Add typedef and other definitions related to MRRM to ACPICA (Tony
   Luck).

 - Add definitions for RIMT to ACPICA (Sunil V L).

 - Fix spelling mistake "Incremement" -> "Increment" in the ACPICA
   utilities code (Colin Ian King).

 - Add typedef and other definitions for ERDT to ACPICA (Tony Luck).

 - Introduce ACPI_NONSTRING and use it (Kees Cook, Ahmed Salem).

 - Rename structure and field names of the RAS2 table in actbl2.h (Shiju
   Jose).

 - Fix up whitespace in acpica/utcache.c (Zhe Qiao).

 - Avoid sequence overread in a call to strncmp() in ap_get_table_length()
   and replace strncpy() with memcpy() in ACPICA in some places (Ahmed
   Salem).

 - Update copyright year in all ACPICA files (Saket Dumbre).

* acpica: (30 commits)
  ACPICA: Update copyright year
  ACPICA: Logfile: Changes for version 20250404
  ACPICA: Replace strncpy() with memcpy()
  ACPICA: Apply ACPI_NONSTRING in more places
  ACPICA: Avoid sequence overread in call to strncmp()
  ACPICA: Adjust the position of code lines
  ACPICA: actbl2.h: ACPI 6.5: RAS2: Rename structure and field names of the RAS2 table
  ACPICA: Apply ACPI_NONSTRING
  ACPICA: Introduce ACPI_NONSTRING
  ACPICA: actbl2.h: ERDT: Add typedef and other definitions
  ACPICA: infrastructure: Add new DMT_BUF types and shorten a long name
  ACPICA: Utilities: Fix spelling mistake "Incremement" -> "Increment"
  ACPICA: MRRM: Some cleanups
  ACPICA: actbl2: Add definitions for RIMT
  ACPICA: actbl2.h: MRRM: Add typedef and other definitions
  ACPICA: infrastructure: Add new header and ACPI_DMT_BUF26 types
  ACPICA: Interpret SIDP structures in DMAR
  ACPICA: utilities: Fix overflow check in vsnprintf()
  ACPICA: Apply pack(1) to union aml_resource
  ACPICA: Drop stale comment about the header file content
  ...
2025-05-26 18:35:01 +02:00
Linus Torvalds
8dd53535f1 Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs freezing updates from Christian Brauner:
 "This contains various filesystem freezing related work for this cycle:

   - Allow the power subsystem to support filesystem freeze for suspend
     and hibernate.

     Now all the pieces are in place to actually allow the power
     subsystem to freeze/thaw filesystems during suspend/resume.
     Filesystems are only frozen and thawed if the power subsystem does
     actually own the freeze.

     If the filesystem is already frozen by the time we've frozen all
     userspace processes we don't care to freeze it again. That's
     userspace's job once the process resumes. We only actually freeze
     filesystems if we absolutely have to and we ignore other failures
     to freeze.

     We could bubble up errors and fail suspend/resume if the error
     isn't EBUSY (aka it's already frozen) but I don't think that this
     is worth it. Filesystem freezing during suspend/resume is
     best-effort. If the user has 500 ext4 filesystems mounted and 4
     fail to freeze for whatever reason then we simply skip them.

     What we have now is already a big improvement and let's see how we
     fare with it before making our lives even harder (and uglier) than
     we have to.

   - Allow efivars to support freeze and thaw

     Allow efivarfs to partake to resync variable state during system
     hibernation and suspend. Add freeze/thaw support.

     This is a pretty straightforward implementation. We simply add
     regular freeze/thaw support for both userspace and the kernel.
     efivars is the first pseudofilesystem that adds support for
     filesystem freezing and thawing.

     The simplicity comes from the fact that we simply always resync
     variable state after efivarfs has been frozen. It doesn't matter
     whether that's because of suspend, userspace initiated freeze or
     hibernation. Efivars is simple enough that it doesn't matter that
     we walk all dentries. There are no directories and there aren't
     insane amounts of entries and both freeze/thaw are already
     heavy-handed operations. If userspace initiated a freeze/thaw cycle
     they would need CAP_SYS_ADMIN in the initial user namespace (as
     that's where efivarfs is mounted) so it can't be triggered by
     random userspace. IOW, we really really don't care"

* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  f2fs: fix freezing filesystem during resize
  kernfs: add warning about implementing freeze/thaw
  efivarfs: support freeze/thaw
  power: freeze filesystems during suspend/resume
  libfs: export find_next_child()
  super: add filesystem freezing helpers for suspend and hibernate
  gfs2: pass through holder from the VFS for freeze/thaw
  super: use common iterator (Part 2)
  super: use a common iterator (Part 1)
  super: skip dying superblocks early
  super: simplify user_get_super()
  super: remove pointless s_root checks
  fs: allow all writers to be frozen
  locking/percpu-rwsem: add freezable alternative to down_read
2025-05-26 09:33:44 -07:00
Paolo Abeni
fdb061195f Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Remove some unnecessary strscpy_pad() size arguments.
   From Thorsten Blum.

2) Correct use of xso.real_dev on bonding offloads.
   Patchset from Cosmin Ratiu.

3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
   From Chiachang Wang.

4) Refactor migration setup during cloning. This was
   done after the clone was created. Now it is done
   in the cloning function itself.
   From Chiachang Wang.

5) Validate assignment of maximal possible SEQ number.
   Prevent from setting to the maximum sequrnce number
   as this would cause for traffic drop.
   From Leon Romanovsky.

6) Prevent configuration of interface index when offload
   is used. Hardware can't handle this case.i
   From Leon Romanovsky.

7) Always use kfree_sensitive() for SA secret zeroization.
   From Zilin Guan.

ipsec-next-2025-05-23

* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: use kfree_sensitive() for SA secret zeroization
  xfrm: prevent configuration of interface index when offload is used
  xfrm: validate assignment of maximal possible SEQ number
  xfrm: Refactor migration setup during the cloning process
  xfrm: Migrate offload configuration
  bonding: Fix multiple long standing offload races
  bonding: Mark active offloaded xfrm_states
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  xfrm: Remove unneeded device check from validate_xmit_xfrm
  xfrm: Use xdo.dev instead of xdo.real_dev
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Remove unnecessary strscpy_pad() size arguments
====================

Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:32:48 +02:00
Paolo Abeni
34d26315db Merge tag 'linux-can-next-for-6.16-20250522' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2025-05-22

this is a pull request of 22 patches for net-next/main.

The series by Biju Das contains 19 patches and adds RZ/G3E CANFD
support to the rcar_canfd driver.

The patch by Vincent Mailhol adds a struct data_bittiming_params to
group FD parameters as a preparation patch for CAN-XL support.

Felix Maurer's patch imports tst-filter from can-tests into the kernel
self tests and Vincent Mailhol adds support for physical CAN
interfaces.

linux-can-next-for-6.16-20250522

* tag 'linux-can-next-for-6.16-20250522' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
  selftests: can: test_raw_filter.sh: add support of physical interfaces
  selftests: can: Import tst-filter from can-tests
  can: dev: add struct data_bittiming_params to group FD parameters
  can: rcar_canfd: Add RZ/G3E support
  can: rcar_canfd: Enhance multi_channel_irqs handling
  can: rcar_canfd: Add external_clk variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add sh variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add struct rcanfd_regs variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add shared_can_regs variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add ch_interface_mode variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add {nom,data}_bittiming variables to struct rcar_canfd_hw_info
  can: rcar_canfd: Add max_cftml variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add max_aflpn variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add rnc_field_width variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Update RCANFD_GAFLCFG macro
  can: rcar_canfd: Add rcar_canfd_setrnc()
  can: rcar_canfd: Drop the mask operation in RCANFD_GAFLCFG_SETRNC macro
  can: rcar_canfd: Update RCANFD_GERFL_ERR macro
  can: rcar_canfd: Drop RCANFD_GAFLCFG_GETRNC macro
  can: rcar_canfd: Use of_get_available_child_by_name()
  ...
====================

Link: https://patch.msgid.link/20250522084128.501049-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:11:24 +02:00
Subbaraya Sundeep
ba5cb47b56 octeontx2-af: Send Link events one by one
Send link events one after another otherwise new message
is overwriting the message which is being processed by PF.

Fixes: a88e0f936b ("octeontx2: Detect the mbox up or down message via register")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747823443-404-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:07:40 +02:00
Patrick Miller
4bf7b97eb3 rust: make section names plural
Clean Rust documentation section headers to use plural names.

Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://github.com/Rust-for-Linux/linux/issues/1110
Signed-off-by: Patrick Miller <paddymills@proton.me>
Link: https://lore.kernel.org/r/20241002022749.390836-1-paddymills@proton.me
[ Removed the `init` one that doesn't apply anymore and
  reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-26 18:03:09 +02:00
Linus Torvalds
181d8e399f Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Use folios for symlinks in the page cache

     FUSE already uses folios for its symlinks. Mirror that conversion
     in the generic code and the NFS code. That lets us get rid of a few
     folio->page->folio conversions in this path, and some of the few
     remaining users of read_cache_page() / read_mapping_page()

   - Try and make a few filesystem operations killable on the VFS
     inode->i_mutex level

   - Add sysctl vfs_cache_pressure_denom for bulk file operations

     Some workloads need to preserve more dentries than we currently
     allow through out sysctl interface

     A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
     involves scanning all files and caching their metadata (including
     dentries and inodes) in memory. Each HDD contains approximately 2
     million files, resulting in a total of ~20 million cached dentries
     after initialization

     To minimize dentry reclamation, they set vfs_cache_pressure to 1.
     Despite this configuration, memory pressure conditions can still
     trigger reclamation of up to 50% of cached dentries, reducing the
     cache from 20 million to approximately 10 million entries. During
     the subsequent cache rebuild period, any HDFS datanode restart
     operation incurs substantial latency penalties until full cache
     recovery completes

     To maintain service stability, more dentries need to be preserved
     during memory reclamation. The current minimum reclaim ratio (1/100
     of total dentries) remains too aggressive for such workload. This
     patch introduces vfs_cache_pressure_denom for more granular cache
     pressure control

     The configuration [vfs_cache_pressure=1,
     vfs_cache_pressure_denom=10000] effectively maintains the full 20
     million dentry cache under memory pressure, preventing datanode
     restart performance degradation

   - Avoid some jumps in inode_permission() using likely()/unlikely()

   - Avid a memory access which is most likely a cache miss when
     descending into devcgroup_inode_permission()

   - Add fastpath predicts for stat() and fdput()

   - Anonymous inodes currently don't come with a proper mode causing
     issues in the kernel when we want to add useful VFS debug assert.
     Fix that by giving them a proper mode and masking it off when we
     report it to userspace which relies on them not having any mode

   - Anonymous inodes currently allow to change inode attributes because
     the VFS falls back to simple_setattr() if i_op->setattr isn't
     implemented. This means the ownership and mode for every single
     user of anon_inode_inode can be changed. Block that as it's either
     useless or actively harmful. If specific ownership is needed the
     respective subsystem should allocate anonymous inodes from their
     own private superblock

   - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock

   - Add proper tests for anonymous inode behavior

   - Make it easy to detect proper anonymous inodes and to ensure that
     we can detect them in codepaths such as readahead()

  Cleanups:

   - Port pidfs to the new anon_inode_{g,s}etattr() helpers

   - Try to remove the uselib() system call

   - Add unlikely branch hint return path for poll

   - Add unlikely branch hint on return path for core_sys_select

   - Don't allow signals to interrupt getdents copying for fuse

   - Provide a size hint to dir_context for during readdir()

   - Use writeback_iter directly in mpage_writepages

   - Update compression and mtime descriptions in initramfs
     documentation

   - Update main netfs API document

   - Remove useless plus one in super_cache_scan()

   - Remove unnecessary NULL-check guards during setns()

   - Add separate separate {get,put}_cgroup_ns no-op cases

  Fixes:

   - Fix typo in root= kernel parameter description

   - Use KERN_INFO for infof()|info_plog()|infofc()

   - Correct comments of fs_validate_description()

   - Mark an unlikely if condition with unlikely() in
     vfs_parse_monolithic_sep()

   - Delete macro fsparam_u32hex()

   - Remove unused and problematic validate_constant_table()

   - Fix potential unsigned integer underflow in fs_name()

   - Make file-nr output the total allocated file handles"

* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
  fs: Pass a folio to page_put_link()
  nfs: Use a folio in nfs_get_link()
  fs: Convert __page_get_link() to use a folio
  fs/read_write: make default_llseek() killable
  fs/open: make do_truncate() killable
  fs/open: make chmod_common() and chown_common() killable
  include/linux/fs.h: add inode_lock_killable()
  readdir: supply dir_context.count as readdir buffer size hint
  vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
  fuse: don't allow signals to interrupt getdents copying
  Documentation: fix typo in root= kernel parameter description
  include/cgroup: separate {get,put}_cgroup_ns no-op case
  kernel/nsproxy: remove unnecessary guards
  fs: use writeback_iter directly in mpage_writepages
  fs: remove useless plus one in super_cache_scan()
  fs: add S_ANON_INODE
  fs: remove uselib() system call
  device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
  fs/fs_parse: Remove unused and problematic validate_constant_table()
  fs: touch up predicts in inode_permission()
  ...
2025-05-26 09:02:39 -07:00
Linus Torvalds
a1ae8ce78b Merge tag 'vfs-6.16-rc1.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount api conversions from Christian Brauner:
 "This converts the bfs and omfs filesystems to the new mount api"

* tag 'vfs-6.16-rc1.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  omfs: convert to new mount API
  bfs: convert bfs to use the new mount api
2025-05-26 08:46:59 -07:00
Benno Lossin
eb71feaaca rust: list: fix path of assert_pinned!
Commit dbd5058ba6 ("rust: make pin-init its own crate") moved all
items from pin-init into the pin-init crate, including the
`assert_pinned!` macro.

Thus fix the path of the sole user of the `assert_pinned!` macro.

This occurrence was missed in the commit above, since it is in a macro
rule that has no current users (although binder is a future user).

Cc: stable@kernel.org
Fixes: dbd5058ba6 ("rust: make pin-init its own crate")
Signed-off-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250525173450.853413-1-lossin@kernel.org
[ Reworded slightly as discussed in the list. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-26 17:43:53 +02:00
Jeremy Kerr
0a9b2c9fd1 net: mctp: use nlmsg_payload() for netlink message data extraction
Jakub suggests:

> I have a different request :) Matt, once this ends up in net-next
> (end of this week) could you refactor it to use nlmsg_payload() ?
> It doesn't exist in net but this is exactly why it was added.

This refactors the additions to both mctp_dump_addrinfo(), and
mctp_rtm_getneigh() - two cases where we're calling nlh_data() on an
an incoming netlink message, without a prior nlmsg_parse().

For the neigh.c case, we cannot hit the failure where the nlh does not
contain a full ndmsg at present, as the core handler
(net/core/neighbour.c, neigh_get()) has already validated the size
through neigh_valid_req_get(), and would have failed the get operation
before the MCTP hander is called.

However, relying on that is a bit fragile, so apply the nlmsg_payload
refector here too.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20250521-mctp-nlmsg-payload-v2-1-e85df160c405@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:38:27 +02:00
Paolo Abeni
31eaaa5cb5 Merge branch 'add-the-capability-to-consume-sram-for-hwfd-descriptor-queue-in-airoha_eth-driver'
Lorenzo Bianconi says:

====================
Add the capability to consume SRAM for hwfd descriptor queue in airoha_eth driver

In order to improve packet processing and packet forwarding
performances, EN7581 SoC supports consuming SRAM instead of DRAM for hw
forwarding descriptors queue. For downlink hw accelerated traffic
request to consume SRAM memory for hw forwarding descriptors queue.
Moreover, in some configurations QDMA blocks require a contiguous block
of system memory for hwfd buffers queue. Introduce the capability to
allocate hw buffers forwarding queue via the reserved-memory DTS
property instead of running dmam_alloc_coherent().

v2: https://lore.kernel.org/r/20250509-airopha-desc-sram-v2-0-9dc3d8076dfb@kernel.org
v1: https://lore.kernel.org/r/20250507-airopha-desc-sram-v1-0-d42037431bfa@kernel.org
====================

Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-0-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:32:59 +02:00
Lorenzo Bianconi
c683e378c0 net: airoha: Add the capability to allocate hfwd descriptors in SRAM
In order to improve packet processing and packet forwarding
performances, EN7581 SoC supports consuming SRAM instead of DRAM for
hw forwarding descriptors queue.
For downlink hw accelerated traffic request to consume SRAM memory
for hw forwarding descriptors queue.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-4-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:32:57 +02:00
Lorenzo Bianconi
3a1ce9e3d0 net: airoha: Add the capability to allocate hwfd buffers via reserved-memory
In some configurations QDMA blocks require a contiguous block of
system memory for hwfd buffers queue. Introduce the capability to allocate
hw buffers forwarding queue via the reserved-memory DTS property instead of
running dmam_alloc_coherent().

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-3-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:32:57 +02:00
Lorenzo Bianconi
09aa788f98 net: airoha: Do not store hfwd references in airoha_qdma struct
Since hfwd descriptor and buffer queues are allocated via
dmam_alloc_coherent() we do not need to store their references
in airoha_qdma struct. This patch does not introduce any logical changes,
just code clean-up.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-2-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:32:56 +02:00
Lorenzo Bianconi
2d13b45f80 dt-bindings: net: airoha: Add EN7581 memory-region property
Introduce memory-region and memory-region-names properties for the
ethernet node available on EN7581 SoC in order to reserve system memory
for hw forwarding buffers queue used by the QDMA modules.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-1-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 17:32:56 +02:00