Commit Graph

15120 Commits

Author SHA1 Message Date
Linus Torvalds
78bb43e51b Merge tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull generic entry code updates from Thomas Gleixner:

 - Split the code into syscall and exception/interrupt parts to ease the
   conversion of ARM[64] to the generic entry infrastructure

 - Extend syscall user dispatching to support a single intercepted range
   instead of the default single non-intercepted range. That allows
   monitoring/analysis of a specific executable range, e.g. a library,
   and also provides flexibility for sandboxing scenarios

 - Cleanup and extend the user dispatch selftest

* tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Split generic entry into generic exception and syscall entry
  selftests: Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON
  syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON
  selftests: Fix errno checking in syscall_user_dispatch test
2025-07-29 15:14:29 -07:00
Linus Torvalds
f38b1f243e Merge tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:

 - Switch the reference counting to a RCU based per-CPU reference to
   address a performance bottleneck vs the single instance rcuref
   variant

 - Make the futex selftest build on 32-bit architectures which only
   support 64-bit time_t, e.g. RISCV-32

 - Cleanups and improvements in selftests and futex bench

* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
  selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
  perf bench futex: Remove support for IMMUTABLE
  selftests/futex: Remove support for IMMUTABLE
  futex: Remove support for IMMUTABLE
  futex: Make futex_private_hash_get() static
  futex: Use RCU-based per-CPU reference counting instead of rcuref_t
  selftests/futex: Adapt the private hash test to RCU related changes
2025-07-29 14:39:42 -07:00
Linus Torvalds
02dc9d15d7 Merge tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and VDSO updates from Thomas Gleixner:

 - Introduce support for auxiliary timekeepers

   PTP clocks can be disconnected from the universal CLOCK_TAI reality
   for various reasons including regularatory requirements for
   functional safety redundancy.

   The kernel so far only supports a single notion of time, which means
   that all clocks are correlated in frequency and only differ by offset
   to each other.

   Access to non-correlated PTP clocks has been available so far only
   through the file descriptor based "POSIX clock IDs", which are
   subject to locking and have to go all the way out to the hardware.

   The access is not only horribly slow, as it has to go all the way out
   to the NIC/PTP hardware, but that also prevents the kernel to read
   the time of such clocks e.g. from the network stack, where it is
   required for TSN networking both on the transmit and receive side
   unless the hardware provides offloading.

   The auxiliary clocks provide a mechanism to support arbitrary clocks
   which are not correlated to the system clock. This is not restricted
   to the PTP use case on purpose as there is no kernel side association
   of these clocks to a particular PTP device because that's a pure user
   space configuration decision. Having them independent allows to
   utilize them for other purposes and also enables them to be tested
   without hardware dependencies.

   To avoid pointless overhead these clocks have to be enabled
   individualy via a new sysfs interface to reduce the overhead to a
   single compare in the hotpath if they are enabled at the Kconfig
   level at all.

   These clocks utilize the existing timekeeping/NTP infrastructures,
   which has been made possible over the recent releases by incrementaly
   converting these infrastructures over from a single static instance
   to a multi-instance pointer based implementation without any
   performance regression reported.

   The auxiliary clocks provide the same "emulation" of a "correct"
   clock as the existing CLOCK_* variants do with an independent
   instance of data and provide the same steering mechanism through the
   existing sys_clock_adjtime() interface, which has been confirmed to
   work by the chronyd(8) maintainer.

   That allows to provide lockless kernel internal and VDSO support so
   that applications and kernel internal functionalities can access
   these clocks without restrictions and at the same performance as the
   existing system clocks.

 - Avoid double notifications in the adjtimex() syscall. Not a big
   issue, but a trivial to avoid latency source.

* tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  vdso/gettimeofday: Add support for auxiliary clocks
  vdso/vsyscall: Update auxiliary clock data in the datapage
  vdso: Introduce aux_clock_resolution_ns()
  vdso/gettimeofday: Introduce vdso_get_timestamp()
  vdso/gettimeofday: Introduce vdso_set_timespec()
  vdso/gettimeofday: Introduce vdso_clockid_valid()
  vdso/gettimeofday: Return bool from clock_gettime() helpers
  vdso/gettimeofday: Return bool from clock_getres() helpers
  vdso/helpers: Add helpers for seqlocks of single vdso_clock
  vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock()
  vdso/vsyscall: Introduce a helper to fill clock configurations
  timekeeping: Remove the temporary CLOCK_AUX workaround
  timekeeping: Provide ktime_get_clock_ts64()
  timekeeping: Provide interface to control auxiliary clocks
  timekeeping: Provide update for auxiliary timekeepers
  timekeeping: Provide adjtimex() for auxiliary clocks
  timekeeping: Prepare do_adtimex() for auxiliary clocks
  timekeeping: Make do_adjtimex() reusable
  timekeeping: Add auxiliary clock support to __timekeeping_inject_offset()
  timekeeping: Make timekeeping_inject_offset() reusable
  ...
2025-07-29 14:12:52 -07:00
Linus Torvalds
0ae982df67 Merge tag 'i2c-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "I2C Core:
   - prevent double-free of an fwnode if it is a software node
   - use recent helpers instead of custom ACPI or outdated OF ones
   - add a more elaborate description of a message flag

  Cleanups and refactorings:
   - lpi2c, riic, st, stm32f7: general improvements
   - riic: support more flexible IRQ configurations
   - tegra: fix documentation

  Improvements:
   - lpi2c: improve register polling and add atomic transfer
   - imx: use guarded spinlocks

  New hardware support:
   - Samsung Exynos 2200
   - Renesas RZ/T2H (R9A09G077), RZ/N2H (R9A09G087)

  DT binding:
   - rk3x: enable power domains
   - nxp: support clock property"

* tag 'i2c-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Fix double-free of fwnode in i2c_unregister_device()
  i2c: lpi2c: implement xfer_atomic callback
  i2c: lpi2c: use readl_poll_timeout() for register polling
  dt-bindings: i2c: i2c-rk3x: Allow use of a power-domain
  dt-bindings: i2c: exynos5: add samsung,exynos2200-hsi2c compatible
  i2c: lpi2c: convert to use secs_to_jiffies()
  i2c: st: Use min() to improve code
  i2c: imx: use guard to take spinlock
  i2c: stm32f7: Use str_on_off() helper
  dt-bindings: i2c: nxp,pnx-i2c: allow clocks property
  i2c: riic: Add support for RZ/T2H SoC
  i2c: riic: Move generic compatible string to end of array
  i2c: riic: Pass IRQ desc array as part of OF data
  dt-bindings: i2c: renesas,riic: Document RZ/T2H and RZ/N2H support
  dt-bindings: i2c: renesas,riic: Move ref for i2c-controller.yaml to the end
  i2c: tegra: Add missing kernel-doc for dma_dev member
  i2c: Clarify behavior of I2C_M_RD flag
  i2c: mux: pca954x: Use dev_fwnode()
  i2c: acpi: Replace custom code with device_match_acpi_handle()
2025-07-29 11:35:24 -07:00
Linus Torvalds
91e60731dd Merge tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
 "Here is the big set of TTY and Serial driver updates for 6.17-rc1.
  Included in here is the following types of changes:

   - another cleanup round from Jiri for the 8250 serial driver and some
     other tty drivers, things are slowly getting better with our apis
     thanks to this work. This touched many tty drivers all over the
     tree.

   - qcom_geni_serial driver update for new platforms and devices

   - 8250 quirk handling fixups

   - dt serial binding updates for different boards/platforms

   - other minor cleanups and fixes

  All of these have been in linux-next with no reported issues"

* tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  dt-bindings: serial: snps-dw-apb-uart: Allow use of a power-domain
  serial: 8250: fix panic due to PSLVERR
  dt-bindings: serial: samsung: add samsung,exynos2200-uart compatible
  vt: defkeymap: Map keycodes above 127 to K_HOLE
  vt: keyboard: Don't process Unicode characters in K_OFF mode
  serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms
  serial: qcom-geni: Enable PM runtime for serial driver
  serial: qcom-geni: move clock-rate logic to separate function
  serial: qcom-geni: move resource control logic to separate functions
  serial: qcom-geni: move resource initialization to separate function
  soc: qcom: geni-se: Enable QUPs on SA8255p Qualcomm platforms
  dt-bindings: qcom: geni-se: describe SA8255p
  dt-bindings: serial: describe SA8255p
  serial: 8250_dw: Fix typo "notifer"
  dt-bindings: serial: 8250: spacemit: set clocks property as required
  dt-bindings: serial: renesas: Document RZ/V2N SCIF
  serial: 8250_ce4100: Fix CONFIG_SERIAL_8250=n build
  tty: omit need_resched() before cond_resched()
  serial: 8250_ni: Reorder local variables
  serial: 8250_ni: Fix build warning
  ...
2025-07-29 10:11:01 -07:00
Linus Torvalds
f38b751290 Merge tag 'pwm/for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm updates from Uwe Kleine-König:
 "Apart from the usual mix of new drivers (pwm-argon-fan-hat), adding
  support for variants to existing drivers, minor improvements to both
  drivers and docs, device tree documenation updates, the noteworthy
  changes are:

   - A hwmon companion driver to pwm-mc33xs2410 living in drivers/hwmon
     and acked by Guenter Roeck

   - chardev support for PWM devices. This leverages atomic PWM updates
     to userspace and at the same time simplifies and accelerates PWM
     configuration changes"

* tag 'pwm/for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: (35 commits)
  pwm: raspberrypi-poe: Fix spelling mistake "Firwmware" -> "Firmware"
  hwmon: add support for MC33XS2410 hardware monitoring
  pwm: mc33xs2410: add hwmon support
  pwm: img: Remove redundant pm_runtime_mark_last_busy() calls
  pwm: Expose PWM_WFHWSIZE in public header
  dt-bindings: pwm: Convert lpc32xx-pwm.txt to yaml format
  docs: pwm: Adapt Locking paragraph to reality
  pwm: twl-led: Drop driver local locking
  pwm: sun4i: Drop driver local locking
  pwm: sti: Drop driver local locking
  pwm: microchip-core: Drop driver local locking
  pwm: lpc18xx-sct: Drop driver local locking
  pwm: fsl-ftm: Drop driver local locking
  pwm: clps711x: Drop driver local locking
  pwm: atmel: Drop driver local locking
  pwm: argon-fan-hat: Add Argon40 Fan HAT support
  dt-bindings: pwm: argon40,fan-hat: Document Argon40 Fan HAT
  dt-bindings: vendor-prefixes: Document Argon40
  pwm: pwm-mediatek: Add support for PWM IP V3.0.2 in MT6991/MT8196
  pwm: pwm-mediatek: Pass PWM_CK_26M_SEL from platform data
  ...
2025-07-28 23:17:46 -07:00
Linus Torvalds
177bf8620c Merge tag 'sound-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "This includes lots of file shuffling due to HD-audio code
  reorganization and many trivial changes, but otherwise there shouldn't
  be much surprise from the functionality POV. The PR includes the PM
  changes as prerequisite, too. Some highlights below:

  Core:
   - Performance optimizations in PCM core code
   - Refactoring of ASoC Kconfig menus to be hopefully more consistant
     and easier to navigate.
   - Refactoring of ASoC DAPM code, mainly hiding functionality that
     doesn't need to be exposed to drivers

  HD-audio reorganization:
   - All code are moved under sound/hda with a bit more understandable
     tree structure, as well as file renames
   - The huge Realtek driver code is split to several parts, a common
     helper module with driver modules per probe entry
   - HDMI and Cirrus codec drivers also split

  ASoC:
   - Further work on the generic handling for SoundWire SDCA devices
   - Support for AMD ACP7.2 and SoundWire on ACP 7.1, Fairphone 4 & 5,
     various Intel systems, Qualcomm QCS8275, Richtek RTQ9124 and TI
     TAS5753

  HD-audio and USB-audio:
   - TAS2781 driver cleanup and TAS2770 support
   - EQ enablement in CA0132 driver
   - USB audio quirk code cleanups

  Others:
   - Cleanups of PM autosuspend call patterns with the update from the
     PM tree
   - Lots of strcpy() -> strscpy() conversions for fixed size arrays"

* tag 'sound-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (385 commits)
  ALSA: hda: Add TAS2770 support
  ASoC: qcom: sm8250: Add Fairphone 4 soundcard compatible
  ASoC: dt-bindings: qcom,sm8250: Add Fairphone 4 sound card
  ASoC: dt-bindings: qcom,q6afe: Document q6usb subnode
  ASoC: SDCA: Fix implicit cast from le16
  ASoC: SDCA: Shrink detected_mode_handler() stack frame
  ASoC: SDCA: Check devm_mutex_init() return value
  ASoC: SDCA: add route by the number of input pins in MU entity
  ALSA: hda/realtek: Add support for ASUS Commercial laptops using CS35L41 HDA
  ASoC: Intel: sof_rt5682: Add HDMI-In capture with rt5682 support for PTL.
  ASoC: codec: tlv320aic32x4: Fix reset GPIO check
  ASoC: dt-bindings: qcom,lpass-va-macro: Define clock-names in top-level
  ASoC: SDCA: Add hw_params() helper function
  ASoC: SDCA: Add a helper to get the SoundWire port number
  ASoC: SDCA: Add helper to add DAI constraints
  ASoC: soc-dai: Add private data to snd_soc_dai
  ASoC: SDCA: Move SDCA search functions and export
  ASoC: SDCA: Remove overly chatty input pin list warning
  ASoC: SDCA: Allow read-only controls to be deferrable
  ASoC: SDCA: Update memory allocations to zero initialise
  ...
2025-07-28 21:49:49 -07:00
Linus Torvalds
6e11664f14 Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - MD pull request via Yu:
      - call del_gendisk synchronously (Xiao)
      - cleanup unused variable (John)
      - cleanup workqueue flags (Ryo)
      - fix faulty rdev can't be removed during resync (Qixing)

 - NVMe pull request via Christoph:
      - try PCIe function level reset on init failure (Keith Busch)
      - log TLS handshake failures at error level (Maurizio Lombardi)
      - pci-epf: do not complete commands twice if nvmet_req_init()
        fails (Rick Wertenbroek)
      - misc cleanups (Alok Tiwari)

 - Removal of the pktcdvd driver

   This has been more than a decade coming at this point, and some
   recently revealed breakages that had it causing issues even for cases
   where it isn't required made me re-pull the trigger on this one. It's
   known broken and nobody has stepped up to maintain the code

 - Series for ublk supporting batch commands, enabling the use of
   multishot where appropriate

 - Speed up ublk exit handling

 - Fix for the two-stage elevator fixing which could leak data

 - Convert NVMe to use the new IOVA based API

 - Increase default max transfer size to something more reasonable

 - Series fixing write operations on zoned DM devices

 - Add tracepoints for zoned block device operations

 - Prep series working towards improving blk-mq queue management in the
   presence of isolated CPUs

 - Don't allow updating of the block size of a loop device that is
   currently under exclusively ownership/open

 - Set chunk sectors from stacked device stripe size and use it for the
   atomic write size limit

 - Switch to folios in bcache read_super()

 - Fix for CD-ROM MRW exit flush handling

 - Various tweaks, fixes, and cleanups

* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
  block: restore two stage elevator switch while running nr_hw_queue update
  cdrom: Call cdrom_mrw_exit from cdrom_release function
  sunvdc: Balance device refcount in vdc_port_mpgroup_check
  nvme-pci: try function level reset on init failure
  dm: split write BIOs on zone boundaries when zone append is not emulated
  block: use chunk_sectors when evaluating stacked atomic write limits
  dm-stripe: limit chunk_sectors to the stripe size
  md/raid10: set chunk_sectors limit
  md/raid0: set chunk_sectors limit
  block: sanitize chunk_sectors for atomic write limits
  ilog2: add max_pow_of_two_factor()
  nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
  nvme-tcp: log TLS handshake failures at error level
  docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
  nvme: fix typo in status code constant for self-test in progress
  nvmet: remove redundant assignment of error code in nvmet_ns_enable()
  nvme: fix incorrect variable in io cqes error message
  nvme: fix multiple spelling and grammar issues in host drivers
  block: fix blk_zone_append_update_request_bio() kernel-doc
  md/raid10: fix set but not used variable in sync_request_write()
  ...
2025-07-28 16:43:54 -07:00
Linus Torvalds
c3018a2c6a Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Optimization to avoid reference counts on non-cloned registered
   buffers. This is how these buffers were handled prior to having
   cloning support, and we can still use that approach as long as the
   buffers haven't been cloned to another ring.

 - Cleanup and improvement for uring_cmd, where btrfs was the only user
   of storing allocated data for the lifetime of the uring_cmd. Clean
   that up so we can get rid of the need to do that.

 - Avoid unnecessary memory copies in uring_cmd usage. This is
   particularly important as a lot of uring_cmd usage necessitates the
   use of 128b SQEs.

 - A few updates for recv multishot, where it's now possible to add
   fairness limits for limiting how much is transferred for each retry
   loop. Additionally, recv multishot now supports an overall cap as
   well, where once reached the multishot recv will terminate. The
   latter is useful for buffer management and juggling many recv streams
   at the same time.

 - Add support for returning the TX timestamps via a new socket command.
   This feature can work in either singleshot or multishot mode, where
   the latter triggers a completion whenever new timestamps are
   available. This is an alternative to using the existing error queue.

 - Add support for an io_uring "mock" file, which is the start of being
   able to do 100% targeted testing in terms of exercising io_uring
   request handling. The idea is to have a file type that can be
   anything the tester would like, and behave exactly how you want it to
   behave in terms of hitting the code paths you want.

 - Improve zcrx by using sgtables to de-duplicate and improve dma
   address handling.

 - Prep work for supporting larger pages for zcrx.

 - Various little improvements and fixes.

* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
  io_uring/zcrx: fix leaking pages on sg init fail
  io_uring/zcrx: don't leak pages on account failure
  io_uring/zcrx: fix null ifq on area destruction
  io_uring: fix breakage in EXPERT menu
  io_uring/cmd: remove struct io_uring_cmd_data
  btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
  io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
  io_uring/zcrx: account area memory
  io_uring: export io_[un]account_mem
  io_uring/net: Support multishot receive len cap
  io_uring: deduplicate wakeup handling
  io_uring/net: cast min_not_zero() type
  io_uring/poll: cleanup apoll freeing
  io_uring/net: allow multishot receive per-invocation cap
  io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
  io_uring/net: use passed in 'len' in io_recv_buf_select()
  io_uring/zcrx: prepare fallback for larger pages
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: introduce io_populate_area_dma
  ...
2025-07-28 16:30:12 -07:00
Linus Torvalds
57fcb7d930 Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening and calling
  FS_IOC_FSSETXATTR on each inode. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with empty project ID. Those inodes then are not shown in the quota
  accounting but still exist in the directory. This is not critical but
  in the case when special files are created in the directory with
  already existing project quota, these new inodes inherit extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes don't have a
  possibility to become clear or change the attributes. This, in turn,
  prevents userspace from re-creating quota project on these existing
  files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Linus Torvalds
cec40a7c80 Merge tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs 'protection info' updates from Christian Brauner:
 "This adds the new FS_IOC_GETLBMD_CAP ioctl() to query metadata and
  protection info (PI) capabilities. This ioctl returns information
  about the files integrity profile. This is useful for userspace
  applications to understand a files end-to-end data protection support
  and configure the I/O accordingly.

  For now this interface is only supported by block devices. However the
  design and placement of this ioctl in generic FS ioctl space allows us
  to extend it to work over files as well. This maybe useful when
  filesystems start supporting PI-aware layouts.

  A new structure struct logical_block_metadata_cap is introduced, which
  contains the following fields:

   - lbmd_flags:
     bitmask of logical block metadata capability flags

   - lbmd_interval:
     the amount of data described by each unit of logical block metadata

   - lbmd_size:
     size in bytes of the logical block metadata associated with each
     interval

   - lbmd_opaque_size:
     size in bytes of the opaque block tag associated with each interval

   - lbmd_opaque_offset:
     offset in bytes of the opaque block tag within the logical block
     metadata

   - lbmd_pi_size:
     size in bytes of the T10 PI tuple associated with each interval

   - lbmd_pi_offset:
     offset in bytes of T10 PI tuple within the logical block metadata

   - lbmd_pi_guard_tag_type:
     T10 PI guard tag type

   - lbmd_pi_app_tag_size:
     size in bytes of the T10 PI application tag

   - lbmd_pi_ref_tag_size:
     size in bytes of the T10 PI reference tag

   - lbmd_pi_storage_tag_size:
     size in bytes of the T10 PI storage tag

  The internal logic to fetch the capability is encapsulated in a helper
  function blk_get_meta_cap(), which uses the blk_integrity profile
  associated with the device. The ioctl returns -EOPNOTSUPP, if
  CONFIG_BLK_DEV_INTEGRITY is not enabled"

* tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  block: fix lbmd_guard_tag_type assignment in FS_IOC_GETLBMD_CAP
  block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
  fs: add ioctl to query metadata and protection info capabilities
  nvme: set pi_offset only when checksum type is not BLK_INTEGRITY_CSUM_NONE
  block: introduce pi_tuple_size field in blk_integrity
  block: rename tuple_size field in blk_integrity to metadata_size
2025-07-28 15:12:00 -07:00
Linus Torvalds
672dcda246 Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:

 - persistent info

   Persist exit and coredump information independent of whether anyone
   currently holds a pidfd for the struct pid.

   The current scheme allocated pidfs dentries on-demand repeatedly.
   This scheme is reaching it's limits as it makes it impossible to pin
   information that needs to be available after the task has exited or
   coredumped and that should not be lost simply because the pidfd got
   closed temporarily. The next opener should still see the stashed
   information.

   This is also a prerequisite for supporting extended attributes on
   pidfds to allow attaching meta information to them.

   If someone opens a pidfd for a struct pid a pidfs dentry is allocated
   and stashed in pid->stashed. Once the last pidfd for the struct pid
   is closed the pidfs dentry is released and removed from pid->stashed.

   So if 10 callers create a pidfs dentry for the same struct pid
   sequentially, i.e., each closing the pidfd before the other creates a
   new one then a new pidfs dentry is allocated every time.

   Because multiple tasks acquiring and releasing a pidfd for the same
   struct pid can race with each another a task may still find a valid
   pidfs entry from the previous task in pid->stashed and reuse it. Or
   it might find a dead dentry in there and fail to reuse it and so
   stashes a new pidfs dentry. Multiple tasks may race to stash a new
   pidfs dentry but only one will succeed, the other ones will put their
   dentry.

   The current scheme aims to ensure that a pidfs dentry for a struct
   pid can only be created if the task is still alive or if a pidfs
   dentry already existed before the task was reaped and so exit
   information has been was stashed in the pidfs inode.

   That's great except that it's buggy. If a pidfs dentry is stashed in
   pid->stashed after pidfs_exit() but before __unhash_process() is
   called we will return a pidfd for a reaped task without exit
   information being available.

   The pidfds_pid_valid() check does not guard against this race as it
   doens't sync at all with pidfs_exit(). The pid_has_task() check might
   be successful simply because we're before __unhash_process() but
   after pidfs_exit().

   Introduce a new scheme where the lifetime of information associated
   with a pidfs entry (coredump and exit information) isn't bound to the
   lifetime of the pidfs inode but the struct pid itself.

   The first time a pidfs dentry is allocated for a struct pid a struct
   pidfs_attr will be allocated which will be used to store exit and
   coredump information.

   If all pidfs for the pidfs dentry are closed the dentry and inode can
   be cleaned up but the struct pidfs_attr will stick until the struct
   pid itself is freed. This will ensure minimal memory usage while
   persisting relevant information.

   The new scheme has various advantages. First, it allows to close the
   race where we end up handing out a pidfd for a reaped task for which
   no exit information is available. Second, it minimizes memory usage.
   Third, it allows to remove complex lifetime tracking via dentries
   when registering a struct pid with pidfs. There's no need to get or
   put a reference. Instead, the lifetime of exit and coredump
   information associated with a struct pid is bound to the lifetime of
   struct pid itself.

 - extended attributes

   Now that we have a way to persist information for pidfs dentries we
   can start supporting extended attributes on pidfds. This will allow
   userspace to attach meta information to tasks.

   One natural extension would be to introduce a custom pidfs.* extended
   attribute space and allow for the inheritance of extended attributes
   across fork() and exec().

   The first simple scheme will allow privileged userspace to set
   trusted extended attributes on pidfs inodes.

 - Allow autonomous pidfs file handles

   Various filesystems such as pidfs and drm support opening file
   handles without having to require a file descriptor to identify the
   filesystem. The filesystem are global single instances and can be
   trivially identified solely on the information encoded in the file
   handle.

   This makes it possible to not have to keep or acquire a sentinal file
   descriptor just to pass it to open_by_handle_at() to identify the
   filesystem. That's especially useful when such sentinel file
   descriptor cannot or should not be acquired.

   For pidfs this means a file handle can function as full replacement
   for storing a pid in a file. Instead a file handle can be stored and
   reopened purely based on the file handle.

   Such autonomous file handles can be opened with or without specifying
   a a file descriptor. If no proper file descriptor is used the
   FD_PIDFS_ROOT sentinel must be passed. This allows us to define
   further special negative fd sentinels in the future.

   Userspace can trivially test for support by trying to open the file
   handle with an invalid file descriptor.

 - Allow pidfds for reaped tasks with SCM_PIDFD messages

   This is a logical continuation of the earlier work to create pidfds
   for reaped tasks through the SO_PEERPIDFD socket option merged in
   923ea4d448 ("Merge patch series "net, pidfs: enable handing out
   pidfds for reaped sk->sk_peer_pid"").

 - Two minor fixes:

    * Fold fs_struct->{lock,seq} into a seqlock

    * Don't bother with path_{get,put}() in unix_open_file()

* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
  don't bother with path_get()/path_put() in unix_open_file()
  fold fs_struct->{lock,seq} into a seqlock
  selftests: net: extend SCM_PIDFD test to cover stale pidfds
  af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
  af_unix: stash pidfs dentry when needed
  af_unix/scm: fix whitespace errors
  af_unix: introduce and use scm_replace_pid() helper
  af_unix: introduce unix_skb_to_scm helper
  af_unix: rework unix_maybe_add_creds() to allow sleep
  selftests/pidfd: decode pidfd file handles withou having to specify an fd
  fhandle, pidfs: support open_by_handle_at() purely based on file handle
  uapi/fcntl: add FD_PIDFS_ROOT
  uapi/fcntl: add FD_INVALID
  fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
  uapi/fcntl: mark range as reserved
  fhandle: reflow get_path_anchor()
  pidfs: add pidfs_root_path() helper
  fhandle: rename to get_path_anchor()
  fhandle: hoist copy_from_user() above get_path_from_fd()
  fhandle: raise FILEID_IS_DIR in handle_type
  ...
2025-07-28 14:10:15 -07:00
Linus Torvalds
278c7d9b5e Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fallocate updates from Christian Brauner:
 "fallocate() currently supports creating preallocated files
  efficiently. However, on most filesystems fallocate() will preallocate
  blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified.

  The extent state must later be converted to a written state when the
  user writes data into this range, which can trigger numerous metadata
  changes and journal I/O. This may leads to significant write
  amplification and performance degradation in synchronous write mode.

  At the moment, the only method to avoid this is to create an empty
  file and write zero data into it (for example, using 'dd' with a large
  block size). However, this method is slow and consumes a considerable
  amount of disk bandwidth.

  Now that more and more flash-based storage devices are available it is
  possible to efficiently write zeros to SSDs using the unmap write
  zeroes command if the devices do not write physical zeroes to the
  media.

  For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support
  the DEAC bit[1], the write zeroes command does not write actual data
  to the device, instead, NVMe converts the zeroed range to a
  deallocated state, which works fast and consumes almost no disk write
  bandwidth.

  This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and
  BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and
  device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and
  STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.

  fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
  flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
  way that subsequent writes to that range do not require further
  changes to the file mapping metadata. This flag is beneficial for
  subsequent pure overwriting within this range, as it can save on block
  allocation and, consequently, significant metadata changes"

* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ext4: add FALLOC_FL_WRITE_ZEROES support
  block: add FALLOC_FL_WRITE_ZEROES support
  block: factor out common part in blkdev_fallocate()
  fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
  dm: clear unmap write zeroes limits when disabling write zeroes
  scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
  nvmet: set WZDS and DRB if device enables unmap write zeroes operation
  nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
  block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
2025-07-28 13:36:49 -07:00
Linus Torvalds
f70d24c230 Merge tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
 "This contains namespace updates. This time specifically for nsfs:

   - Userspace heavily relies on the root inode numbers for namespaces
     to identify the initial namespaces. That's already a hard
     dependency. So we cannot change that anymore. Move the initial
     inode numbers to a public header and align the only two namespaces
     that currently don't do that with all the other namespaces.

   - The root inode of /proc having a fixed inode number has been part
     of the core kernel ABI since its inception, and recently some
     userspace programs (mainly container runtimes) have started to
     explicitly depend on this behaviour.

     The main reason this is useful to userspace is that by checking
     that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is
     PROCFS_ROOT_INO, they can then use openat2() together with
     RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a
     bind-mount that replaces some procfs file with a different one.

     This kind of attack has lead to security issues in container
     runtimes in the past (such as CVE-2019-19921) and libraries like
     libpathrs[1] use this feature of procfs to provide safe procfs
     handling functions"

* tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  uapi: export PROCFS_ROOT_INO
  mntns: use stable inode number for initial mount ns
  netns: use stable inode number for initial mount ns
  nsfs: move root inode number to uapi
2025-07-28 12:50:56 -07:00
Linus Torvalds
117eab5c6e Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
 "This contains an extension to the coredump socket and a proper rework
  of the coredump code.

   - This extends the coredump socket to allow the coredump server to
     tell the kernel how to process individual coredumps. This allows
     for fine-grained coredump management. Userspace can decide to just
     let the kernel write out the coredump, or generate the coredump
     itself, or just reject it.

     * COREDUMP_KERNEL
       The kernel will write the coredump data to the socket.

     * COREDUMP_USERSPACE
       The kernel will not write coredump data but will indicate to the
       parent that a coredump has been generated. This is used when
       userspace generates its own coredumps.

     * COREDUMP_REJECT
       The kernel will skip generating a coredump for this task.

     * COREDUMP_WAIT
       The kernel will prevent the task from exiting until the coredump
       server has shutdown the socket connection.

     The flexible coredump socket can be enabled by using the "@@"
     prefix instead of the single "@" prefix for the regular coredump
     socket:

       @@/run/systemd/coredump.socket

   - Cleanup the coredump code properly while we have to touch it
     anyway.

     Split out each coredump mode in a separate helper so it's easy to
     grasp what is going on and make the code easier to follow. The core
     coredump function should now be very trivial to follow"

* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
  cleanup: add a scoped version of CLASS()
  coredump: add coredump_skip() helper
  coredump: avoid pointless variable
  coredump: order auto cleanup variables at the top
  coredump: add coredump_cleanup()
  coredump: auto cleanup prepare_creds()
  cred: add auto cleanup method
  coredump: directly return
  coredump: auto cleanup argv
  coredump: add coredump_write()
  coredump: use a single helper for the socket
  coredump: move pipe specific file check into coredump_pipe()
  coredump: split pipe coredumping into coredump_pipe()
  coredump: move core_pipe_count to global variable
  coredump: prepare to simplify exit paths
  coredump: split file coredumping into coredump_file()
  coredump: rename do_coredump() to vfs_coredump()
  selftests/coredump: make sure invalid paths are rejected
  coredump: validate socket path in coredump_parse()
  coredump: don't allow ".." in coredump socket path
  ...
2025-07-28 11:50:36 -07:00
Linus Torvalds
126e5754e9 Merge tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull asm/param cleanup from Al Viro:
 "This massages asm/param.h to simpler and more uniform shape:

   - all arch/*/include/uapi/asm/param.h are either generated includes
     of <asm-generic/param.h> or a #define or two followed by such
     include

   - no arch/*/include/asm/param.h anywhere, generated or not

   - include <asm/param.h> resolves to arch/*/include/uapi/asm/param.h
     of the architecture in question (or that of host in case of uml)

   - include/asm-generic/param.h pulls uapi/asm-generic/param.h and
     deals with USER_HZ, CLOCKS_PER_SEC and with HZ redefinition after
     that"

* tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  loongarch, um, xtensa: get rid of generated arch/$ARCH/include/asm/param.h
  alpha: regularize the situation with asm/param.h
  xtensa: get rid uapi/asm/param.h
2025-07-28 09:03:37 -07:00
Wolfram Sang
f61389a9cd Merge tag 'i2c-host-6.17-pt1' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow
i2c-host for v6.17, part 1

Cleanups and refactorings:
- lpi2c, riic, st, stm32f7: general improvements
- riic: support more flexible IRQ configurations
- tegra: fix documentation

Improvements:
- lpi2c: improve register polling and add atomic transfer
- imx: use guarded spinlocks

New hardware support:
- Samsung Exynos 2200
- Renesas RZ/T2H (R9A09G077), RZ/N2H (R9A09G087)

DT binding:
- rk3x: enable power domains
- nxp: support clock property
2025-07-28 10:24:40 +02:00
David Sterba
009b2056cb btrfs: defrag: add flag to force no-compression
Currently the defrag ioctl cannot rewrite the extents without
compression. Add a new flag for that, as setting compression to 0 (or
"no compression") means to do no changes to compression so take what is
the current default, like mount options or properties.

The defrag setting overrides mount or properties. The compression
BTRFS_DEFRAG_DONT_COMPRESS is only used for in-memory operations and
does not need to have a fixed value.

Mount with zstd:9, copy test file from /usr/bin/ (about 260KB):

  $ mount -o compress=zstd:9 /dev/vda /mnt
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     127:      13312..     13439:    128:             encoded
     1:      128..     255:      13364..     13491:    128:      13440: encoded
     2:      256..     291:      13424..     13459:     36:      13492: last,encoded,eof
  testfile: 3 extents found

  $ compsize testfile
  Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL       42%      124K         292K         292K
  zstd        42%      124K         292K         292K

Defrag to uncompressed:

  $ btrfs fi defrag --nocomp testfile
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     291:     291840..    292131:    292:             last,eof
  testfile: 1 extent found

  $ compsize testfile
  Processed 1 file, 1 regular extents (1 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL      100%      292K         292K         292K
  none       100%      292K         292K         292K

Compress again with LZO:

  $ btrfs fi defrag -clzo testfile
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     127:      13312..     13439:    128:             encoded
     1:      128..     255:      13392..     13519:    128:      13440: encoded
     2:      256..     291:      13480..     13515:     36:      13520: last,encoded,eof
  testfile: 3 extents found

  $ compsize testfile
  Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL       64%      188K         292K         292K
  lzo         64%      188K         292K         292K

Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 01:13:03 +02:00
Greg Kroah-Hartman
bcbef1e4a6 Merge tag 'v6.16-rc7' into tty-next
We need the tty/serial fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-21 16:53:33 +02:00
Phil Sutter
36a686c078 Revert "netfilter: nf_tables: Add notifications for hook changes"
This reverts commit 465b9ee0ee.

Such notifications fit better into core or nfnetlink_hook code,
following the NFNL_MSG_HOOK_GET message format.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14 15:22:47 +02:00
Mark Brown
bfd291279f ASoC: codec: Convert to GPIO descriptors for
Merge series from Peng Fan <peng.fan@nxp.com>:

This patchset is a pick up of patch 1,2 from [1]. And I also collect
Linus's R-b for patch 2. After this patchset, there is only one user of
of_gpio.h left in sound driver(pxa2xx-ac97).

of_gpio.h is deprecated, update the driver to use GPIO descriptors.

Patch 1 is to drop legacy platform data which in-tree no users are using it
Patch 2 is to convert to GPIO descriptors

Checking the DTS that use the device, all are using GPIOD_ACTIVE_LOW
polarity for reset-gpios, so all should work as expected with this patch.

[1] https://lore.kernel.org/all/20250408-asoc-gpio-v1-0-c0db9d3fd6e9@nxp.com/
2025-07-14 11:34:16 +01:00
I Viswanath
c3ff7f06c7 i2c: Clarify behavior of I2C_M_RD flag
Update the description of I2C_M_RD to clarify that not setting it
signals a write transaction

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-07-14 09:15:58 +02:00
Sebastian Andrzej Siewior
760e6f7bef futex: Remove support for IMMUTABLE
The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on
the private hash and so to avoid the performance regression on big
machines.
With the switch to per-CPU counter this is no longer needed. That flag
was never useable on any released kernel.

Remove any support for IMMUTABLE while preserve the flags argument and
enforce it to be zero.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de
2025-07-11 16:02:01 +02:00
Linus Torvalds
73d7cf0710 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "Many patches, pretty much all of them small, that accumulated while I
  was on vacation.

  ARM:

   - Remove the last leftovers of the ill-fated FPSIMD host state
     mapping at EL2 stage-1

   - Fix unexpected advertisement to the guest of unimplemented S2 base
     granule sizes

   - Gracefully fail initialising pKVM if the interrupt controller isn't
     GICv3

   - Also gracefully fail initialising pKVM if the carveout allocation
     fails

   - Fix the computing of the minimum MMIO range required for the host
     on stage-2 fault

   - Fix the generation of the GICv3 Maintenance Interrupt in nested
     mode

  x86:

   - Reject SEV{-ES} intra-host migration if one or more vCPUs are
     actively being created, so as not to create a non-SEV{-ES} vCPU in
     an SEV{-ES} VM

   - Use a pre-allocated, per-vCPU buffer for handling de-sparsification
     of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
     large" issue

   - Allow out-of-range/invalid Xen event channel ports when configuring
     IRQ routing, to avoid dictating a specific ioctl() ordering to
     userspace

   - Conditionally reschedule when setting memory attributes to avoid
     soft lockups when userspace converts huge swaths of memory to/from
     private

   - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest

   - Add a missing field in struct sev_data_snp_launch_start that
     resulted in the guest-visible workarounds field being filled at the
     wrong offset

   - Skip non-canonical address when processing Hyper-V PV TLB flushes
     to avoid VM-Fail on INVVPID

   - Advertise supported TDX TDVMCALLs to userspace

   - Pass SetupEventNotifyInterrupt arguments to userspace

   - Fix TSC frequency underflow"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: avoid underflow when scaling TSC frequency
  KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
  KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
  KVM: arm64: Don't free hyp pages with pKVM on GICv2
  KVM: arm64: Fix error path in init_hyp_mode()
  KVM: arm64: Adjust range correctly during host stage-2 faults
  KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
  KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
  KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
  Documentation: KVM: Fix unexpected unindent warnings
  KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
  KVM: Allow CPU to reschedule while setting per-page memory attributes
  KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
  KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
  KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
  KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
  KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10 09:06:53 -07:00
Aleksa Sarai
76fdb7eb4e uapi: export PROCFS_ROOT_INO
The root inode of /proc having a fixed inode number has been part of the
core kernel ABI since its inception, and recently some userspace
programs (mainly container runtimes) have started to explicitly depend
on this behaviour.

The main reason this is useful to userspace is that by checking that a
suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO,
they can then use openat2(RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH}) to
ensure that there isn't a bind-mount that replaces some procfs file with
a different one. This kind of attack has lead to security issues in
container runtimes in the past (such as CVE-2019-19921) and libraries
like libpathrs[1] use this feature of procfs to provide safe procfs
handling functions.

There was also some trailing whitespace in the "struct proc_dir_entry"
initialiser, so fix that up as well.

[1]: https://github.com/openSUSE/libpathrs

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250708-uapi-procfs-root-ino-v1-1-6ae61e97c79b@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-10 09:39:18 +02:00
Thomas Gleixner
068f7b64bf Merge v6.16-rc2 into timers/ptp
to pick up the __GENMASK() fix, otherwise the AUX clock VDSO patches fail
to compile for compat.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-07-09 11:51:34 +02:00
Uwe Kleine-König
2b2aeaa12c Merge tag 'pm-runtime-6.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Runtime PM updates related to autosuspend for 6.17

Make several autosuspend functions mark last busy stamp and update
the documentation accordingly (Sakari Ailus).
2025-07-09 10:48:21 +02:00
Thomas Weißschuh
70b9c0c11e uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again (2)
BITS_PER_LONG does not exist in UAPI headers, so can't be used by the UAPI
__GENMASK(). Instead __BITS_PER_LONG needs to be used.

When __GENMASK() was introduced in commit 3c7a8e190b ("uapi: introduce uapi-friendly macros for GENMASK"),
the code was fine. A broken revert in 1e7933a575 ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"")
introduced the incorrect usage of BITS_PER_LONG.
That was fixed in commit 11fcf36850 ("uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again").
But a broken sync of the kernel headers with the tools/ headers in
commit fc92099902 ("tools headers: Synchronize linux/bits.h with the kernel sources")
undid the fix.

Reapply the fix and while at it also fix the tools header.

Fixes: fc92099902 ("tools headers: Synchronize linux/bits.h with the kernel sources")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-07-08 10:23:13 -04:00
Mark Brown
bb96a315b4 ASoC: soc-dapm: cleanups
Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:

This is prepare to hiding snd_soc_dapm_context inside soc-dapm.c
2025-07-07 21:02:59 +01:00
Uwe Kleine-König
9c06f26ba5 pwm: Add support for pwmchip devices for faster and easier userspace access
With this change each pwmchip defining the new-style waveform callbacks
can be accessed from userspace via a character device. Compared to the
sysfs-API this is faster and allows to pass the whole configuration in a
single ioctl allowing atomic application and thus reducing glitches.

On an STM32MP13 I see:

	root@DistroKit:~ time pwmtestperf
	real	0m 1.27s
	user	0m 0.02s
	sys	0m 1.21s
	root@DistroKit:~ rm /dev/pwmchip0
	root@DistroKit:~ time pwmtestperf
	real	0m 3.61s
	user	0m 0.27s
	sys	0m 3.26s

pwmtestperf does essentially:

	for i in 0 .. 50000:
		pwm_set_waveform(duty_length_ns=i, period_length_ns=50000, duty_offset_ns=0)

and in the presence of /dev/pwmchip0 is uses the ioctls introduced here,
without that device it uses /sys/class/pwm/pwmchip0.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/ad4a4e49ae3f8ea81e23cac1ac12b338c3bf5c5b.1746010245.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-07-07 08:39:33 +02:00
Christian Brauner
ca115d7e75 tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Pavel Begunkov
cf73d9970e io_uring: don't use int for ABI
__kernel_rwf_t is defined as int, the actual size of which is
implementation defined. It won't go well if some compiler / archs
ever defines it as i64, so replace it with __u32, hoping that
there is no one using i16 for it.

Cc: stable@vger.kernel.org
Fixes: 2b188cc1bb ("Add io_uring IO interface")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/47c666c4ee1df2018863af3a2028af18feef11ed.1751412511.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 17:11:58 -06:00
Andrey Albershteyn
be7efb2d20 fs: introduce file_getattr and file_setattr syscalls
Introduce file_getattr() and file_setattr() syscalls to manipulate inode
extended attributes. The syscalls takes pair of file descriptor and
pathname. Then it operates on inode opened accroding to openat()
semantics. The struct file_attr is passed to obtain/change extended
attributes.

This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
that file don't need to be open as we can reference it with a path
instead of fd. By having this we can manipulated inode extended
attributes not only on regular files but also on special ones. This
is not possible with FS_IOC_FSSETXATTR ioctl as with special files
we can not call ioctl() directly on the filesystem inode using fd.

This patch adds two new syscalls which allows userspace to get/set
extended inode attributes on special files by using parent directory
and a path - *at() like syscall.

CC: linux-api@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-xfs@vger.kernel.org
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-6-c4e3bc35227b@kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02 17:05:17 +02:00
Pavel Begunkov
e448d57826 io_uring/mock: add trivial poll handler
Add a flag that enables polling on the mock file. For now it's trivially
says that there is always data available, it'll be extended in the
future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f16de043ec4876d65fae294fc99ade57415fba0c.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Pavel Begunkov
0c98a44329 io_uring/mock: support for async read/write
Let the user to specify a delay to read/write request. io_uring will
start a timer, return -EIOCBQUEUED and complete the request
asynchronously after the delay pass.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/38f9d2e143fda8522c90a724b74630e68f9bbd16.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Pavel Begunkov
2f71d2386f io_uring/mock: allow to choose FMODE_NOWAIT
Add an option to choose whether the file supports FMODE_NOWAIT, that
changes the execution path io_uring request takes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e532565b05a05b23589d237c24ee1a3d90c2fd9.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Pavel Begunkov
d1aa034657 io_uring/mock: add sync read/write
Add support for synchronous zero read/write for mock files.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/571f3c9fe688e918256a06a722d3db6ced9ca3d5.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Pavel Begunkov
4aac001f78 io_uring/mock: add cmd using vectored regbufs
There is a command api allowing to import vectored registered buffers,
add a new mock command that uses the feature and simply copies the
specified registered buffer into user space or vice versa.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/229a113fd7de6b27dbef9567f7c0bf4475c9017d.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Pavel Begunkov
3a0ae385f6 io_uring/mock: add basic infra for test mock files
io_uring commands provide an ioctl style interface for files to
implement file specific operations. io_uring provides many features and
advanced api to commands, and it's getting hard to test as it requires
specific files/devices.

Add basic infrastucture for creating special mock files that will be
implementing the cmd api and using various io_uring features we want to
test. It'll also be useful to test some more obscure read/write/polling
edge cases in the future.

Suggested-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Anuj Gupta
9eb22f7fed fs: add ioctl to query metadata and protection info capabilities
Add a new ioctl, FS_IOC_GETLBMD_CAP, to query metadata and protection
info (PI) capabilities. This ioctl returns information about the files
integrity profile. This is useful for userspace applications to
understand a files end-to-end data protection support and configure the
I/O accordingly.

For now this interface is only supported by block devices. However the
design and placement of this ioctl in generic FS ioctl space allows us
to extend it to work over files as well. This maybe useful when
filesystems start supporting  PI-aware layouts.

A new structure struct logical_block_metadata_cap is introduced, which
contains the following fields:

1. lbmd_flags: bitmask of logical block metadata capability flags
2. lbmd_interval: the amount of data described by each unit of logical
block metadata
3. lbmd_size: size in bytes of the logical block metadata associated
with each interval
4. lbmd_opaque_size: size in bytes of the opaque block tag associated
with each interval
5. lbmd_opaque_offset: offset in bytes of the opaque block tag within
the logical block metadata
6. lbmd_pi_size: size in bytes of the T10 PI tuple associated with each
interval
7. lbmd_pi_offset: offset in bytes of T10 PI tuple within the logical
block metadata
8. lbmd_pi_guard_tag_type: T10 PI guard tag type
9. lbmd_pi_app_tag_size: size in bytes of the T10 PI application tag
10. lbmd_pi_ref_tag_size: size in bytes of the T10 PI reference tag
11. lbmd_pi_storage_tag_size: size in bytes of the T10 PI storage tag

The internal logic to fetch the capability is encapsulated in a helper
function blk_get_meta_cap(), which uses the blk_integrity profile
associated with the device. The ioctl returns -EOPNOTSUPP, if
CONFIG_BLK_DEV_INTEGRITY is not enabled.

Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/20250630090548.3317-5-anuj20.g@samsung.com
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 14:00:15 +02:00
Caleb Sander Mateos
763ff02ce2 ublk: allow UBLK_IO_(UN)REGISTER_IO_BUF on any task
Currently, UBLK_IO_REGISTER_IO_BUF and UBLK_IO_UNREGISTER_IO_BUF are
only permitted on the ublk_io's daemon task. But this restriction is
unnecessary. ublk_register_io_buf() calls __ublk_check_and_get_req() to
look up the request from the tagset and atomically take a reference on
the request without accessing the ublk_io. ublk_unregister_io_buf()
doesn't use the q_id or tag at all.

So allow these opcodes even on tasks other than io->task.

Handle UBLK_IO_UNREGISTER_IO_BUF before obtaining the ubq and io since
the buffer index being unregistered is not necessarily related to the
specified q_id and tag.

Add a feature flag UBLK_F_BUF_REG_OFF_DAEMON that userspace can use to
determine whether the kernel supports off-daemon buffer registration.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250620151008.3976463-10-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-30 20:13:42 -06:00
Jens Axboe
94b2030968 io_uring: remove errant ';' from IORING_CQE_F_TSTAMP_HW definition
An errant ';' slipped into that definition, which will cause some
compilers to complain when it's used in an application:

timestamp.c:257:45: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
  257 |                 hwts = cqe->flags & IORING_CQE_F_TSTAMP_HW;
      |                                                           ^

Fixes: 9e4ed359b8 ("io_uring/netcmd: add tx timestamping cmd support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-30 11:36:54 -06:00
Greg Kroah-Hartman
815ac67919 Merge 6.16-rc4 into tty-next
We need the tty/serial fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-30 07:50:04 +02:00
Linus Torvalds
e540341508 Merge tag 'block-6.16-20250626' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Fixes for ublk:
      - fix C++ narrowing warnings in the uapi header
      - update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header
      - fix for the ublk ->queue_rqs() implementation, limiting a batch
        to just the specific task AND ring
      - ublk_get_data() error handling fix
      - sanity check more arguments in ublk_ctrl_add_dev()
      - selftest addition

 - NVMe pull request via Christoph:
      - reset delayed remove_work after reconnect
      - fix atomic write size validation

 - Fix for a warning introduced in bdev_count_inflight_rw() in this
   merge window

* tag 'block-6.16-20250626' of git://git.kernel.dk/linux:
  block: fix false warning in bdev_count_inflight_rw()
  ublk: sanity check add_dev input for underflow
  nvme: fix atomic write size validation
  nvme: refactor the atomic write unit detection
  nvme: reset delayed remove_work after reconnect
  ublk: setup ublk_io correctly in case of ublk_get_data() failure
  ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
  ublk: fix narrowing warnings in UAPI header
  selftests: ublk: don't take same backing file for more than one ublk devices
  ublk: build batch from IOs in same io_ring_ctx and io task
2025-06-27 09:02:33 -07:00
Linus Torvalds
e34a79b96a Merge tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and wireless.

  Current release - regressions:

   - bridge: fix use-after-free during router port configuration

  Current release - new code bugs:

   - eth: wangxun: fix the creation of page_pool

  Previous releases - regressions:

   - netpoll: initialize UDP checksum field before checksumming

   - wifi: mac80211: finish link init before RCU publish

   - bluetooth: fix use-after-free in vhci_flush()

   - eth:
      - ionic: fix DMA mapping test
      - bnxt: properly flush XDP redirect lists

  Previous releases - always broken:

   - netlink: specs: enforce strict naming of properties

   - unix: don't leave consecutive consumed OOB skbs.

   - vsock: fix linux/vm_sockets.h userspace compilation errors

   - selftests: fix TCP packet checksum"

* tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  net: libwx: fix the creation of page_pool
  net: selftests: fix TCP packet checksum
  atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister().
  netlink: specs: enforce strict naming of properties
  netlink: specs: tc: replace underscores with dashes in names
  netlink: specs: rt-link: replace underscores with dashes in names
  netlink: specs: mptcp: replace underscores with dashes in names
  netlink: specs: ovs_flow: replace underscores with dashes in names
  netlink: specs: devlink: replace underscores with dashes in names
  netlink: specs: dpll: replace underscores with dashes in names
  netlink: specs: ethtool: replace underscores with dashes in names
  netlink: specs: fou: replace underscores with dashes in names
  netlink: specs: nfsd: replace underscores with dashes in names
  net: enetc: Correct endianness handling in _enetc_rd_reg64
  atm: idt77252: Add missing `dma_map_error()`
  bnxt: properly flush XDP redirect lists
  vsock/uapi: fix linux/vm_sockets.h userspace compilation errors
  wifi: mac80211: finish link init before RCU publish
  wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version
  selftest: af_unix: Add tests for -ECONNRESET.
  ...
2025-06-26 09:13:27 -07:00
Jakub Kicinski
9e6dd4c256 netlink: specs: mptcp: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.

Fixes: bc8aeb2045 ("Documentation: netlink: add a YAML spec for mptcp")
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250624211002.3475021-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25 15:36:28 -07:00
Mark Brown
51c18d4d88 ASoC: Standardize ASoC menu
Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:

Current Kconfig menu at [ALSA for SoC audio support] has no rules.
So, some venders are using menu style, some venders are listed each drivers
on top page, etc. It is difficult to find target vender and/or drivers
because it is very random.

Let's standardize ASoC menu, like below

	--- ALSA for SoC audio support
	      Analog Devices  --->
	      AMD  --->
	      Apple  --->
	      Atmel  --->
	      Au1x  ----
	      Broadcom  --->
	      Cirrus Logic  --->
	      DesignWare  --->
	      Freescale  --->
	      Google  --->
	      Hisilicon  --->
	      ...

One concern is *vender folder* alphabetical order vs *vender name*
alphabetical order were different. For example "sunxi" menu is
"Allwinner".

Link: https://lore.kernel.org/r/8734c8bf3l.wl-kuninori.morimoto.gx@renesas.com
2025-06-25 16:27:47 +01:00
Caleb Sander Mateos
81b4d1a1d0 ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
UBLK_F_SUPPORT_ZERO_COPY has a very old comment describing the initial
idea for how zero-copy would be implemented. The actual implementation
added in commit 1f6540e2aa ("ublk: zc register/unregister bvec") uses
io_uring registered buffers rather than shared memory mapping.
Remove the inaccurate remarks about mapping ublk request memory into the
ublk server's address space and requiring 4K block size. Replace them
with a description of the current zero-copy mechanism.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250621171015.354932-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24 20:45:31 -06:00
Caleb Sander Mateos
67caa528ae ublk: fix narrowing warnings in UAPI header
When a C++ file compiled with -Wc++11-narrowing includes the UAPI header
linux/ublk_cmd.h, ublk_sqe_addr_to_auto_buf_reg()'s assignments of u64
values to u8, u16, and u32 fields result in compiler warnings. Add
explicit casts to the intended types to avoid these warnings. Drop the
unnecessary bitmasks.

Reported-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 99c1e4eb6a ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250621162842.337452-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24 20:45:31 -06:00
Al Viro
f65bbf0539 alpha: regularize the situation with asm/param.h
The only reason why alpha can't do what sparc et.al. are doing
is that include/asm-generic/param.h relies upon the value of HZ
set for userland header in uapi/asm/param.h being 100.

We need that value to define USER_HZ and we need that definition
to outlive the redefinition of HZ kernel-side.  And alpha needs
it to be 1024, not 100 like everybody else.

So let's add __USER_HZ to uapi/asm-generic/param.h, defaulting to
100 and used to define HZ.  That way include/asm-generic/param.h
can use that thing instead of open-coding it - it won't be affected
by undefining and redefining HZ.

That done, alpha asm/param.h can be removed and uapi/asm/param.h
switched to defining __USER_HZ and EXEC_PAGESIZE and then including
<asm-generic/param.h> - asm/param.h will resolve to uapi/asm/param.h,
which pulls <asm-generic/param.h>, which will do the right thing
both in the kernel and userland contexts.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-24 22:02:05 -04:00