1355 Commits

Author SHA1 Message Date
Linus Torvalds
509d3f4584 Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
2025-12-06 14:01:20 -08:00
Linus Torvalds
6dfafbd029 Merge tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "There was a rather late merge of a new color pipeline feature, that
  some userspace projects are blocked on, and has seen a lot of work in
  amdgpu. This should have seen some time in -next. There is additional
  support for this for Intel, that if it arrives in the next day or two
  I'll pass it on in another pull request and you can decide if you want
  to take it.

  Highlights:
   - Arm Ethos NPU accelerator driver
   - new DRM color pipeline support
   - amdgpu will now run discrete SI/CIK cards instead of radeon, which
     enables vulkan support in userspace
   - msm gets gen8 gpu support
   - initial Xe3P support in xe

  Full detail summary:

  New driver:
   - Arm Ethos-U65/U85 accel driver

  Core:
   - support the drm color pipeline in vkms/amdgfx
   - add support for drm colorop pipeline
   - add COLOR PIPELINE plane property
   - add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
   - throttle dirty worker with vblank
   - use drm_for_each_bridge_in_chain_scoped in drm's bridge code
   - Ensure drm_client_modeset tests are enabled in UML
   - add simulated vblank interrupt - use in drivers
   - dumb buffer sizing helper
   - move freeing of drm client memory to driver
   - crtc sharpness strength property
   - stop using system_wq in scheduler/drivers
   - support emergency restore in drm-client

  Rust:
   - make slice::as_flattened usable on all supported rustc
   - add FromBytes::from_bytes_prefix() method
   - remove redundant device ptr from Rust GEM object
   - Change how AlwaysRefCounted is implemented for GEM objects

  gpuvm:
   - Add deferred vm_bo cleanup to GPUVM (for rust)

  atomic:
   - cleanup and improve state handling interfaces

  buddy:
   - optimize block management

  dma-buf:
   - heaps: Create heap per CMA reserved location
   - improve userspace documentation

  dp:
   - add POST_LT_ADJ_REQ training sequence
   - DPCD dSC quirk for synaptics panamera devices
   - helpers to query branch DSC max throughput

  ttm:
   - Rename ttm_bo_put to ttm_bo_fini
   - allow page protection flags on risc-v
   - rework pipelined eviction fence handling

  amdgpu:
   - enable amdgpu by default for SI/CI dGPUs
   - enable DC by default on SI
   - refactor CIK/SI enablement
   - add ABM KMS property
   - Re-enable DM idle optimizations
   - DC Analog encoders support
   - Powerplay fixes for fiji/iceland
   - Enable DC on bonaire by default
   - HMM cleanup
   - Add new RAS framework
   - DML2.1 updates
   - YCbCr420 fixes
   - DC FP fixes
   - DMUB fixes
   - LTTPR fixes
   - DTBCLK fixes
   - DMU cursor offload handling
   - Userq validation improvements
   - Unify shutdown callback handling
   - Suspend improvements
   - Power limit code cleanup
   - SR-IOV fixes
   - AUX backlight fixes
   - DCN 3.5 fixes
   - HDMI compliance fixes
   - DCN 4.0.1 cursor updates
   - DCN interrupt fix
   - DC KMS full update improvements
   - Add additional HDCP traces
   - DCN 3.2 fixes
   - DP MST fixes
   - Add support for new SR-IOV mailbox interface
   - UQ reset support
   - HDP flush rework
   - VCE1 support

  amdkfd:
   - HMM cleanups
   - Relax checks on save area overallocations
   - Fix GPU mappings after prefetch

  radeon:
   - refactor CIK/SI enablement

  xe:
   - Initial Xe3P support
   - panic support on VRAM for display
   - fix stolen size check
   - Loosen used tracking restriction
   - New SR-IOV debugfs structure and debugfs updates
   - Hide the GPU madvise flag behind a VM_BIND flag
   - Always expose VRAM provisioning data on discrete GPUs
   - Allow VRAM mappings for userptr when used with SVM
   - Allow pinning of p2p dma-buf
   - Use per-tile debugfs where appropriate
   - Add documentation for Execution Queues
   - PF improvements
   - VF migration recovery redesign work
   - User / Kernel VRAM partitioning
   - Update Tile-based messages
   - Allow configfs to disable specific GT types
   - VF provisioning and migration improvements
   - use SVM range helpers in PT layer
   - Initial CRI support
   - access VF registers using dedicated MMIO view
   - limit number of jobs per exec queue
   - add sriov_admin sysfs tree
   - more crescent island specific support
   - debugfs residency counter
   - SRIOV migration work
   - runtime registers for GFX 35

  i915:
   - add initial Xe3p_LPD display version 35 support
   - Enable LNL+ content adaptive sharpness filter
   - Use optimized VRR guardband
   - Enable Xe3p LT PHY
   - enable FBC support for Xe3p_LPD display
   - add display 30.02 firmware support
   - refactor SKL+ watermark latency setup
   - refactor fbdev handling
   - call i915/xe runtime PM via function pointers
   - refactor i915/xe stolen memory/display interfaces
   - use display version instead of gfx version in display code
   - extend i915_display_info with Type-C port details
   - lots of display cleanups/refactorings
   - set O_LARGEFILE in __create_shmem
   - skuip guc communication warning on reset
   - fix time conversions
   - defeature DRRS on LNL+
   - refactor intel_frontbuffer split between i915/xe/display
   - convert inteL_rom interfaces to struct drm_device
   - unify display register polling interfaces
   - aovid lock inversion when pinning to GGTT on CHV/BXT+VTD

  panel:
   - Add KD116N3730A08/A12, chromebook mt8189
   - JT101TM023, LQ079L1SX01,
   - GLD070WX3-SL01 MIPI DSI
   - Samsung LTL106AL0, Samsung LTL106AL01
   - Raystar RFF500F-AWH-DNN
   - Winstar WF70A8SYJHLNGA
   - Wanchanglong w552946aaa
   - Samsung SOFEF00
   - Lenovo X13s panel
   - ilitek-ili9881c - add rpi 5" support
   - visionx-rm69299 - add backlight support
   - edp - support AUI B116XAN02.0

  bridge:
   - improve ref counting
   - ti-sn65dsi86 - add support for DP mode with HPD
   - synopsis: support CEC, init timer with correct freq
   - ASL CS5263 DP-to-HDMI bridge support

  nova-core:
   - introduce bitfield! macro
   - introduce safe integer converters
   - GSP inits to fully booted state on Ampere
   - Use more future-proof register for GPU identification

  nova-drm:
   - select NOVA_CORE
   - 64-bit only

  nouveau:
   - improve reclocking on tegra 186+
   - add large page and compression support

  msm:
   - GPU:
      - Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
      - A612 support
   - MDSS:
      - Added support for Glymur and QCS8300 platforms
   - DPU:
      - Enabled Quad-Pipe support, unlocking higher resolutions support
      - Added support for Glymur platform
      - Documented DPU on QCS8300 platform as supported
   - DisplayPort:
      - Added support for Glymur platform
      - Added support lame remapping inside DP block
      - Documented DisplayPort controller on QCS8300 and SM6150/QCS615
        as supported

  tegra:
   - NVJPG driver

  panfrost:
   - display JM contexts over debugfs
   - export JM contexts to userspace
   - improve error and job handling

  panthor:
   - support custom ASN_HASH for mt8196
   - support mali-G1 GPU
   - flush shmem write before mapping buffers uncached
   - make timeout per-queue instead of per-job

  mediatek:
   - MT8195/88 HDMIv2/DDCv2 support

  rockchip:
   - dsi: add support for RK3368

  amdxdna:
   - enhance runtime PM
   - last hardware error reading uapi
   - support firmware debug output
   - add resource and telemetry data uapi
   - preemption support

  imx:
   - add driver for HDMI TX Parallel audio interface

  ivpu:
   - add support for user-managed preemption buffer
   - add userptr support
   - update JSM firware API to 3.33.0
   - add better alloc/free warnings
   - fix page fault in unbind all bos
   - rework bind/unbind of imported buffers
   - enable MCA ECC signalling
   - split fw runtime and global memory buffers
   - add fdinfo memory statistics

  tidss:
   - convert to drm logging
   - logging cleanup

  ast:
   - refactor generation init paths
   - add per chip generation detect_tx_chip
   - set quirks for each chip model

  atmel-hlcdc:
   - set LCDC_ATTRE register in plane disable
   - set correct values for plane scaler

  solomon:
   - use drm helper for get_modes and move_valid

  sitronix:
   - fix output position when clearing screens

  qaic:
   - support dma-buf exports
   - support new firmware's READ_DATA implementation
   - sahara AIC200 image table update
   - add sysfs support
   - add coredump support
   - add uevents support
   - PM support

  sun4i:
   - layer refactors to decouple plane from output
   - improve DE33 support

  vc4:
   - switch to generic CEC helpers

  komeda:
   - use drm_ logging functions

  vkms:
   - configfs support for display configuration

  vgem:
   - fix fence timer deadlock

  etnaviv:
   - add HWDB entry for GC8000 Nano Ultra VIP r6205"

* tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits)
  Revert "drm/amd: Skip power ungate during suspend for VPE"
  drm/amdgpu: use common defines for HUB faults
  drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
  drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
  drm/amdgpu: use static ids for ACP platform devs
  drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
  drm/amdgpu: Forward VMID reservation errors
  drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Cache VM fault info
  drm/amdgpu/gmc6: Don't print MC client as it's unknown
  drm/amdgpu/cz_ih: Enable soft IRQ handler ring
  drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
  drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
  drm/amdgpu/cik_ih: Enable soft IRQ handler ring
  drm/amdgpu/si_ih: Enable soft IRQ handler ring
  drm/amd/display: fix typo in display_mode_core_structs.h
  drm/amd/display: fix Smart Power OLED not working after S4
  drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
  ...
2025-12-04 08:53:30 -08:00
Linus Torvalds
2ddcf4962c Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:

  - Enable -fms-extensions, allowing anonymous use of tagged struct or
    union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
    conversion patch is added here, too (btrfs).

    [ Editor's note: the core of this actually came in early through a
      shared branch and a few other trees    - Linus ]

  - Introduce architecture-specific CC_CAN_LINK and flags for userprogs

  - Add new packaging target 'modules-cpio-pkg' for building a initramfs
    cpio w/ kmods

  - Handle included .c files in gen_compile_commands

  - Minor kbuild changes:
     - Use objtree for module signing key path, fixing oot kmod signing
     - Improve documentation of KBUILD_BUILD_TIMESTAMP
     - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
     - Rename scripts/Makefile.extrawarn to Makefile.warn
     - Drop obsolete types.h check from headers_check.pl
     - Remove outdated config leak ignore entries

* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: add target to build a cpio containing modules
  initramfs: add gen_init_cpio to hostprogs unconditionally
  kbuild: allow architectures to override CC_CAN_LINK
  init: deduplicate cc-can-link.sh invocations
  kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
  scripts: headers_install.sh: Remove two outdated config leak ignore entries
  scripts/clang-tools: Handle included .c files in gen_compile_commands
  kbuild: uapi: Drop types.h check from headers_check.pl
  kbuild: Rename Makefile.extrawarn to Makefile.warn
  MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
  kbuild: uapi: reuse KBUILD_USERCFLAGS
  kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
  kbuild: Use objtree for module signing key path
  btrfs: send: make use of -fms-extensions for defining struct fs_path
2025-12-03 14:42:21 -08:00
Pasha Tatashin
48a1b2321d liveupdate: kho: move to kernel/liveupdate
Move KHO to kernel/liveupdate/ in preparation of placing all Live Update
core kernel related files to the same place.

[pasha.tatashin@soleen.com: disable the menu when DEFERRED_STRUCT_PAGE_INIT]
  Link: https://lkml.kernel.org/r/CA+CK2bAvh9Oa2SLfsbJ8zztpEjrgr_hr-uGgF1coy8yoibT39A@mail.gmail.com
Link: https://lkml.kernel.org/r/20251101142325.1326536-8-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:33 -08:00
Thomas Weißschuh
deab487e0f kbuild: allow architectures to override CC_CAN_LINK
The generic test for CC_CAN_LINK assumes that all architectures use -m32
and -m64 to switch between 32-bit and 64-bit compilation. This is overly
simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may
also require byte order handling (-mlittle-endian, -EL). Expressing all
of the different possibilities will be very complicated and brittle.
Instead allow architectures to supply their own logic which will be
easy to understand and evolve.

Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need
to be implemented as kconfig does not allow the reuse of string options.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:35 +01:00
Thomas Weißschuh
80623f2c83 init: deduplicate cc-can-link.sh invocations
The command to invoke scripts/cc-can-link.sh is very long and new usages
are about to be added.

Add a helper variable to make the code easier to read and maintain.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:34 +01:00
Alexandre Courbot
88622323dd rust: enable slice_flatten feature and provide it through an extension trait
In Rust 1.80, the previously unstable `slice::flatten` family of methods
have been stabilized and renamed to `slice::as_flattened`.

This creates an issue as we want to use `as_flattened`, but need to
support the MSRV (which at the moment is Rust 1.78) where it is named
`flatten`.

Solve this by enabling the `slice_flatten` feature, and providing an
`as_flattened` implementation through an extension trait for compiler
versions where it is not available.

The trait is then exported from the prelude, making the `as_flattened`
family of methods transparently available for all supported compiler
versions.

This extension trait can be removed once the MSRV passes 1.80.

Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/all/CANiq72kK4pG=O35NwxPNoTO17oRcg1yfGcvr3==Fi4edr+sfmw@mail.gmail.com/
Acked-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Message-ID: <20251110-gsp_boot-v9-8-8ae4058e3c0e@nvidia.com>
Message-ID: <20251104-b4-as-flattened-v3-1-6cb9c26b45cd@nvidia.com>
2025-11-14 20:25:57 +09:00
Douglas Anderson
032a730268 init/main.c: wrap long kernel cmdline when printing to logs
The kernel cmdline length is allowed to be longer than what printk can
handle.  When this happens the cmdline that's printed to the kernel ring
buffer at bootup is cutoff and some kernel cmdline options are "hidden"
from the logs.  This undercuts the usefulness of the log message.

Specifically, grepping for COMMAND_LINE_SIZE shows that 2048 is common and
some architectures even define it as 4096.  s390 allows a CONFIG-based
maximum up to 1MB (though it's not expected that anyone will go over the
default max of 4096 [1]).

The maximum message pr_notice() seems to be able to handle (based on
experiment) is 1021 characters.  This appears to be based on the current
value of PRINTKRB_RECORD_MAX as 1024 and the fact that pr_notice() spends
2 characters on the loglevel prefix and we have a '\n' at the end.

While it would be possible to increase the limits of printk() (and
therefore pr_notice()) somewhat, it doesn't appear possible to increase it
enough to fully include a 2048-character cmdline without breaking
userspace.  Specifically on at least two tested userspaces (ChromeOS plus
the Debian-based distro I'm typing this message on) the `dmesg` tool reads
lines from `/dev/kmsg` in 2047-byte chunks.  As per
`Documentation/ABI/testing/dev-kmsg`:

  Every read() from the opened device node receives one record
  of the kernel's printk buffer.
  ...
  Messages in the record ring buffer get overwritten as whole,
  there are never partial messages received by read().

We simply can't fit a 2048-byte cmdline plus the "Kernel command line:"
prefix plus info about time/log_level/etc in a 2047-byte read.

The above means that if we want to avoid the truncation we need to do some
type of wrapping of the cmdline when printing.

Add wrapping to the printout of the kernel command line.  By default, the
wrapping is set to 1021 characters to avoid breaking anyone, but allow
wrapping to be set lower by a Kconfig knob
"CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN".  Any tools that are correctly parsing
the cmdline today (because it is less than 1021 characters) will see no
difference in their behavior.  The format of wrapped output is designed to
be matched by anyone using "grep" to search for the cmdline and also to be
easy for tools to handle.  Anyone who is sure their tools (if any) handle
the wrapped format can choose a lower wrapping value and have prettier
output.

Setting CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN to 0 fully disables the wrapping
logic.  This means that long command lines will be truncated again, but
this config could be set if command lines are expected to be long and
userspace is known not to handle parsing logs with the wrapping.

Wrapping is based on spaces, ignoring quotes.  All lines are prefixed with
"Kernel command line: " and lines that are not the last line have a " \"
suffix added to them.  The prefix and suffix count towards the line length
for wrapping purposes.  The ideal length will be exceeded if no
appropriate place to wrap is found.

The wrapping function added here is fairly generic and could be made a
library function (somewhat like print_hex_dump()) if it's needed elsewhere
in the kernel.  However, having printk() directly incorporate this
wrapping would be unlikely to be a good idea since it would break
printouts into more than one record without any obvious common line prefix
to tie lines together.  It would also be extra overhead when, in general,
kernel log message should simply be kept smaller than 1021 bytes.  For
some discussion on this topic, see responses to the v1 posting of this
patch [2].

[akpm@linux-foundation.org: make print_kernel_cmdline __init]
[dianders@chromium.org: v4]
  Link: https://lkml.kernel.org/r/20251027082204.v4.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lkml.kernel.org/r/20251023113257.v3.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lore.kernel.org/r/20251021131633.26700Dd6-hca@linux.ibm.com [1]
Link: https://lore.kernel.org/r/CAD=FV=VNyt1zG_8pS64wgV8VkZWiWJymnZ-XCfkrfaAhhFSKcA@mail.gmail.com [2]
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Chant <achant@google.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12 10:00:16 -08:00
Thomas Gleixner
3db6b38dfe rseq: Switch to fast path processing on exit to user
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.

This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.

This results in the following improvements:

  Kernel build	       Before		  After		      Reduction

  exit to user         80692981		  80514451
  signal checks:          32581		       121	       99%
  slowpath runs:        1201408   1.49%	       198 0.00%      100%
  fastpath runs:			    675941 0.84%       N/A
  id updates:           1233989   1.53%	     50541 0.06%       96%
  cs checks:            1125366   1.39%	         0 0.00%      100%
    cs cleared:         1125366      100%	 0            100%
    cs fixup:                 0        0%	 0

  RSEQ selftests      Before		  After		      Reduction

  exit to user:       386281778		  387373750
  signal checks:       35661203		          0           100%
  slowpath runs:      140542396 36.38%	        100  0.00%    100%
  fastpath runs:			    9509789  2.51%     N/A
  id updates:         176203599 45.62%	    9087994  2.35%     95%
  cs checks:          175587856 45.46%	    4728394  1.22%     98%
    cs cleared:       172359544   98.16%    1319307   27.90%   99%
    cs fixup:           3228312    1.84%    3409087   72.10%

The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations, they are relative to the actual 'cs check' invocations.

While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.

The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads, which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
2025-11-04 08:34:39 +01:00
Thomas Gleixner
9c37cb6e80 rseq: Provide static branch for runtime debugging
Config based debug is rarely turned on and is not available easily when
things go wrong.

Provide a static branch to allow permanent integration of debug mechanisms
along with the usual toggles in Kconfig, command line and debugfs.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de
2025-11-04 08:32:49 +01:00
Thomas Gleixner
5412910487 rseq: Expose lightweight statistics in debugfs
Analyzing the call frequency without actually using tracing is helpful for
analysis of this infrastructure. The overhead is minimal as it just
increments a per CPU counter associated to each operation.

The debugfs readout provides a racy sum of all counters.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de
2025-11-04 08:32:41 +01:00
Linus Torvalds
48e3694ae7 Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Add KUnit test for the printk ring buffer

 - Fix the check of the maximal record size which is allowed to be
   stored into the printk ring buffer. It prevents corruptions of the
   ring buffer.

   Note that printk() is on the safe side. The messages are limited by
   1kB buffer and are always small enough for the minimal log buffer
   size 4kB, see CONFIG_LOG_BUF_SHIFT definition.

* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: ringbuffer: Fix data block max size check
  printk: kunit: support offstack cpumask
  printk: kunit: Fix __counted_by() in struct prbtest_rbdata
  printk: ringbuffer: Explain why the KUnit test ignores failed writes
  printk: ringbuffer: Add KUnit test
2025-10-04 11:13:11 -07:00
Petr Mladek
7a75a5da79 Merge branch 'rework/ringbuffer-kunit-test' into for-linus 2025-10-02 10:33:08 +02:00
Linus Torvalds
4b81e2eb9e Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:

 - Further consolidation of the VDSO infrastructure and the common data
   store

 - Simplification of the related Kconfig logic

 - Improve the VDSO selftest suite

* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: vDSO: Drop vdso_test_clock_getres
  selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
  selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
  selftests: vDSO: vdso_test_abi: Use explicit indices for name array
  selftests: vDSO: vdso_test_abi: Drop clock availability tests
  selftests: vDSO: vdso_test_abi: Use ksft_finished()
  selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
  selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
  vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
  vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
  vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
  vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
  vdso: Drop kconfig GENERIC_COMPAT_VDSO
  vdso: Drop kconfig GENERIC_VDSO_32
  riscv: vdso: Untangle Kconfig logic
  time: Build generic update_vsyscall() only with generic time vDSO
  vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
  vdso: Move ENABLE_COMPAT_VDSO from core to arm64
  ARM: VDSO: Remove cntvct_ok global variable
  vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30 16:58:21 -07:00
Linus Torvalds
9cc220a422 Merge tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:

 - Refactor SCLP memory hotplug code

 - Introduce common boot_panic() decompressor helper macro and use it to
   get rid of nearly few identical implementations

 - Take into account additional key generation flags and forward it to
   the ep11 implementation. With that allow users to modify the key
   generation process, e.g. provide valid combinations of XCP_BLOB_*
   flags

 - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
   debug facility and HMC driver

 - Add DAX support for DCSS memory block devices

 - Make the compiler statement attribute "assume" available with a new
   __assume macro

 - Rework ffs() and fls() family bitops functions, including source code
   improvements and generated code optimizations. Use the newly
   introduced __assume macro for that

 - Enable additional network features in default configurations

 - Use __GFP_ACCOUNT flag for user page table allocations to add missing
   kmemcg accounting

 - Add WQ_PERCPU flag to explicitly request the use of the per-CPU
   workqueue for 3590 tape driver

 - Switch power reading to the per-CPU and the Hiperdispatch to the
   default workqueue

 - Add memory allocation profiling hooks to allow better profiling data
   and the /proc/allocinfo output similar to other architectures

* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
  s390/mm: Add memory allocation profiling hooks
  s390: Replace use of system_wq with system_dfl_wq
  s390/diag324: Replace use of system_wq with system_percpu_wq
  s390/tape: Add WQ_PERCPU to alloc_workqueue users
  s390/bitops: Switch to generic ffs() if supported by compiler
  s390/bitops: Switch to generic fls(), fls64(), etc.
  s390/mm: Use __GFP_ACCOUNT for user page table allocations
  s390/configs: Enable additional network features
  s390/bitops: Cleanup __flogr()
  s390/bitops: Use __assume() for __flogr() inline assembly return value
  compiler_types: Add __assume macro
  s390/bitops: Limit return value range of __flogr()
  s390/dcssblk: Add DAX support
  s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/pkey: Forward keygenflags to ep11_unwrapkey
  s390/boot: Add common boot_panic() code
  s390/bitops: Optimize inlining
  s390/bitops: Slightly optimize ffs() and fls64()
  s390/sclp: Move memory hotplug code for better modularity
  ...
2025-09-29 19:14:25 -07:00
Linus Torvalds
a5ba183bde Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for
  kconfig so CONFIG renaming can go more smoothly.

  This has been a long-standing deficiency, and with the renaming of
  CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
  support), this came up again.

  The breadth of the diffstat is mainly this renaming.

   - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)

   - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
     (Junjie Cao)

   - Add str_assert_deassert() helper (Lad Prabhakar)

   - gcc-plugins: Remove TODO_verify_il for GCC >= 16

   - kconfig: Fix BrokenPipeError warnings in selftests

   - kconfig: Add transitional symbol attribute for migration support

   - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29 17:48:27 -07:00
Linus Torvalds
b7ce6fa90f Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Linus Torvalds
fde0ab43b9 Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architectures
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.

And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).

This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.

Just use an empty asm statement instead.

Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9b ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29 08:54:12 -07:00
Kees Cook
23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Thomas Gleixner
e2ffa15b9b kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:

     {
     	__label__ local_lbl;
	...
	unsafe_get_user(uval, uaddr, local_lbl);
	...
	return 0;
	local_lbl:
		return -EFAULT;
     }

when two such scopes exist in the same function:

  error: cannot jump from this asm goto statement to one of its possible targets

There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.

That issue prevents using local labels for a cleanup based user access
mechanism.

After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.

But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.

The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:

void bar(void **);
void* baz(void);

int  foo (void) {
    {
	    asm goto("jmp %l0"::::l0);
	    return 0;
l0:
	    return 1;
    }
    void *x __attribute__((cleanup(bar))) = baz();
    {
	    asm goto("jmp %l0"::::l1);
	    return 42;
l1:
	    return 0xff;
    }
}

Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.

That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.

Thanks to Nathan for pointing to the relevant clang issue!

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: f023f5cdb2
2025-09-24 09:29:15 +02:00
Heiko Carstens
f72e2cff13 compiler_types: Add __assume macro
Make the statement attribute "assume" with a new __assume macro available.

The assume attribute is used to indicate that a certain condition is
assumed to be true. Compilers may or may not use this indication to
generate optimized code. If this condition is violated at runtime, the
behavior is undefined.

Note that the clang documentation states that optimizers may react
differently to this attribute, and this may even have a negative
performance impact. Therefore this attribute should be used with care.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-09-18 14:06:40 +02:00
Geert Uytterhoeven
7479260860 init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c.  Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.

Fixes: 1274aea127 ("initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 15:02:17 +02:00
Linus Torvalds
b236920731 Merge tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:

 - Two changes to prepare for the future Rust 1.91.0 release (expected
   2025-10-30, currently in nightly): a target specification format
   change and a renamed, soon-to-be-stabilized 'core' function.

* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: support Rust >= 1.91.0 target spec
  rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
2025-09-06 12:33:09 -07:00
Thomas Weißschuh
bad53ae2dc vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.

Remove the now unnecessary Kconfig symbol.

Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Alice Ryhl
c09461a0d2 rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
As part of the stabilization of Location::file_with_nul(), it was brought
up that the with_nul() suffix usually means something else in Rust APIs,
so the API is being renamed prior to stabilization [1].

Thus, use the new name on new rustc versions.

Link: https://www.github.com/rust-lang/rust/pull/145928 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com
[ Kept `cfg` separation. Reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-08-31 23:34:34 +02:00
Nathan Chancellor
86a9b12506 hardening: Require clang 20.1.0 for __counted_by
After an innocuous change in -next that modified a structure that
contains __counted_by, clang-19 start crashing when building certain
files in drivers/gpu/drm/xe. When assertions are enabled, the more
descriptive failure is:

  clang: clang/lib/AST/RecordLayoutBuilder.cpp:3335: const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.

According to a reverse bisect, a tangential change to the LLVM IR
generation phase of clang during the LLVM 20 development cycle [1]
resolves this problem. Bump the version of clang that enables
CONFIG_CC_HAS_COUNTED_BY to 20.1.0 to ensure that this issue cannot be
hit.

Link: 160fb1121c [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20250807-fix-counted_by-clang-19-v1-1-902c86c1d515@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-08-29 12:04:53 -07:00
Linus Torvalds
e991acf1bc Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Andrew Morton
fefbeed8c6 init/Kconfig: restore CONFIG_BROKEN help text
Linus added it in 2003, it later was removed.  Put it back.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:37 -07:00
Linus Torvalds
6a68cec16b Merge tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - Add support for cgroup "cpu.max" interface

 - Code organization cleanup so that ext_idle.c doesn't depend on the
   source-file-inclusion build method of sched/

 - Drop UP paths in accordance with sched core changes

 - Documentation and other misc changes

* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix scx_bpf_reenqueue_local() reference
  sched_ext: Drop kfuncs marked for removal in 6.15
  sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
  kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
  sched_ext: Add support for cgroup bandwidth control interface
  sched_ext, sched/core: Factor out struct scx_task_group
  sched_ext: Return NULL in llc_span
  sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
  sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
  sched_ext: Always use SMP versions in kernel/sched/ext.h
  sched_ext: Always use SMP versions in kernel/sched/ext.c
  sched_ext: Documentation: Clarify time slice handling in task lifecycle
  sched_ext: Make scx_locked_rq() inline
  sched_ext: Make scx_rq_bypassing() inline
  sched_ext: idle: Make local functions static in ext_idle.c
  sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
2025-07-31 16:29:46 -07:00
Linus Torvalds
beace86e61 Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
bf76f23aa1 Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of maximum lag of tasks in presence of different
     slices duration, for better handling of lag in the fair scheduler
     (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernel too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of higher
     priority waiters.

     Currently limited to a single runqueue and conditional on
     CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
     Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
      - Fix scalability regression by implementing less agressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...
2025-07-29 17:42:52 -07:00
Linus Torvalds
f38b1f243e Merge tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:

 - Switch the reference counting to a RCU based per-CPU reference to
   address a performance bottleneck vs the single instance rcuref
   variant

 - Make the futex selftest build on 32-bit architectures which only
   support 64-bit time_t, e.g. RISCV-32

 - Cleanups and improvements in selftests and futex bench

* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
  selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
  perf bench futex: Remove support for IMMUTABLE
  selftests/futex: Remove support for IMMUTABLE
  futex: Remove support for IMMUTABLE
  futex: Make futex_private_hash_get() static
  futex: Use RCU-based per-CPU reference counting instead of rcuref_t
  selftests/futex: Adapt the private hash test to RCU related changes
2025-07-29 14:39:42 -07:00
Linus Torvalds
0d5ec7919f Merge tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO / other driver updates from Greg KH:
 "Here is the big set of char/misc/iio and other smaller driver
  subsystems for 6.17-rc1. It's a big set this time around, with the
  huge majority being in the iio subsystem with new drivers and dts
  files being added there.

  Highlights include:
   - IIO driver updates, additions, and changes making more code const
     and cleaning up some init logic
   - bus_type constant conversion changes
   - misc device test functions added
   - rust miscdevice minor fixup
   - unused function removals for some drivers
   - mei driver updates
   - mhi driver updates
   - interconnect driver updates
   - Android binder updates and test infrastructure added
   - small cdx driver updates
   - small comedi fixes
   - small nvmem driver updates
   - small pps driver updates
   - some acrn virt driver fixes for printk messages
   - other small driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
  binder: Use seq_buf in binder_alloc kunit tests
  binder: Add copyright notice to new kunit files
  misc: ti_fpc202: Switch to of_fwnode_handle()
  bus: moxtet: Use dev_fwnode()
  pc104: move PC104 option to drivers/Kconfig
  drivers: virt: acrn: Don't use %pK through printk
  comedi: fix race between polling and detaching
  interconnect: qcom: Add Milos interconnect provider driver
  dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
  mei: more prints with client prefix
  mei: bus: use cldev in prints
  bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
  bus: mhi: host: Detect events pointing to unexpected TREs
  bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
  bus: mhi: host: Use str_true_false() helper
  bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
  bus: mhi: host: Fix endianness of BHI vector table
  bus: mhi: host: pci_generic: Disable runtime PM for QDU100
  bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
  dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
  ...
2025-07-29 09:52:01 -07:00
Linus Torvalds
c3018a2c6a Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Optimization to avoid reference counts on non-cloned registered
   buffers. This is how these buffers were handled prior to having
   cloning support, and we can still use that approach as long as the
   buffers haven't been cloned to another ring.

 - Cleanup and improvement for uring_cmd, where btrfs was the only user
   of storing allocated data for the lifetime of the uring_cmd. Clean
   that up so we can get rid of the need to do that.

 - Avoid unnecessary memory copies in uring_cmd usage. This is
   particularly important as a lot of uring_cmd usage necessitates the
   use of 128b SQEs.

 - A few updates for recv multishot, where it's now possible to add
   fairness limits for limiting how much is transferred for each retry
   loop. Additionally, recv multishot now supports an overall cap as
   well, where once reached the multishot recv will terminate. The
   latter is useful for buffer management and juggling many recv streams
   at the same time.

 - Add support for returning the TX timestamps via a new socket command.
   This feature can work in either singleshot or multishot mode, where
   the latter triggers a completion whenever new timestamps are
   available. This is an alternative to using the existing error queue.

 - Add support for an io_uring "mock" file, which is the start of being
   able to do 100% targeted testing in terms of exercising io_uring
   request handling. The idea is to have a file type that can be
   anything the tester would like, and behave exactly how you want it to
   behave in terms of hitting the code paths you want.

 - Improve zcrx by using sgtables to de-duplicate and improve dma
   address handling.

 - Prep work for supporting larger pages for zcrx.

 - Various little improvements and fixes.

* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
  io_uring/zcrx: fix leaking pages on sg init fail
  io_uring/zcrx: don't leak pages on account failure
  io_uring/zcrx: fix null ifq on area destruction
  io_uring: fix breakage in EXPERT menu
  io_uring/cmd: remove struct io_uring_cmd_data
  btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
  io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
  io_uring/zcrx: account area memory
  io_uring: export io_[un]account_mem
  io_uring/net: Support multishot receive len cap
  io_uring: deduplicate wakeup handling
  io_uring/net: cast min_not_zero() type
  io_uring/poll: cleanup apoll freeing
  io_uring/net: allow multishot receive per-invocation cap
  io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
  io_uring/net: use passed in 'len' in io_recv_buf_select()
  io_uring/zcrx: prepare fallback for larger pages
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: introduce io_populate_area_dma
  ...
2025-07-28 16:30:12 -07:00
Randy Dunlap
61a789ad43 pc104: move PC104 option to drivers/Kconfig
Put the PC104 kconfig option in drivers/Kconfig along with
other buses (AMBA, EISA, PCI, CXL, PCCard, & RapidIO).
This localizes PC104 with option bus kconfig options to make
it easier to find.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: William Breathitt Gray <wbg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250722235431.3671754-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24 11:42:16 +02:00
Randy Dunlap
d1fbe1ebf4 io_uring: fix breakage in EXPERT menu
Add a dependency for IO_URING for the GCOV_PROFILE_URING symbol.

Without this patch the EXPERT config menu ends with
"Enable IO uring support" and the menu prompts for
GCOV_PROFILE_URING and IO_URING_MOCK_FILE are not subordinate to it.
This causes all of the EXPERT Kconfig options that follow
GCOV_PROFILE_URING to be display in the "upper" menu (General setup),
just following the EXPERT menu.

Fixes: 1802656ef8 ("io_uring: add GCOV_PROFILE_URING Kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: io-uring@vger.kernel.org
Link: https://lore.kernel.org/r/20250720010456.2945344-1-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-20 12:32:54 -06:00
Kirill A. Shutemov
fdc5001b00 mm/vmstat: make MEMCG select VM_EVENT_COUNTERS
The vmstat_text array contains labels for counters displayed in
/proc/vmstat.  It is important to keep the labels in sync with the
counters.

There is a BUILD_BUG_ON() check in vmstat_start() that ensures the size of
the vmstat_text is not smaller than VM_EVENT_COUNTERS.  This helps to
catch cases where a new counter is added but the label is not.  However,
it does not help if a counter is removed but the label remains.

It would be nice to make the BUILD_BUG_ON() check more strict to catch
such cases.  However, when compiling with MEMCG enabled but
VM_EVENT_COUNTERS disabled, the vmstat_text array is larger than
NR_VMSTAT_ITEMS.

This issue arises because some elements of the vmstat_text array are
present when either MEMCG or VM_EVENT_COUNTERS is enabled, but
NR_VMSTAT_ITEMS only accounts for these elements if VM_EVENT_COUNTERS is
enabled.

Instead of adjusting the NR_VMSTAT_ITEMS definition to account for MEMCG,
make MEMCG select VM_EVENT_COUNTERS.  VM_EVENT_COUNTERS is enabled in most
configurations anyway.

Link: https://lkml.kernel.org/r/20250604095111.533783-1-kirill.shutemov@linux.intel.com
Fixes: ebc5d83d04 ("mm/memcontrol: use vmstat names for printing statistics")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 18:59:46 -07:00
John Stultz
25c411fce7 sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.

Also uses this option to allow the rq->donor to be different from
rq->curr.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
2025-07-14 17:16:31 +02:00
Peter Zijlstra
8f2146159b Merge branch 'tip/sched/urgent'
Avoid merge conflicts

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-07-14 17:16:28 +02:00
Peter Zijlstra
56180dd20c futex: Use RCU-based per-CPU reference counting instead of rcuref_t
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.

Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.

The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.

When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
  current private hash. This can be skipped if a grace period elapsed
  since the private hash was assigned.

- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
  use mm_struct::futex_atomic for reference counting.

- After a RCU grace period, all users are guaranteed to be using the
  atomic counter. The per-CPU counters can now be summed up and added to
  the atomic_t counter. If the resulting count is zero, the hash can be
  safely replaced. Otherwise, active users still hold a valid reference.

- Once the atomic reference count drops to zero, the next futex
  operation will switch to the new private hash.

call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.

[bigeasy: commit description + mm get/ put_async]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
2025-07-11 16:02:00 +02:00
Pavel Begunkov
3a0ae385f6 io_uring/mock: add basic infra for test mock files
io_uring commands provide an ioctl style interface for files to
implement file specific operations. io_uring provides many features and
advanced api to commands, and it's getting hard to test as it requires
specific files/devices.

Add basic infrastucture for creating special mock files that will be
implementing the cmd api and using various io_uring features we want to
test. It'll also be useful to test some more obscure read/write/polling
edge cases in the future.

Suggested-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00
Sebastian Andrzej Siewior
9a57c37731 futex: Temporary disable FUTEX_PRIVATE_HASH
Chris Mason reported a performance regression on big iron. Reports of
this kind were usually reported as part of a micro benchmark but Chris'
test did mimic his real workload. This makes it a real regression.

The root cause is rcuref_get() which is invoked during each futex
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.

Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.

Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630145034.8JnINEaS@linutronix.de
2025-07-01 15:02:05 +02:00
Boqun Feng
0aa2b78ce5 rust: Introduce file_from_location()
Most of kernel debugging facilities take a nul-terminated string for
file names for a callsite (generated from __FILE__), however the Rust
courterpart, Location, would return a Rust string (not nul-terminated)
from method .file(). And such a string cannot be passed to C debugging
function directly.

There is ongoing work to support a Location::file_with_nul() [1], which
returns a nul-terminated string from a Location. Since it's still
working in progress, and it will take some time before the feature
finally gets stabilized and the kernel's minimal rustc version might
also take a while to bump to a version that at least has that feature,
introduce a file_from_location() function, which returns a warning
string if Location::file_with_nul() is not available.

This should work in most cases because as for now the known usage of
Location::file_with_nul() is only in debugging code (e.g. might_sleep())
and there might be other information reported by the debugging code that
could help locate the problematic function, so missing the file name is
fine at the moment.

Link: https://github.com/rust-lang/rust/issues/141727 [1]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250619151007.61767-2-boqun.feng@gmail.com
2025-06-24 15:53:46 -07:00
Tejun Heo
ddceadce63 sched_ext: Add support for cgroup bandwidth control interface
From 077814f57f8acce13f91dc34bbd2b7e4911fbf25 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 13 Jun 2025 15:06:47 -1000

- Add CONFIG_GROUP_SCHED_BANDWIDTH which is selected by both
  CONFIG_CFS_BANDWIDTH and EXT_GROUP_SCHED.

- Put bandwidth control interface files for both cgroup v1 and v2 under
  CONFIG_GROUP_SCHED_BANDWIDTH.

- Update tg_bandwidth() to fetch configuration parameters from fair if
  CONFIG_CFS_BANDWIDTH, SCX otherwise.

- Update tg_set_bandwidth() to update the parameters for both fair and SCX.

- Add bandwidth control parameters to struct scx_cgroup_init_args.

- Add sched_ext_ops.cgroup_set_bandwidth() which is invoked on bandwidth
  control parameter updates.

- Update scx_qmap and maximal selftest to test the new feature.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-20 17:03:51 -10:00
Thomas Weißschuh
5ea2bcdfbf printk: ringbuffer: Add KUnit test
The KUnit test validates the correct operation of the ringbuffer.
A separate dedicated ringbuffer is used so that the global printk
ringbuffer is not touched.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20250612-printk-ringbuffer-test-v3-1-550c088ee368@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-06-18 16:42:42 +02:00
Linus Torvalds
ec7714e494 Merge tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - KUnit '#[test]'s:

      - Support KUnit-mapped 'assert!' macros.

        The support that landed last cycle was very basic, and the
        'assert!' macros panicked since they were the standard library
        ones. Now, they are mapped to the KUnit ones in a similar way to
        how is done for doctests, reusing the infrastructure there.

        With this, a failing test like:

            #[test]
            fn my_first_test() {
                assert_eq!(42, 43);
            }

        will report:

            # my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
            Expected 42 == 43 to be true, but is false
            # my_first_test.speed: normal
            not ok 1 my_first_test

      - Support tests with checked 'Result' return types.

        The return value of test functions that return a 'Result' will
        be checked, thus one can now easily catch errors when e.g. using
        the '?' operator in tests.

        With this, a failing test like:

            #[test]
            fn my_test() -> Result {
                f()?;
                Ok(())
            }

        will report:

            # my_test: ASSERTION FAILED at rust/kernel/lib.rs:321
            Expected is_test_result_ok(my_test()) to be true, but is false
            # my_test.speed: normal
            not ok 1 my_test

      - Add 'kunit_tests' to the prelude.

   - Clarify the remaining language unstable features in use.

   - Compile 'core' with edition 2024 for Rust >= 1.87.

   - Workaround 'bindgen' issue with forward references to 'enum' types.

   - objtool: relax slice condition to cover more 'noreturn' functions.

   - Use absolute paths in macros referencing 'core' and 'kernel'
     crates.

   - Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds.

   - Clean some 'doc_markdown' lint hits -- we may enable it later on.

  'kernel' crate:

   - 'alloc' module:

      - 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>'
        if 'T' implements 'U'.

      - 'Vec': implement new methods (prerequisites for nova-core and
        binder): 'truncate', 'resize', 'clear', 'pop',
        'push_within_capacity' (with new error type 'PushError'),
        'drain_all', 'retain', 'remove' (with new error type
        'RemoveError'), insert_within_capacity' (with new error type
        'InsertError').

        In addition, simplify 'push' using 'spare_capacity_mut', split
        'set_len' into 'inc_len' and 'dec_len', add type invariant 'len
        <= capacity' and simplify 'truncate' using 'dec_len'.

   - 'time' module:

      - Morph the Rust hrtimer subsystem into the Rust timekeeping
        subsystem, covering delay, sleep, timekeeping, timers. This new
        subsystem has all the relevant timekeeping C maintainers listed
        in the entry.

      - Replace 'Ktime' with 'Delta' and 'Instant' types to represent a
        duration of time and a point in time.

      - Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer'
        to delay converting to 'Instant' and 'Delta'.

   - 'xarray' module:

      - Add a Rust abstraction for the 'xarray' data structure. This
        abstraction allows Rust code to leverage the 'xarray' to store
        types that implement 'ForeignOwnable'. This support is a
        dependency for memory backing feature of the Rust null block
        driver, which is waiting to be merged.

      - Set up an entry in 'MAINTAINERS' for the XArray Rust support.
        Patches will go to the new Rust XArray tree and then via the
        Rust subsystem tree for now.

      - Allow 'ForeignOwnable' to carry information about the pointed-to
        type. This helps asserting alignment requirements for the
        pointer passed to the foreign language.

   - 'container_of!': retain pointer mut-ness and add a compile-time
     check of the type of the first parameter ('$field_ptr').

   - Support optional message in 'static_assert!'.

   - Add C FFI types (e.g. 'c_int') to the prelude.

   - 'str' module: simplify KUnit tests 'format!' macro, convert
     'rusttest' tests into KUnit, take advantage of the '-> Result'
     support in KUnit '#[test]'s.

   - 'list' module: add examples for 'List', fix path of
     'assert_pinned!' (so far unused macro rule).

   - 'workqueue' module: remove 'HasWork::OFFSET'.

   - 'page' module: add 'inline' attribute.

  'macros' crate:

   - 'module' macro: place 'cleanup_module()' in '.exit.text' section.

  'pin-init' crate:

   - Add 'Wrapper<T>' trait for creating pin-initializers for wrapper
     structs with a structurally pinned value such as 'UnsafeCell<T>' or
     'MaybeUninit<T>'.

   - Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but
     not error if not all fields implement it. This is needed to derive
     'Zeroable' for all bindgen-generated structs.

   - Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the
     initialized type of an initializer. These are utilized by the
     'Wrapper<T>' implementations.

   - Add support for visibility in 'Zeroable' derive macro.

   - Add support for 'union's in 'Zeroable' derive macro.

   - Upstream dev news: streamline CI, fix some bugs. Add new workflows
     to check if the user-space version and the one in the kernel tree
     have diverged. Use the issues tab [1] to track them, which should
     help folks report and diagnose issues w.r.t. 'pin-init' better.

       [1] https://github.com/rust-for-linux/pin-init/issues

  Documentation:

   - Testing: add docs on the new KUnit '#[test]' tests.

   - Coding guidelines: explain that '///' vs. '//' applies to private
     items too. Add section on C FFI types.

   - Quick Start guide: update Ubuntu instructions and split them into
     "25.04" and "24.04 LTS and older".

  And a few other cleanups and improvements"

* tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (78 commits)
  rust: list: Fix typo `much` in arc.rs
  rust: check type of `$ptr` in `container_of!`
  rust: workqueue: remove HasWork::OFFSET
  rust: retain pointer mut-ness in `container_of!`
  Documentation: rust: testing: add docs on the new KUnit `#[test]` tests
  Documentation: rust: rename `#[test]`s to "`rusttest` host tests"
  rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s
  rust: str: simplify KUnit tests `format!` macro
  rust: str: convert `rusttest` tests into KUnit
  rust: add `kunit_tests` to the prelude
  rust: kunit: support checked `-> Result`s in KUnit `#[test]`s
  rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s
  rust: make section names plural
  rust: list: fix path of `assert_pinned!`
  rust: compile libcore with edition 2024 for 1.87+
  rust: dma: add missing Markdown code span
  rust: task: add missing Markdown code spans and intra-doc links
  rust: pci: fix docs related to missing Markdown code spans
  rust: alloc: add missing Markdown code span
  rust: alloc: add missing Markdown code spans
  ...
2025-06-04 21:18:37 -07:00
Linus Torvalds
fd1f847350 Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging nmi-safe, which is required by BFP, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
2025-06-02 16:00:26 -07:00
Shakeel Butt
940b01fc8d memcg: nmi safe memcg stats for specific archs
There are archs which have NMI but does not support this_cpu_* ops safely
in the nmi context but they support safe atomic ops in nmi context.  For
such archs, let's add infra to use atomic ops for the memcg stats which
can be updated in nmi.

At the moment, the memcg stats which get updated in the objcg charging
path are MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. 
Rather than adding support for all memcg stats to be nmi safe, let's just
add infra to make these three stats nmi safe which this patch is doing.

Link: https://lkml.kernel.org/r/20250519063142.111219-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:08 -07:00
Shakeel Butt
25352d2f2d memcg: disable kmem charging in nmi for unsupported arch
Patch series "memcg: nmi-safe kmem charging", v4.

Users can attached their BPF programs at arbitrary execution points in the
kernel and such BPF programs may run in nmi context.  In addition, these
programs can trigger memcg charged kernel allocations in the nmi context. 
However memcg charging infra for kernel memory is not equipped to handle
nmi context for all architectures.

This series removes the hurdles to enable kmem charging in the nmi context
for most of the archs.  For archs without CONFIG_HAVE_NMI, this series is
a noop.  For archs with NMI support and have
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, the previous work to make memcg
stats re-entrant is sufficient for allowing kmem charging in nmi context. 
For archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and with ARCH_HAVE_NMI_SAFE_CMPXCHG,
this series added infra to support kmem charging in nmi context.  Lastly
those archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, kmem
charging in nmi context is not supported at all.

Mostly used archs have support for CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
and this series should be almost a noop (other than making
memcg_rstat_updated nmi safe) for such archs.  


This patch (of 5):

The memcg accounting and stats uses this_cpu* and atomic* ops.  There are
archs which define CONFIG_HAVE_NMI but does not define
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, so
memcg accounting for such archs in nmi context is not possible to support.
Let's just disable memcg accounting in nmi context for such archs.

Link: https://lkml.kernel.org/r/20250519063142.111219-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250519063142.111219-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:08 -07:00
Linus Torvalds
48cfc5791d Merge tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:

 - Update overflow helpers to ease refactoring of on-stack flex array
   instances (Gustavo A. R. Silva, Kees Cook)

 - lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo)

 - Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr)

 - Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh)

 - Add missed designated initializers now exposed by fixed randstruct
   (Nathan Chancellor, Kees Cook)

 - Document compilers versions for __builtin_dynamic_object_size

 - Remove ARM_SSP_PER_TASK GCC plugin

 - Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST
   builds

 - Kbuild: induce full rebuilds when dependencies change with GCC
   plugins, the Clang sanitizer .scl file, or the randstruct seed.

 - Kbuild: Switch from -Wvla to -Wvla-larger-than=1

 - Correct several __nonstring uses for -Wunterminated-string-initialization

* tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
  Revert "hardening: Disable GCC randstruct for COMPILE_TEST"
  lib/tests: randstruct: Add deep function pointer layout test
  lib/tests: Add randstruct KUnit test
  randstruct: gcc-plugin: Remove bogus void member
  net: qede: Initialize qede_ll_ops with designated initializer
  scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops
  md/bcache: Mark __nonstring look-up table
  integer-wrap: Force full rebuild when .scl file changes
  randstruct: Force full rebuild when seed changes
  gcc-plugins: Force full rebuild when plugins change
  kbuild: Switch from -Wvla to -Wvla-larger-than=1
  hardening: simplify CONFIG_CC_HAS_COUNTED_BY
  overflow: Fix direct struct member initialization in _DEFINE_FLEX()
  kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper
  overflow: Add STACK_FLEX_ARRAY_SIZE() helper
  input/joystick: magellan: Mark __nonstring look-up table const
  watchdog: exar: Shorten identity name to fit correctly
  mod_devicetable: Enlarge the maximum platform_device_id name length
  overflow: Clarify expectations for getting DEFINE_FLEX variable sizes
  compiler_types: Identify compiler versions for __builtin_dynamic_object_size
  ...
2025-05-28 07:47:10 -07:00