Commit Graph

1345093 Commits

Author SHA1 Message Date
Caleb Sander Mateos
6a87fc437a ublk: get ubq from pdu in ublk_cmd_list_tw_cb()
Save a few pointer dereferences by obtaining struct ublk_queue *ubq from
the ublk_uring_cmd_pdu instead of the request's mq_hctx.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250328180411.2696494-4-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Caleb Sander Mateos
9d7fa99189 ublk: skip 1 NULL check in ublk_cmd_list_tw_cb() loop
ublk_cmd_list_tw_cb() is always performed on a non-empty request list.
So don't check whether rq is NULL on the first iteration of the loop,
just on subsequent iterations.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250328180411.2696494-3-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Caleb Sander Mateos
dfbce8b798 ublk: remove unused cmd argument to ublk_dispatch_req()
ublk_dispatch_req() never uses its struct io_uring_cmd *cmd argument.
Drop it so callers don't have to pass a value.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250328180411.2696494-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
c78ae7b71e selftests: ublk: add test for checking zero copy related parameter
ublk zero copy usually requires to set dma and segment parameter correctly,
so hard-code null target's dma & segment parameter in non-default value,
and verify if they are setup correctly by ublk driver.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-12-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
8c77861436 selftests: ublk: add more tests for covering MQ
Add test test_generic_02.sh for covering IO dispatch order in case of MQ.

Especially we just support ->queue_rqs() which may affect IO dispatch
order.

Add test_loop_05.sh and test_stripe_03.sh for covering MQ.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
daabfb50a5 ublk: rename ublk_rq_task_work_cb as ublk_cmd_tw_cb
The new name is aligned with ublk_cmd_list_tw_cb(), and looks
more readable.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
d796cea7b9 ublk: implement ->queue_rqs()
Implement ->queue_rqs() for improving perf in case of MQ.

In this way, we just need to call io_uring_cmd_complete_in_task() once for
whole IO batch, then both io_uring and ublk server can get exact batch from
ublk frontend.

Follows IOPS improvement:

- tests

	tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]

	fio/t/io_uring -p0 /dev/ublkb0

- results:

	more than 10% IOPS boost observed

Pass all ublk selftests, especially the io dispatch order test.

Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
1797020916 ublk: document zero copy feature
Add words to explain how zero copy feature works, and why it has to be
trusted for handling IO read command.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
ebf695f129 ublk: add segment parameter
IO split is usually bad in io_uring world, since -EAGAIN is caused and
IO handling may have to fallback to io-wq, this way does hurt performance.

ublk starts to support zero copy recently, for avoiding unnecessary IO
split, ublk driver's segment limit should be aligned with backend
device's segment limit.

Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
which number is aligned with ublk request segment number, so that big
memory allocation can be avoided by setting reasonable max_segments limit.

So add segment parameter for providing ublk server chance to align
segment limit with backend, and keep it reasonable from implementation
viewpoint.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei
b460f328e2 ublk: call io_uring_cmd_to_pdu to get uring_cmd pdu
Call io_uring_cmd_to_pdu() to get uring_cmd pdu, and one big benefit
is the automatic pdu size build check.

Suggested-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250327095123.179113-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Ming Lei
1d781c0de0 ublk: add helper of ublk_need_map_io()
ublk_need_map_io() is more readable.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Ming Lei
705b80841e ublk: remove two unused fields from 'struct ublk_queue'
Remove two unused fields(`io_addr` & `max_io_sz`) from `struct ublk_queue`.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Ming Lei
7e2fe01a69 ublk: comment on ubq->canceling handling in ublk_queue_rq()
In ublk_queue_rq(), ubq->canceling has to be handled after ->fail_io and
->force_abort are dealt with, otherwise the request may not be failed
when deleting disk.

Add comment on this usage.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Ming Lei
8741d07379 ublk: make sure ubq->canceling is set when queue is frozen
Now ublk driver depends on `ubq->canceling` for deciding if the request
can be dispatched via uring_cmd & io_uring_cmd_complete_in_task().

Once ubq->canceling is set, the uring_cmd can be done via ublk_cancel_cmd()
and io_uring_cmd_done().

So set ubq->canceling when queue is frozen, this way makes sure that the
flag can be observed from ublk_queue_rq() reliably, and avoids
use-after-free on uring_cmd.

Fixes: 216c8f5ef0 ("ublk: replace monitor with cancelable uring_cmd")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Pavel Begunkov
04491732fc io_uring/net: account memory for zc sendmsg
Account pinned pages for IORING_OP_SENDMSG_ZC, just as we for
IORING_OP_SEND_ZC and net/ does for MSG_ZEROCOPY.

Fixes: 493108d95f ("io_uring/net: zerocopy sendmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4f00f67ca6ac8e8ed62343ae92b5816b1e0c9c4b.1743086313.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:42 -06:00
Linus Torvalds
eff5f16bfd Merge tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
 "Final separate updates for io_uring.

  This started out as a series of cleanups improvements and improvements
  for registered buffers, but as the last series of the io_uring changes
  for 6.15, it also collected a few fixes for the other branches on top:

   - Add support for vectored fixed/registered buffers.

     Previously only single segments have been supported for commands,
     now vectored variants are supported as well. This series includes
     networking and file read/write support.

   - Small series unifying return codes across multi and single shot.

   - Small series cleaning up registerd buffer importing.

   - Adding support for vectored registered buffers for uring_cmd.

   - Fix for io-wq handling of command reissue.

   - Various little fixes and tweaks"

* tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux: (25 commits)
  io_uring/net: fix io_req_post_cqe abuse by send bundle
  io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc
  io_uring: move min_events sanitisation
  io_uring: rename "min" arg in io_iopoll_check()
  io_uring: open code __io_post_aux_cqe()
  io_uring: defer iowq cqe overflow via task_work
  io_uring: fix retry handling off iowq
  io_uring/net: only import send_zc buffer once
  io_uring/cmd: introduce io_uring_cmd_import_fixed_vec
  io_uring/cmd: add iovec cache for commands
  io_uring/cmd: don't expose entire cmd async data
  io_uring: rename the data cmd cache
  io_uring: rely on io_prep_reg_vec for iovec placement
  io_uring: introduce io_prep_reg_iovec()
  io_uring: unify STOP_MULTISHOT with IOU_OK
  io_uring: return -EAGAIN to continue multishot
  io_uring: cap cached iovec/bvec size
  io_uring/net: implement vectored reg bufs for zctx
  io_uring/net: convert to struct iou_vec
  io_uring/net: pull vec alloc out of msghdr import
  ...
2025-03-28 15:07:04 -07:00
Linus Torvalds
6df9d086ff Merge tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux
Pull io_uring epoll support from Jens Axboe:
 "This adds support for reading epoll events via io_uring.

  While this may seem counter-intuitive (and/or productive), the
  reasoning here is that quite a few existing epoll event loops can
  easily do a partial conversion to a completion based model, but are
  still stuck with one (or few) event types that remain readiness based.

  For that case, they then need to add the io_uring fd to the epoll
  context, and continue to rely on epoll_wait(2) for waiting on events.
  This misses out on the finer grained waiting that io_uring can do, to
  reduce context switches and wait for multiple events in one batch
  reliably.

  With adding support for reaping epoll events via io_uring, the whole
  legacy readiness based event types can still be reaped via epoll, with
  the overall waiting in the loop be driven by io_uring"

* tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux:
  io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
  io_uring/epoll: remove CONFIG_EPOLL guards
2025-03-28 14:55:32 -07:00
Linus Torvalds
ca0b04ba0b Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux
Pull io_uring zero-copy receive support from Jens Axboe:
 "This adds support for zero-copy receive with io_uring, enabling fast
  bulk receive of data directly into application memory, rather than
  needing to copy the data out of kernel memory.

  While this version only supports host memory as that was the initial
  target, other memory types are planned as well, with notably GPU
  memory coming next.

  This work depends on some networking components which were queued up
  on the networking side, but have now landed in your tree.

  This is the work of Pavel Begunkov and David Wei. From the v14 posting:

    'We configure a page pool that a driver uses to fill a hw rx queue
     to hand out user pages instead of kernel pages. Any data that ends
     up hitting this hw rx queue will thus be dma'd into userspace
     memory directly, without needing to be bounced through kernel
     memory. 'Reading' data out of a socket instead becomes a
     _notification_ mechanism, where the kernel tells userspace where
     the data is. The overall approach is similar to the devmem TCP
     proposal

     This relies on hw header/data split, flow steering and RSS to
     ensure packet headers remain in kernel memory and only desired
     flows hit a hw rx queue configured for zero copy. Configuring this
     is outside of the scope of this patchset.

     We share netdev core infra with devmem TCP. The main difference is
     that io_uring is used for the uAPI and the lifetime of all objects
     are bound to an io_uring instance. Data is 'read' using a new
     io_uring request type. When done, data is returned via a new shared
     refill queue. A zero copy page pool refills a hw rx queue from this
     refill queue directly. Of course, the lifetime of these data
     buffers are managed by io_uring rather than the networking stack,
     with different refcounting rules.

     This patchset is the first step adding basic zero copy support. We
     will extend this iteratively with new features e.g. dynamically
     allocated zero copy areas, THP support, dmabuf support, improved
     copy fallback, general optimisations and more'

  In a local setup, I was able to saturate a 200G link with a single CPU
  core, and at netdev conf 0x19 earlier this month, Jamal reported
  188Gbit of bandwidth using a single core (no HT, including soft-irq).

  Safe to say the efficiency is there, as bigger links would be needed
  to find the per-core limit, and it's considerably more efficient and
  faster than the existing devmem solution"

* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
2025-03-28 13:45:52 -07:00
Linus Torvalds
15cb9a2b66 Merge tag 'tpmdd-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
 "This contains a new driver: a TPM FF-A driver.

  FF comes from Firmware Framework, and A comes from Arm's A-profile.
  FF-A is essentially a standard mechanism to communicate with TrustZone
  apps such as TPM.

  Other than that, this includes a pile of fixes and small improvments"

* tag 'tpmdd-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Make chip->{status,cancel,req_canceled} opt
  MAINTAINERS: TPM DEVICE DRIVER: add missing includes
  tpm: End any active auth session before shutdown
  Documentation: tpm: Add documentation for the CRB FF-A interface
  tpm_crb: Add support for the ARM FF-A start method
  ACPICA: Add start method for ARM FF-A
  tpm_crb: Clean-up and refactor check for idle support
  tpm_crb: ffa_tpm: Implement driver compliant to CRB over FF-A
  tpm/tpm_ftpm_tee: fix struct ftpm_tee_private documentation
  tpm, tpm_tis: Workaround failed command reception on Infineon devices
  tpm, tpm_tis: Fix timeout handling when waiting for TPM status
  tpm: Convert warn to dbg in tpm2_start_auth_session()
  tpm: Lazily flush auth session when getting random data
  tpm: ftpm_tee: remove incorrect of_match_ptr annotation
  tpm: do not start chip while suspended
2025-03-28 12:42:53 -07:00
Linus Torvalds
f8a4eba343 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC fixes from Eric Biggers:
 "Fix out-of-scope array bugs in arm and arm64's crc_t10dif_arch()"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  arm64/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
  arm/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
2025-03-28 12:41:36 -07:00
Linus Torvalds
7288511606 Merge tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
 "This brings two main changes to Landlock:

   - A signal scoping fix with a new interface for user space to know if
     it is compatible with the running kernel.

   - Audit support to give visibility on why access requests are denied,
     including the origin of the security policy, missing access rights,
     and description of object(s). This was designed to limit log spam
     as much as possible while still alerting about unexpected blocked
     access.

  With these changes come new and improved documentation, and a lot of
  new tests"

* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
  landlock: Add audit documentation
  selftests/landlock: Add audit tests for network
  selftests/landlock: Add audit tests for filesystem
  selftests/landlock: Add audit tests for abstract UNIX socket scoping
  selftests/landlock: Add audit tests for ptrace
  selftests/landlock: Test audit with restrict flags
  selftests/landlock: Add tests for audit flags and domain IDs
  selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
  selftests/landlock: Add test for invalid ruleset file descriptor
  samples/landlock: Enable users to log sandbox denials
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
  landlock: Log scoped denials
  landlock: Log TCP bind and connect denials
  landlock: Log truncate and IOCTL denials
  landlock: Factor out IOCTL hooks
  landlock: Log file-related denials
  landlock: Log mount-related denials
  landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
  landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
  ...
2025-03-28 12:37:13 -07:00
Linus Torvalds
78fb88eca6 Merge tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
 "This contains just one patch that removes a helper function whose last
  user (smack) stopped using it in 2018"

* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
  capability: Remove unused has_capability
2025-03-28 12:09:33 -07:00
Linus Torvalds
a2d4f473df Merge tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull ima updates from Mimi Zohar:
 "Two performance improvements, which minimize the number of integrity
  violations"

* tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: limit the number of ToMToU integrity violations
  ima: limit the number of open-writers integrity violations
2025-03-28 12:06:58 -07:00
Linus Torvalds
f174ac5ba2 Merge tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
 "This contains just one commit from Randy Dunlap, which fixes
  kernel-doc warnings in the IPE subsystem"

* tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
  ipe: policy_fs: fix kernel-doc warnings
2025-03-28 12:00:40 -07:00
Linus Torvalds
112e43e9fd Revert "Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.

Thomas says:
 "I just noticed that for some incomprehensible reason, probably sheer
  incompetemce when trying to utilize b4, I managed to merge an outdated
  _and_ buggy version of that series.

  Can you please revert that merge completely?"

Done.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-28 11:22:54 -07:00
Linus Torvalds
acb4f33713 Merge tag 'm68knommu-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:

 - remove unused include of linux/fb.h

 - use strscpy() instead of strncpy()

* tag 'm68knommu-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: mm: Replace deprecated strncpy() with strscpy()
  m68k: Do not include <linux/fb.h>
2025-03-27 20:20:15 -07:00
Linus Torvalds
7b667acd69 Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:

 - Remove support for IBM Cell Blades

 - SMP support for microwatt platform

 - Support for inline static calls on PPC32

 - Enable pmu selftests for power11 platform

 - Enable hardware trace macro (HTM) hcall support

 - Support for limited address mode capability

 - Changes to RMA size from 512 MB to 768 MB to handle fadump

 - Misc fixes and cleanups

Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.

* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
  powerpc/kexec: fix physical address calculation in clear_utlb_entry()
  crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
  powerpc: Fix 'intra_function_call not a direct call' warning
  powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
  KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
  powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
  powerpc/microwatt: Add SMP support
  powerpc: Define config option for processors with broadcast TLBIE
  powerpc/microwatt: Define an idle power-save function
  powerpc/microwatt: Device-tree updates
  powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
  net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
  net: spider_net: Remove powerpc Cell driver
  cpufreq: ppc_cbe: Remove powerpc Cell driver
  genirq: Remove IRQ_EDGE_EOI_HANDLER
  docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
  powerpc: Remove UDBG_RTAS_CONSOLE
  powerpc/io: Use standard barrier macros in io.c
  powerpc/io: Rename _insw_ns() etc.
  powerpc/io: Use generic raw accessors
  ...
2025-03-27 19:39:08 -07:00
Linus Torvalds
a7e135fe59 Merge tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:

 - probe-events: Add comments about entry data storing code to clarify
   where and how the entry data is stored for function return events.

 - probe-events: Log error for exceeding the number of arguments to help
   user to identify error reason via tracefs/error_log file.

 - Improve the ftracetest selftests:
    - Expand the tprobe event test to check if it can correctly find the
      wrong format tracepoint name.
    - Add new syntax error test to check whether error_log correctly
      indicates a wrong character in the tracepoint name.
    - Add a new dynamic events argument limitation test case which
      checks max number of probe arguments.

* tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: probe-events: Add comments about entry data storing code
  selftests/ftrace: Add dynamic events argument limitation test case
  selftests/ftrace: Add new syntax error test
  selftests/ftrace: Expand the tprobe event test to check wrong format
  tracing: probe-events: Log error for exceeding the number of arguments
2025-03-27 19:31:34 -07:00
Linus Torvalds
dcf9f31c62 Merge tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:

 - Add a selftest for tracing of a livepatched function

 - Skip a selftest when kprobes are not using ftrace

 - Some documentation clean up

* tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests: livepatch: test if ftrace can trace a livepatched function
  selftests: livepatch: add new ftrace helpers functions
  selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACE
  docs: livepatch: move text out of code block
  livepatch: Add comment to clarify klp_add_nops()
2025-03-27 19:26:10 -07:00
Linus Torvalds
96050814a3 Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - New option "printk.debug_non_panic_cpus" allows to store printk
   messages from non-panic CPUs during panic. It might be useful when
   panic() fails. It is disabled by default because it increases the
   chance to see the messages printed before panic() and on the
   panic-CPU.

 - New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build
   kernel without the virtual terminal support which prefers ttynull
   over serial console.

 - Do not unblank suspended consoles.

 - Some code clean up.

* tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
  printk: Add an option to allow ttynull to be a default console device
  printk: Check CON_SUSPEND when unblanking a console
  printk: Rename console_start to console_resume
  printk: Rename console_stop to console_suspend
  printk: Rename resume_console to console_resume_all
  printk: Rename suspend_console to console_suspend_all
2025-03-27 19:22:24 -07:00
Linus Torvalds
a10c7949ad Merge tag 'linux_kselftest-kunit-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
 "kunit tool:
   - Changes to kunit tool to use qboot on QEMU x86_64, and build GDB
     scripts
   - Fixes kunit tool bug in parsing test plan
   - Adds test to kunit tool to check parsing late test plan

  kunit:
   - Clarifies kunit_skip() argument name
   - Adds Kunit check for the longest symbol length
   - Changes qemu_configs for sparc to use Zilog console"

* tag 'linux_kselftest-kunit-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: tool: add test to check parsing late test plan
  kunit: tool: Fix bug in parsing test plan
  Kunit to check the longest symbol length
  kunit: Clarify kunit_skip() argument name
  kunit: tool: Build GDB scripts
  kunit: qemu_configs: sparc: use Zilog console
  kunit: tool: Use qboot on QEMU x86_64
2025-03-27 19:06:07 -07:00
Linus Torvalds
8e324a5c98 Merge tag 'linux_kselftest-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:

 - Fix bugs and clean up code in tracing, ftrace, and user_events tests

 - Add missing executables to ftrace gitignore

* tag 'linux_kselftest-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: add 'poll' binary to gitignore
  selftests/ftrace: Use readelf to find entry point in uprobe test
  selftests/user_events: Fix failures caused by test code
  selftests/tracing: Allow some more tests to run in instances
  selftests/ftrace: Clean up triggers after setting them
  selftests/tracing: Test only toplevel README file not the instances
2025-03-27 18:57:58 -07:00
Linus Torvalds
68f090f09b Merge tag 'ktest-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest update from Steven Rostedt:

 - Fix failure of directory of log file not existing

   If a LOG_FILE option is set for ktest to log its messages, and the
   directory path does not exist. Then ktest fails. Have ktest attempt
   to create the directory where the log file exists and if that
   succeeds continue on testing.

* tag 'ktest-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest: Fix Test Failures Due to Missing LOG_FILE Directories
2025-03-27 18:22:46 -07:00
Linus Torvalds
4fa118e5b7 Merge tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tooling updates from Steven Rostedt:

 - Allow RTLA to collect data via BPF

   The current implementation of rtla uses libtracefs and libtraceevent
   to pull sample events generated by the timerlat tracer from the trace
   buffer. rtla then processes the sample by updating the histogram and
   summary (current, maximum, minimum, and sum values) as well as checks
   if tracing has been stopped due to threshold overflow.

   In use cases where a large number of samples is being generated, that
   is, with measurements running on many CPUs and with a low interval,
   this sample processing design causes a significant CPU load on the
   rtla side. Furthermore, with >100 CPUs and 100us interval, rtla was
   reported as not being able to keep up with the samples and dropping
   most of them, leading to it being unusable.

   Change the way the timerlat trace processes samples by attaching a
   BPF program to the trace event using the BPF skeleton feature of
   bpftool. Unlike the current implementation, the BPF implementation
   does not check whether tracing is stopped (in BPF mode, tracing is
   always off to improve performance), but waits for a write to a BPF
   ringbuffer instead. This allows rtla to exit immediately when a
   threshold is violated, without waiting for the next iteration of the
   while loop.

   If the requirements for the BPF implementation are not met, either at
   build time or at run time, the current implementation is used as
   fallback. Which implementation is being used can be seen when running
   rtla timerlat with "-D" option. rtla can be forced to run in non-BPF
   mode by setting the RTLA_NO_BPF option to 1, for debugging purposes.

 - Fix LD_FLAGS from being dropped in build

 - Refactor code to remove duplication of save_trace_to_file

 - Always set options and do not rely on default settings

   Do not rely on the default kernel settings of the tracers when
   starting. They could have been changed by the user which gives
   inconsistent results. Always set the options that rtla expects.

 - Add creation of ctags and TAGS for traversing code

* tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Add the ability to create ctags and etags
  rtla/tests: Test setting default options
  rtla/tests: Reset osnoise options before check
  rtla: Always set all tracer options
  rtla/osnoise: Set OSNOISE_WORKLOAD to true
  rtla: Unify apply_config between top and hist
  rtla/osnoise: Unify params struct
  rtla: Fix segfault in save_trace_to_file call
  tools/build: Use SYSTEM_BPFTOOL for system bpftool
  rtla: Refactor save_trace_to_file
  tools/rv: Keep user LDFLAGS in build
  rtla/timerlat: Test BPF mode
  rtla/timerlat_top: Use BPF to collect samples
  rtla/timerlat_top: Move divisor to update
  rtla/timerlat_hist: Use BPF to collect samples
  rtla/timerlat: Add BPF skeleton to collect samples
  rtla: Add optional dependency on BPF tooling
  tools/build: Add bpftool-skeletons feature test
  rtla/timerlat: Unify params struct
2025-03-27 17:03:01 -07:00
Linus Torvalds
744fab2d9f Merge tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Add option traceoff_after_boot

   In order to debug kernel boot, it sometimes is helpful to enable
   tracing via the kernel command line. Unfortunately, by the time the
   login prompt appears, the trace is overwritten by the init process
   and other user space start up applications.

   Adding a "traceoff_after_boot" will disable tracing when the kernel
   passes control to init which will allow developers to be able to see
   the traces that occurred during boot.

 - Clean up the mmflags macros that display the GFP flags in trace
   events

   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in
   the process it also adds some flags that were missed before.

 - Removed some dead code and scripts/draw_functrace.py

   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.

 - Constify struct event_trigger_ops

   The event_trigger_ops is just a structure that has function pointers
   that are assigned when the variables are created. These variables
   should all be constants.

 - Other minor clean ups and fixes

* tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Replace strncpy with memcpy for fixed-length substring copy
  tracing: Fix synth event printk format for str fields
  tracing: Do not use PERF enums when perf is not defined
  tracing: Ensure module defining synth event cannot be unloaded while tracing
  tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
  tracing/osnoise: Fix possible recursive locking for cpus_read_lock()
  tracing: Align synth event print fmt
  tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format
  tracepoint: Print the function symbol when tracepoint_debug is set
  tracing: Constify struct event_trigger_ops
  scripts/tracing: Remove scripts/tracing/draw_functrace.py
  tracing: Update MAINTAINERS file to include tracepoint.c
  tracing/user_events: Slightly simplify user_seq_show()
  tracing/user_events: Don't use %pK through printk
  tracing: gfp: Remove duplication of recording GFP flags
  tracing: Remove orphaned event_trace_printk
  ring-buffer: Fix typo in comment about header page pointer
  tracing: Add traceoff_after_boot option
2025-03-27 16:22:12 -07:00
Linus Torvalds
88221ac0d5 Merge tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull latency tracing updates from Steven Rostedt:

 - Add some trace events to osnoise and timerlat sample generation

   This adds more information to the osnoise and timerlat tracers as
   well as allows BPF programs to be attached to these locations to
   extract even more data.

 - Fix to DECLARE_TRACE_CONDITION() macro

   It wasn't used but now will be and it happened to be broken causing
   the build to fail.

 - Add scheduler specification monitors to runtime verifier (RV)

   This is a continuation of Daniel Bristot's work.

   RV allows monitors to run and react concurrently. Running the
   cumulative model is equivalent to running single components using the
   same reactors, with the advantage that it's easier to point out which
   specification failed in case of error.

   This update introduces nested monitors to RV, in short, the sysfs
   monitor folder will contain a monitor named sched, which is nothing
   but an empty container for other monitors. Controlling the sched
   monitor (enable, disable, set reactors) controls all nested monitors.

   The following scheduling monitors are added:

     - sco: scheduling context operations
       Monitor to ensure sched_set_state happens only in thread context

     - tss: task switch while scheduling
       Monitor to ensure sched_switch happens only in scheduling context

     - snroc: set non runnable on its own context
       Monitor to ensure set_state happens only in the respective task's context

     - scpd: schedule called with preemption disabled
       Monitor to ensure schedule is called with preemption disabled

     - snep: schedule does not enable preempt
       Monitor to ensure schedule does not enable preempt

     - sncid: schedule not called with interrupt disabled
       Monitor to ensure schedule is not called with interrupt disabled

* tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rv: Allow rv list to filter for container
  Documentation/rv: Add docs for the sched monitors
  verification/dot2k: Add support for nested monitors
  tools/rv: Add support for nested monitors
  rv: Add scpd, snep and sncid per-cpu monitors
  rv: Add snroc per-task monitor
  rv: Add sco and tss per-cpu monitors
  rv: Add option for nested monitors and include sched
  sched: Add sched tracepoints for RV task model
  rv: Add license identifiers to monitor files
  tracing: Fix DECLARE_TRACE_CONDITION
  trace/osnoise: Add trace events for samples
2025-03-27 16:03:52 -07:00
Linus Torvalds
31eb415bf6 Merge tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:

 - Record function parameters for function and function graph tracers

   An option has been added to function tracer (func-args) and the
   function graph tracer (funcgraph-args) that when set, the tracers
   will record the registers that hold the arguments into each function
   event. On reading of the trace, it will use BTF to print those
   arguments. Most archs support up to 6 arguments (depending on the
   complexity of the arguments) and those are printed.

   If a function has more arguments then what was recorded, the output
   will end with " ... )".

   Example of function graph tracer:

	6)              | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) {
	6)              |   consume_skb(skb = 0x8887c100) {
	6)              |     skb_release_head_state(skb = 0x8887c100) {
	6)  0.178 us    |       sock_wfree(skb = 0x8887c100)
	6)  0.627 us    |     }

 - The rest of the changes are minor clean ups and fixes

* tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Use hashtable.h for event_hash
  tracing: Fix use-after-free in print_graph_function_flags during tracer switching
  function_graph: Remove the unused variable func
  ftrace: Add arguments to function tracer
  ftrace: Have funcgraph-args take affect during tracing
  ftrace: Add support for function argument to graph tracer
  ftrace: Add print_function_args()
  ftrace: Have ftrace_free_filter() WARN and exit if ops is active
  fgraph: Correct typo in ftrace_return_to_handler comment
2025-03-27 15:57:29 -07:00
Linus Torvalds
dd161f74f8 Merge tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing / sorttable updates from Steven Rostedt:

 - Implement arm64 build time sorting of the mcount location table

   When gcc is used to build arm64, the mcount_loc section is all zeros
   in the vmlinux elf file. The addresses are stored in the Elf_Rela
   location.

   To sort at build time, an array is allocated and the addresses are
   added to it via the content of the mcount_loc section as well as he
   Elf_Rela data. After sorting, the information is put back into the
   Elf_Rela which now has the section sorted.

 - Make sorting of mcount location table for arm64 work with clang as
   well

   When clang is used, the mcount_loc section contains the addresses,
   unlike the gcc build. An array is still created and the sorting works
   for both methods.

 - Remove weak functions from the mcount_loc section

   Have the sorttable code pass in the data of functions defined via
   'nm -S' which shows the functions as well as their sizes. Using this
   information the sorttable code can determine if a function in the
   mcount_loc section was weak and overridden. If the function is not
   found, it is set to be zero. On boot, when the mcount_loc section is
   read and the ftrace table is created, if the address in the
   mcount_loc is not in the kernel core text then it is removed and not
   added to the ftrace_filter_functions (the functions that can be
   attached by ftrace callbacks).

 - Update and fix the reporting of how much data is used for ftrace
   functions

   On boot, a report of how many pages were used by the ftrace table as
   well as how they were grouped (the table holds a list of sections
   that are groups of pages that were able to be allocated). The
   removing of the weak functions required the accounting to be updated.

* tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  scripts/sorttable: Allow matches to functions before function entry
  scripts/sorttable: Use normal sort if theres no relocs in the mcount section
  ftrace: Check against is_kernel_text() instead of kaslr_offset()
  ftrace: Test mcount_loc addr before calling ftrace_call_addr()
  ftrace: Have ftrace pages output reflect freed pages
  ftrace: Update the mcount_loc check of skipped entries
  scripts/sorttable: Zero out weak functions in mcount_loc table
  scripts/sorttable: Always use an array for the mcount_loc sorting
  scripts/sorttable: Have mcount rela sort use direct values
  arm64: scripts/sorttable: Implement sorting mcount_loc at boot for arm64
2025-03-27 15:44:34 -07:00
Linus Torvalds
5c2a430e85 Merge tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Ext4 bug fixes and cleanups, including:

   - hardening against maliciously fuzzed file systems

   - backwards compatibility for the brief period when we attempted to
     ignore zero-width characters

   - avoid potentially BUG'ing if there is a file system corruption
     found during the file system unmount

   - fix free space reporting by statfs when project quotas are enabled
     and the free space is less than the remaining project quota

  Also improve performance when replaying a journal with a very large
  number of revoke records (applicable for Lustre volumes)"

* tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits)
  ext4: fix OOB read when checking dotdot dir
  ext4: on a remount, only log the ro or r/w state when it has changed
  ext4: correct the error handle in ext4_fallocate()
  ext4: Make sb update interval tunable
  ext4: avoid journaling sb update on error if journal is destroying
  ext4: define ext4_journal_destroy wrapper
  ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...)
  jbd2: add a missing data flush during file and fs synchronization
  ext4: don't over-report free space or inodes in statvfs
  ext4: clear DISCARD flag if device does not support discard
  jbd2: remove jbd2_journal_unfile_buffer()
  ext4: reorder capability check last
  ext4: update the comment about mb_optimize_scan
  jbd2: fix off-by-one while erasing journal
  ext4: remove references to bh->b_page
  ext4: goto right label 'out_mmap_sem' in ext4_setattr()
  ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
  ext4: introduce ITAIL helper
  jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature
  ext4: remove redundant function ext4_has_metadata_csum
  ...
2025-03-27 13:27:08 -07:00
Linus Torvalds
4a4b30ea80 Merge tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:
 "On disk format is now soft frozen: no more required/automatic are
  anticipated before taking off the experimental label.

  Major changes/features since 6.14:

   - Scrub

   - Blocksize greater than page size support

   - A number of "rebalance spinning and doing no work" issues have been
     fixed; we now check if the write allocation will succeed in
     bch2_data_update_init(), before kicking off the read.

     There's still more work to do in this area. Later we may want to
     add another bitset btree, like rebalance_work, to track "extents
     that rebalance was requested to move but couldn't", e.g. due to
     destination target having insufficient online devices.

   - We can now support scaling well into the petabyte range: latest
     bcachefs-tools will pick an appropriate bucket size at format time
     to ensure fsck can run in available memory (e.g. a server with
     256GB of ram and 100PB of storage would want 16MB buckets).

  On disk format changes:

   - 1.21: cached backpointers (scalability improvement)

     Cached replicas now get backpointers, which means we no longer rely
     on incrementing bucket generation numbers to invalidate cached
     data: this lets us get rid of the bucket generation number garbage
     collection, which had to periodically rescan all extents to
     recompute bucket oldest_gen.

     Bucket generation numbers are now only used as a consistency check,
     but they're quite useful for that.

   - 1.22: stripe backpointers

     Stripes now have backpointers: erasure coded stripes have their own
     checksums, separate from the checksums for the extents they contain
     (and stripe checksums also cover the parity blocks). This is
     required for implementing scrub for stripes.

   - 1.23: stripe lru (scalability improvement)

     Persistent lru for stripes, ordered by "number of empty blocks".
     This is used by the stripe creation path, which depending on free
     space may create a new stripe out of a partially empty existing
     stripe instead of starting a brand new stripe.

     This replaces an in-memory heap, and means we no longer have to
     read in the stripes btree at startup.

   - 1.24: casefolding

     Case insensitive directory support, courtesy of Valve.

     This is an incompatible feature, to enable mount with
       -o version_upgrade=incompatible

   - 1.25: extent_flags

     Another incompatible feature requiring explicit opt-in to enable.

     This adds a flags entry to extents, and a flag bit that marks
     extents as poisoned.

     A poisoned extent is an extent that was unreadable due to checksum
     errors. We can't move such extents without giving them a new
     checksum, and we may have to move them (for e.g. copygc or device
     evacuate). We also don't want to delete them: in the future we'll
     have an API that lets userspace ignore checksum errors and attempt
     to deal with simple bitrot itself. Marking them as poisoned lets us
     continue to return the correct error to userspace on normal read
     calls.

  Other changes/features:

   - BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top'
     command, which shows a live view of all internal filesystem
     counters.

   - Improved journal pipelining: we can now have 16 journal writes in
     flight concurrently, up from 4. We're logging significantly more to
     the journal than we used to with all the recent disk accounting
     changes and additions, so some users should see a performance
     increase on some workloads.

   - BCH_MEMBER_STATE_failed: previously, we would do no IO at all to
     devices marked as failed. Now we will attempt to read from them,
     but only if we have no better options.

   - New option, write_error_timeout: devices will be kicked out of the
     filesystem if all writes have been failing for x number of seconds.

     We now also kick devices out when notified by blk_holder_ops that
     they've gone offline.

   - Device option handling improvements: the discard option should now
     be working as expected (additionally, in -tools, all device options
     that can be set at format time can now be set at device add time,
     i.e. data_allowed, state).

   - We now try harder to read data after a checksum error: we'll do
     additional retries if necessary to a device after after it gave us
     data with a checksum error.

   - More self healing work: the full inode <-> dirent consistency
     checks that are currently run by fsck are now also run every time
     we do a lookup, meaning we'll be able to correct errors at runtime.
     Runtime self healing will be flipped on after the new changes have
     seen more testing, currently they're just checking for consistency.

   - KMSAN fixes: our KMSAN builds should be nearly clean now, which
     will put a massive dent in the syzbot dashboard"

* tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs: (180 commits)
  bcachefs: Kill unnecessary bch2_dev_usage_read()
  bcachefs: btree node write errors now print btree node
  bcachefs: Fix race in print_chain()
  bcachefs: btree_trans_restart_foreign_task()
  bcachefs: bch2_disk_accounting_mod2()
  bcachefs: zero init journal bios
  bcachefs: Eliminate padding in move_bucket_key
  bcachefs: Fix a KMSAN splat in btree_update_nodes_written()
  bcachefs: kmsan asserts
  bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()
  bcachefs: Disable asm memcpys when kmsan enabled
  bcachefs: Handle backpointers with unknown data types
  bcachefs: Count BCH_DATA_parity backpointers correctly
  bcachefs: Run bch2_check_dirent_target() at lookup time
  bcachefs: Refactor bch2_check_dirent_target()
  bcachefs: Move bch2_check_dirent_target() to namei.c
  bcachefs: fs-common.c -> namei.c
  bcachefs: EIO cleanup
  bcachefs: bch2_write_prep_encoded_data() now returns errcode
  bcachefs: Simplify bch2_write_op_error()
  ...
2025-03-27 13:20:07 -07:00
Linus Torvalds
f79adee883 Merge tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy
Pull jfs updates from David Kleikamp:
 "Various bug fixes and cleanups for JFS"

* tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy:
  jfs: add index corruption check to DT_GETPAGE()
  fs/jfs: consolidate sanity checking in dbMount
  jfs: add sanity check for agwidth in dbMount
  jfs: Prevent copying of nlink with value 0 from disk inode
  fs/jfs: Prevent integer overflow in AG size calculation
  fs/jfs: cast inactags to s64 to prevent potential overflow
  jfs: Fix uninit-value access of imap allocated in the diMount() function
  jfs: fix slab-out-of-bounds read in ea_get()
  jfs: add check read-only before truncation in jfs_truncate_nolock()
  jfs: add check read-only before txBeginAnon() call
  jfs: reject on-disk inodes of an unsupported type
  jfs: Remove reference to bh->b_page
  jfs: Delete a couple tabs in jfs_reconfigure()
2025-03-27 13:17:39 -07:00
Linus Torvalds
fde05627a2 Merge tag 'for-linus-6.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:

 - remove two orangefs bufmap routines that no longer have callers

* tag 'for-linus-6.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: Bufmap deadcoding
2025-03-27 13:14:39 -07:00
Linus Torvalds
c148bc7535 Merge tag 'xfs-6.15-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Carlos Maiolino:

 - XFS zoned allocator: Enables XFS to support zoned devices using its
   real-time allocator

 - Use folios/vmalloc for buffer cache backing memory

 - Some code cleanups and bug fixes

* tag 'xfs-6.15-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (70 commits)
  xfs: remove the flags argument to xfs_buf_get_uncached
  xfs: remove the flags argument to xfs_buf_read_uncached
  xfs: remove xfs_buf_free_maps
  xfs: remove xfs_buf_get_maps
  xfs: call xfs_buf_alloc_backing_mem from _xfs_buf_alloc
  xfs: remove unnecessary NULL check before kvfree()
  xfs: don't wake zone space waiters without m_zone_info
  xfs: don't increment m_generation for all errors in xfs_growfs_data
  xfs: fix a missing unlock in xfs_growfs_data
  xfs: Remove duplicate xfs_rtbitmap.h header
  xfs: trigger zone GC when out of available rt blocks
  xfs: trace what memory backs a buffer
  xfs: cleanup mapping tmpfs folios into the buffer cache
  xfs: use vmalloc instead of vm_map_area for buffer backing memory
  xfs: buffer items don't straddle pages anymore
  xfs: kill XBF_UNMAPPED
  xfs: convert buffer cache to use high order folios
  xfs: remove the kmalloc to page allocator fallback
  xfs: refactor backing memory allocations for buffers
  xfs: remove xfs_buf_is_vmapped
  ...
2025-03-27 13:07:00 -07:00
Linus Torvalds
0de1e84263 Merge tag 'dlm-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:

 - two fixes to the recent rcu lookup optimizations

 - a change allowing TCP to be configured with the first of multiple IP
   address

* tag 'dlm-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: make tcp still work in multi-link env
  dlm: fix error if active rsb is not hashed
  dlm: fix error if inactive rsb is not hashed
  dlm: prevent NPD when writing a positive value to event_done
  dlm: increase max number of links for corosync3/knet
2025-03-27 13:04:31 -07:00
Linus Torvalds
81d8e5e213 Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this round, there are three major updates: (1) folio conversion,
  (2) refactoring for mount API conversion, (3) some performance
  improvement such as direct IO, checkpoint speed, and IO priority
  hints.

  For stability, there are patches which add more sanity checks and
  fixes some major issues like i_size in atomic write operations and
  write pointer recovery in zoned devices.

  Enhancements:
   - huge folio converion work by Matthew Wilcox
   - clean up for mount API conversion by Eric Sandeen
   - improve direct IO speed in the overwrite case
   - add some sanity check on node consistency
   - set highest IO priority for checkpoint thread
   - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
   - add ioctl to get IO priority hint
   - add carve_out sysfs node for fsstat

  Bug fixes:
   - disable nat_bits during umount to avoid potential nat entry corruption
   - fix missing i_size update on atomic writes
   - fix missing discard for active segments
   - fix running out of free segments
   - fix out-of-bounds access in f2fs_truncate_inode_blocks()
   - call f2fs_recover_quota_end() correctly
   - fix potential deadloop in prepare_compress_overwrite()
   - fix the missing write pointer correction for zoned device
   - fix to avoid panic once fallocation fails for pinfile
   - don't retry IO for corrupted data scenario

  There are many other clean up patches and minor bug fixes as usual"

* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: fix missing discard for active segments
  f2fs: optimize f2fs DIO overwrites
  f2fs: fix to avoid atomicity corruption of atomic file
  f2fs: pass sbi rather than sb to parse_options()
  f2fs: pass sbi rather than sb to quota qf_name helpers
  f2fs: defer readonly check vs norecovery
  f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
  f2fs: make LAZYTIME a mount option flag
  f2fs: make INLINECRYPT a mount option flag
  f2fs: factor out an f2fs_default_check function
  f2fs: consolidate unsupported option handling errors
  f2fs: use f2fs_sb_has_device_alias during option parsing
  f2fs: add carve_out sysfs node
  f2fs: fix to avoid running out of free segments
  f2fs: Remove f2fs_write_node_page()
  f2fs: Remove f2fs_write_meta_page()
  f2fs: Remove f2fs_write_data_page()
  f2fs: Remove check for ->writepage
  Revert "f2fs: rebuild nat_bits during umount"
  f2fs: fix to avoid accessing uninitialized curseg
  ...
2025-03-27 12:55:54 -07:00
Linus Torvalds
fd71def6d9 Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "User visible changes:

   - fall back to buffered write if direct io is done on a file that
     requires checksums
      - this avoids a problem with checksum mismatch errors, observed
        e.g. on virtual images when writes to pages under writeback
        cause the checksum mismatch reports
      - this may lead to some performance degradation but currently the
        recommended setup for VM images is to use the NOCOW file
        attribute that also disables checksums

   - fast/realtime zstd levels -15 to -1
      - supported by mount options (compress=zstd:-5) and defrag ioctl
      - improved speed, reduced compression ratio, check the commit for
        sample measurements

   - defrag ioctl extended to accept negative compression levels

   - subpage mode
      - remove warning when subpage mode is used, the feature is now
        reasonably complete and tested
      - in debug mode allow to create 2K b-tree nodes to allow testing
        subpage on x86_64 with 4K pages too

  Performance improvements:

   - in send, better file path caching improves runtime (on sample load
     by -30%)

   - on s390x with hardware zlib support prepare the input buffer in a
     better way to get the best results from the acceleration

   - minor speed improvement in encoded read, avoid memory allocation in
     synchronous mode

  Core:

   - enable stable writes on inodes, replacing manually waiting for
     writeback and allowing to skip that on inodes without checksums

   - add last checks and warnings for out-of-band dirty writes to pages,
     requiring a fixup ("fixup worker"), this should not be necessary
     since 5.8 where get_user_page() and pin_user_pages*() prevent this
      - long history behind that, we'll be happy to remove the whole
        infrastructure in the near future

   - more folio API conversions and preparations for large folio support

   - subpage cleanups and refactoring, split handling of data and
     metadata to allow future support for large folios

   - readpage works as block-by-block, no change for normal mode, this
     is preparation for future subpage updates

   - block group refcount fixes and hardening

   - delayed iput fixes

   - in zoned mode, fix zone activation on filesystem with missing
     devices

  Cleanups:

   - inode parameter cleanups

   - path auto-freeing updates

   - code flow simplifications in send

   - redundant parameter cleanups"

* tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
  btrfs: zoned: fix zone finishing with missing devices
  btrfs: zoned: fix zone activation with missing devices
  btrfs: remove end_no_trans label from btrfs_log_inode_parent()
  btrfs: simplify condition for logging new dentries at btrfs_log_inode_parent()
  btrfs: remove redundant else statement from btrfs_log_inode_parent()
  btrfs: use memcmp_extent_buffer() at replay_one_extent()
  btrfs: update outdated comment for overwrite_item()
  btrfs: use variables to store extent buffer and slot at overwrite_item()
  btrfs: avoid unnecessary memory allocation and copy at overwrite_item()
  btrfs: don't clobber ret in btrfs_validate_super()
  btrfs: prepare btrfs_page_mkwrite() for large folios
  btrfs: prepare extent_io.c for future large folio support
  btrfs: prepare btrfs_launcher_folio() for large folios support
  btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
  btrfs: add a size parameter to btrfs_alloc_subpage()
  btrfs: subpage: make btrfs_is_subpage() check against a folio
  btrfs: add extra warning if delayed iput is added when it's not allowed
  btrfs: avoid redundant path slot assignment in btrfs_search_forward()
  btrfs: remove unnecessary btrfs_key local variable in btrfs_search_forward()
  btrfs: simplify the return value handling in search_ioctl()
  ...
2025-03-27 12:51:48 -07:00
Linus Torvalds
b2e7b0ffa5 Merge tag 'erofs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "In this cycle, EROFS 48-bit block addressing is available to support
  massive datasets for model training and other large data archive use
  cases.

  In addition, byte-oriented encoded extents have been supported to
  reduce metadata sizes when using large configurations as well as to
  improve Zstd compression speed.

  There are some bugfixes and cleanups as usual.

  Summary:

   - Support 48-bit block addressing for large images

   - Introduce encoded extents to reduce metadata on larger pclusters

   - Enable unaligned compressed data to improve Zstd compression speed

   - Allow 16-byte volume names again

   - Minor cleanups"

* tag 'erofs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: enable 48-bit layout support
  erofs: support unaligned encoded data
  erofs: implement encoded extent metadata
  erofs: add encoded extent on-disk definition
  erofs: initialize decompression early
  erofs: support dot-omitted directories
  erofs: implement 48-bit block addressing for unencoded inodes
  erofs: add 48-bit block addressing on-disk support
  erofs: simplify erofs_{read,fill}_inode()
  erofs: get rid of erofs_map_blocks_flatmode()
  erofs: move {in,out}pages into struct z_erofs_decompress_req
  erofs: clean up header parsing for ztailpacking and fragments
  erofs: simplify tail inline pcluster handling
  erofs: allow 16-byte volume name again
  erofs: get rid of erofs_kmap_type
  erofs: use Z_EROFS_LCLUSTER_TYPE_MAX to simplify switches
2025-03-27 12:40:40 -07:00
Linus Torvalds
ef479de65a Merge tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:

 - Fix two bugs related to locking request cancelation (locking request
   being retried instead of canceled; canceling the wrong locking
   request)

 - Prevent a race between inode creation and deferred delete analogous
   to commit ffd1cf0443 from 6.13. This now allows to further simplify
   gfs2_evict_inode() without introducing mysterious problems

 - When in inode delete should be verified / retried "later" but that
   isn't possible, skip the delete instead of carrying it out
   immediately. This broke in 6.13

 - More folio conversions from Matthew Wilcox (plus a fix from Dan
   Carpenter)

 - Various minor fixes and cleanups

* tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (22 commits)
  gfs2: some comment clarifications
  gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()
  gfs2: Convert gfs2_meta_read_endio() to use a folio
  gfs2: Convert gfs2_end_log_write_bh() to work on a folio
  gfs2: Convert gfs2_find_jhead() to use a folio
  gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()
  gfs2: Use b_folio in gfs2_check_magic()
  gfs2: Use b_folio in gfs2_submit_bhs()
  gfs2: Use b_folio in gfs2_trans_add_meta()
  gfs2: Use b_folio in gfs2_log_write_bh()
  gfs2: skip if we cannot defer delete
  gfs2: remove redundant warnings
  gfs2: minor evict fix
  gfs2: Prevent inode creation race (2)
  gfs2: Fix additional unlikely request cancelation race
  gfs2: Fix request cancelation bug
  gfs2: Check for empty queue in run_queue
  gfs2: Remove more dead code in add_to_queue
  gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE
  gfs2: glock holder GL_NOPID fix
  ...
2025-03-27 12:09:25 -07:00
Linus Torvalds
3a90a72aca Merge tag 'asm-generic-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
 "This is mainly set of cleanups of asm-generic/io.h, resolving problems
  with inconsistent semantics of ioread64/iowrite64 that were causing
  runtime and build issues.

  The "GENERIC_IOMAP" version that switches between inb()/outb() and
  readb()/writeb() style accessors is now only used on architectures
  that have PC-style ISA devices that are not memory mapped (x86, uml,
  m68k-q40 and powerpc-powernv), while alpha and parisc use a more
  complicated variant and everything else just maps the ioread
  interfaces to plan MMIO (readb/writeb etc).

  In addition there are two small changes from Raag Jadav to simplify
  the asm-generic/io.h indirect inclusions and from Jann Horn to fix a
  corner case with read_word_at_a_time"

* tag 'asm-generic-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  rwonce: fix crash by removing READ_ONCE() for unaligned read
  rwonce: handle KCSAN like KASAN in read_word_at_a_time()
  m68k: coldfire: select PCI_IOMAP for PCI
  mips: export pci_iounmap()
  mips: fix PCI_IOBASE definition
  m68k/nommu: stop using GENERIC_IOMAP
  mips: drop GENERIC_IOMAP wrapper
  powerpc: asm/io.h: remove split ioread64/iowrite64 helpers
  parisc: stop using asm-generic/iomap.h
  sh: remove duplicate ioread/iowrite helpers
  alpha: stop using asm-generic/iomap.h
  io.h: drop unused headers
  drm/draw: include missing headers
  asm-generic/io.h: rework split ioread64/iowrite64 helpers
2025-03-27 09:46:53 -07:00
Mimi Zohar
a414016218 ima: limit the number of ToMToU integrity violations
Each time a file in policy, that is already opened for read, is opened
for write, a Time-of-Measure-Time-of-Use (ToMToU) integrity violation
audit message is emitted and a violation record is added to the IMA
measurement list.  This occurs even if a ToMToU violation has already
been recorded.

Limit the number of ToMToU integrity violations per file open for read.

Note: The IMA_MAY_EMIT_TOMTOU atomic flag must be set from the reader
side based on policy.  This may result in a per file open for read
ToMToU violation.

Since IMA_MUST_MEASURE is only used for violations, rename the atomic
IMA_MUST_MEASURE flag to IMA_MAY_EMIT_TOMTOU.

Cc: stable@vger.kernel.org # applies cleanly up to linux-6.6
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-03-27 12:40:12 -04:00