There are a couple of places in the kublk selftests ublk server which
use the legacy ublk opcodes. These operations fail (with -EOPNOTSUPP) on
a kernel compiled without CONFIG_BLKDEV_UBLK_LEGACY_OPCODES set. We
could easily require it to be set as a prerequisite for these selftests,
but since new applications should not be using the legacy opcodes, use
the ioctl-encoded opcodes everywhere in kublk.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250401-ublk_selftests-v1-1-98129c9bc8bb@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When readlen is set for a recvzc request, tcp_read_sock() will call
io_zcrx_recv_skb() one final time with len == desc->count == 0. This is
caused by the !desc->count check happening too late. The offset + 1 !=
skb->len happens earlier and causes the while loop to continue.
Fix this in io_zcrx_recv_skb() instead of tcp_read_sock(). Return early
if len is 0 i.e. the read is done.
Fixes: 6699ec9a23 ("io_uring/zcrx: add a read limit to recvzc requests")
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20250401195355.1613813-1-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous commit removed the use of this footnote, delete it.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 3fdf2ec7da ("Documentation: ublk: Drop Stefan Hajnoczi's message footnote")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement ->queue_rqs() for improving perf in case of MQ.
In this way, we just need to call io_uring_cmd_complete_in_task() once for
whole IO batch, then both io_uring and ublk server can get exact batch from
ublk frontend.
Follows IOPS improvement:
- tests
tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
fio/t/io_uring -p0 /dev/ublkb0
- results:
more than 10% IOPS boost observed
Pass all ublk selftests, especially the io dispatch order test.
Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IO split is usually bad in io_uring world, since -EAGAIN is caused and
IO handling may have to fallback to io-wq, this way does hurt performance.
ublk starts to support zero copy recently, for avoiding unnecessary IO
split, ublk driver's segment limit should be aligned with backend
device's segment limit.
Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
which number is aligned with ublk request segment number, so that big
memory allocation can be avoided by setting reasonable max_segments limit.
So add segment parameter for providing ublk server chance to align
segment limit with backend, and keep it reasonable from implementation
viewpoint.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now ublk driver depends on `ubq->canceling` for deciding if the request
can be dispatched via uring_cmd & io_uring_cmd_complete_in_task().
Once ubq->canceling is set, the uring_cmd can be done via ublk_cancel_cmd()
and io_uring_cmd_done().
So set ubq->canceling when queue is frozen, this way makes sure that the
flag can be observed from ublk_queue_rq() reliably, and avoids
use-after-free on uring_cmd.
Fixes: 216c8f5ef0 ("ublk: replace monitor with cancelable uring_cmd")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull more io_uring updates from Jens Axboe:
"Final separate updates for io_uring.
This started out as a series of cleanups improvements and improvements
for registered buffers, but as the last series of the io_uring changes
for 6.15, it also collected a few fixes for the other branches on top:
- Add support for vectored fixed/registered buffers.
Previously only single segments have been supported for commands,
now vectored variants are supported as well. This series includes
networking and file read/write support.
- Small series unifying return codes across multi and single shot.
- Small series cleaning up registerd buffer importing.
- Adding support for vectored registered buffers for uring_cmd.
- Fix for io-wq handling of command reissue.
- Various little fixes and tweaks"
* tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux: (25 commits)
io_uring/net: fix io_req_post_cqe abuse by send bundle
io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc
io_uring: move min_events sanitisation
io_uring: rename "min" arg in io_iopoll_check()
io_uring: open code __io_post_aux_cqe()
io_uring: defer iowq cqe overflow via task_work
io_uring: fix retry handling off iowq
io_uring/net: only import send_zc buffer once
io_uring/cmd: introduce io_uring_cmd_import_fixed_vec
io_uring/cmd: add iovec cache for commands
io_uring/cmd: don't expose entire cmd async data
io_uring: rename the data cmd cache
io_uring: rely on io_prep_reg_vec for iovec placement
io_uring: introduce io_prep_reg_iovec()
io_uring: unify STOP_MULTISHOT with IOU_OK
io_uring: return -EAGAIN to continue multishot
io_uring: cap cached iovec/bvec size
io_uring/net: implement vectored reg bufs for zctx
io_uring/net: convert to struct iou_vec
io_uring/net: pull vec alloc out of msghdr import
...
Pull io_uring epoll support from Jens Axboe:
"This adds support for reading epoll events via io_uring.
While this may seem counter-intuitive (and/or productive), the
reasoning here is that quite a few existing epoll event loops can
easily do a partial conversion to a completion based model, but are
still stuck with one (or few) event types that remain readiness based.
For that case, they then need to add the io_uring fd to the epoll
context, and continue to rely on epoll_wait(2) for waiting on events.
This misses out on the finer grained waiting that io_uring can do, to
reduce context switches and wait for multiple events in one batch
reliably.
With adding support for reaping epoll events via io_uring, the whole
legacy readiness based event types can still be reaped via epoll, with
the overall waiting in the loop be driven by io_uring"
* tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux:
io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
io_uring/epoll: remove CONFIG_EPOLL guards
Pull io_uring zero-copy receive support from Jens Axboe:
"This adds support for zero-copy receive with io_uring, enabling fast
bulk receive of data directly into application memory, rather than
needing to copy the data out of kernel memory.
While this version only supports host memory as that was the initial
target, other memory types are planned as well, with notably GPU
memory coming next.
This work depends on some networking components which were queued up
on the networking side, but have now landed in your tree.
This is the work of Pavel Begunkov and David Wei. From the v14 posting:
'We configure a page pool that a driver uses to fill a hw rx queue
to hand out user pages instead of kernel pages. Any data that ends
up hitting this hw rx queue will thus be dma'd into userspace
memory directly, without needing to be bounced through kernel
memory. 'Reading' data out of a socket instead becomes a
_notification_ mechanism, where the kernel tells userspace where
the data is. The overall approach is similar to the devmem TCP
proposal
This relies on hw header/data split, flow steering and RSS to
ensure packet headers remain in kernel memory and only desired
flows hit a hw rx queue configured for zero copy. Configuring this
is outside of the scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is
that io_uring is used for the uAPI and the lifetime of all objects
are bound to an io_uring instance. Data is 'read' using a new
io_uring request type. When done, data is returned via a new shared
refill queue. A zero copy page pool refills a hw rx queue from this
refill queue directly. Of course, the lifetime of these data
buffers are managed by io_uring rather than the networking stack,
with different refcounting rules.
This patchset is the first step adding basic zero copy support. We
will extend this iteratively with new features e.g. dynamically
allocated zero copy areas, THP support, dmabuf support, improved
copy fallback, general optimisations and more'
In a local setup, I was able to saturate a 200G link with a single CPU
core, and at netdev conf 0x19 earlier this month, Jamal reported
188Gbit of bandwidth using a single core (no HT, including soft-irq).
Safe to say the efficiency is there, as bigger links would be needed
to find the per-core limit, and it's considerably more efficient and
faster than the existing devmem solution"
* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
io_uring/zcrx: add selftest case for recvzc with read limit
io_uring/zcrx: add a read limit to recvzc requests
io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
io_uring: Rename KConfig to Kconfig
io_uring/zcrx: fix leaks on failed registration
io_uring/zcrx: recheck ifq on shutdown
io_uring/zcrx: add selftest
net: add documentation for io_uring zcrx
io_uring/zcrx: add copy fallback
io_uring/zcrx: throttle receive requests
io_uring/zcrx: set pp memory provider for an rx queue
io_uring/zcrx: add io_recvzc request
io_uring/zcrx: dma-map area for the device
io_uring/zcrx: implement zerocopy receive pp memory provider
io_uring/zcrx: grab a net device
io_uring/zcrx: add io_zcrx_area
io_uring/zcrx: add interface queue and refill queue
Pull tpm updates from Jarkko Sakkinen:
"This contains a new driver: a TPM FF-A driver.
FF comes from Firmware Framework, and A comes from Arm's A-profile.
FF-A is essentially a standard mechanism to communicate with TrustZone
apps such as TPM.
Other than that, this includes a pile of fixes and small improvments"
* tag 'tpmdd-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Make chip->{status,cancel,req_canceled} opt
MAINTAINERS: TPM DEVICE DRIVER: add missing includes
tpm: End any active auth session before shutdown
Documentation: tpm: Add documentation for the CRB FF-A interface
tpm_crb: Add support for the ARM FF-A start method
ACPICA: Add start method for ARM FF-A
tpm_crb: Clean-up and refactor check for idle support
tpm_crb: ffa_tpm: Implement driver compliant to CRB over FF-A
tpm/tpm_ftpm_tee: fix struct ftpm_tee_private documentation
tpm, tpm_tis: Workaround failed command reception on Infineon devices
tpm, tpm_tis: Fix timeout handling when waiting for TPM status
tpm: Convert warn to dbg in tpm2_start_auth_session()
tpm: Lazily flush auth session when getting random data
tpm: ftpm_tee: remove incorrect of_match_ptr annotation
tpm: do not start chip while suspended
Pull CRC fixes from Eric Biggers:
"Fix out-of-scope array bugs in arm and arm64's crc_t10dif_arch()"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
arm64/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
arm/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
Pull landlock updates from Mickaël Salaün:
"This brings two main changes to Landlock:
- A signal scoping fix with a new interface for user space to know if
it is compatible with the running kernel.
- Audit support to give visibility on why access requests are denied,
including the origin of the security policy, missing access rights,
and description of object(s). This was designed to limit log spam
as much as possible while still alerting about unexpected blocked
access.
With these changes come new and improved documentation, and a lot of
new tests"
* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
landlock: Add audit documentation
selftests/landlock: Add audit tests for network
selftests/landlock: Add audit tests for filesystem
selftests/landlock: Add audit tests for abstract UNIX socket scoping
selftests/landlock: Add audit tests for ptrace
selftests/landlock: Test audit with restrict flags
selftests/landlock: Add tests for audit flags and domain IDs
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
selftests/landlock: Add test for invalid ruleset file descriptor
samples/landlock: Enable users to log sandbox denials
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
landlock: Log scoped denials
landlock: Log TCP bind and connect denials
landlock: Log truncate and IOCTL denials
landlock: Factor out IOCTL hooks
landlock: Log file-related denials
landlock: Log mount-related denials
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
...
Pull capabilities update from Serge Hallyn:
"This contains just one patch that removes a helper function whose last
user (smack) stopped using it in 2018"
* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
capability: Remove unused has_capability
Pull ima updates from Mimi Zohar:
"Two performance improvements, which minimize the number of integrity
violations"
* tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: limit the number of ToMToU integrity violations
ima: limit the number of open-writers integrity violations
Pull ipe update from Fan Wu:
"This contains just one commit from Randy Dunlap, which fixes
kernel-doc warnings in the IPE subsystem"
* tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: policy_fs: fix kernel-doc warnings
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.
Thomas says:
"I just noticed that for some incomprehensible reason, probably sheer
incompetemce when trying to utilize b4, I managed to merge an outdated
_and_ buggy version of that series.
Can you please revert that merge completely?"
Done.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull m68knommu updates from Greg Ungerer:
- remove unused include of linux/fb.h
- use strscpy() instead of strncpy()
* tag 'm68knommu-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: mm: Replace deprecated strncpy() with strscpy()
m68k: Do not include <linux/fb.h>
Pull powerpc updates from Madhavan Srinivasan:
- Remove support for IBM Cell Blades
- SMP support for microwatt platform
- Support for inline static calls on PPC32
- Enable pmu selftests for power11 platform
- Enable hardware trace macro (HTM) hcall support
- Support for limited address mode capability
- Changes to RMA size from 512 MB to 768 MB to handle fadump
- Misc fixes and cleanups
Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.
* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
powerpc/kexec: fix physical address calculation in clear_utlb_entry()
crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
powerpc: Fix 'intra_function_call not a direct call' warning
powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
powerpc/microwatt: Add SMP support
powerpc: Define config option for processors with broadcast TLBIE
powerpc/microwatt: Define an idle power-save function
powerpc/microwatt: Device-tree updates
powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
net: spider_net: Remove powerpc Cell driver
cpufreq: ppc_cbe: Remove powerpc Cell driver
genirq: Remove IRQ_EDGE_EOI_HANDLER
docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
powerpc: Remove UDBG_RTAS_CONSOLE
powerpc/io: Use standard barrier macros in io.c
powerpc/io: Rename _insw_ns() etc.
powerpc/io: Use generic raw accessors
...