linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-14 04:09:18 -04:00

Author	SHA1	Message	Date
Caleb Sander Mateos	ef9f603fd3	io_uring/cmd: drop unused res2 param from io_uring_cmd_done() Commit `79525b51ac` ("io_uring: fix nvme's 32b cqes on mixed cq") split out a separate io_uring_cmd_done32() helper for ->uring_cmd() implementations that return 32-byte CQEs. The res2 value passed to io_uring_cmd_done() is now unused because __io_uring_cmd_done() ignores it when is_cqe32 is passed as false. So drop the parameter from io_uring_cmd_done() to simplify the callers and clarify that it's not possible to return an extra value beyond the 32-bit CQE result. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-23 00:15:02 -06:00
Keith Busch	79525b51ac	io_uring: fix nvme's 32b cqes on mixed cq The nvme uring_cmd only uses 32b CQEs. If the ring uses a mixed CQ, then we need to make sure we flag the completion as a 32b CQE. On the other hand, if nvme uring_cmd was using a dedicated 32b CQE, the posting was missing the extra memcpy because it only applied to bit CQEs on a mixed CQ. Fixes: `e26dca67fd` ("io_uring: add support for IORING_SETUP_CQE_MIXED") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-20 06:26:38 -06:00
Pavel Begunkov	7ea24326e7	io_uring/query: cap number of queries If a query chain forms a cycle, it'll be looping in the kernel until the process is killed. It might be fine as any such mistake can be easily uncovered during testing, but it's still nicer to let it break out of the syscall if it executed too many queries. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-19 07:06:43 -06:00
Pavel Begunkov	2408d17832	io_uring/query: prevent infinite loops If the query chain forms a cycle, the interface will loop indefinitely. Make sure it handles fatal signals, so the user can kill the process and hence break out of the infinite loop. Fixes: `c265ae75f9` ("io_uring: introduce io_uring querying") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-19 07:06:43 -06:00
Pavel Begunkov	31bf77dcc3	io_uring/zcrx: account niov arrays to cgroup net_iov / freelist / etc. arrays can be quite long, make sure they're accounted. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:21 -06:00
Pavel Begunkov	705d2ac7b2	io_uring/zcrx: allow synchronous buffer return Returning buffers via a ring is performant and convenient, but it becomes a problem when/if the user misconfigured the ring size and it becomes full. Add a synchronous way to return buffers back to the page pool via a new register opcode. It's supposed to be a reliable slow path for refilling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:21 -06:00
Pavel Begunkov	8fd08d8dda	io_uring/zcrx: introduce io_parse_rqe() Add a helper for verifying a rqe and extracting a niov out of it. It'll be reused in following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:21 -06:00
Pavel Begunkov	5a8b6e7c1d	io_uring/zcrx: don't adjust free cache space The cache should be empty when io_pp_zc_alloc_netmems() is called, that's promised by page pool and further checked, so there is no need to recalculate the available space in io_zcrx_ring_refill(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:21 -06:00
Pavel Begunkov	c95257f336	io_uring/zcrx: use guards for the refill lock Use guards for rq_lock in io_zcrx_ring_refill(), makes it a tad simpler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:21 -06:00
Pavel Begunkov	73fa880eff	io_uring/zcrx: reduce netmem scope in refill Reduce the scope of a local var netmem in io_zcrx_ring_refill. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	20dda449c0	io_uring/zcrx: protect netdev with pp_lock Remove ifq->lock and reuse pp_lock to protect the netdev pointer. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	4f602f3112	io_uring/zcrx: rename dma lock In preparation for reusing the lock for other purposes, rename it to "pp_lock". As before, it can be taken deeper inside the networking stack by page pool, and so the syscall io_uring must avoid holding it while doing queue reconfiguration or anything that can result in immediate pp init/destruction. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	d8d135dfe3	io_uring/zcrx: make niov size variable Instead of using PAGE_SIZE for the niov size add a niov_shift field to ifq, and patch up all important places. Copy fallback still assumes PAGE_SIZE, so it'll be wasting some memory for now. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	5d93f7bade	io_uring/zcrx: set sgt for umem area Set struct io_zcrx_mem::sgt for umem areas as well to simplify looking up the current sg table. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	6c18511729	io_uring/zcrx: remove dmabuf_offset It was removed from uapi, so now it's always 0 and can be removed together with offset handling in io_populate_area_dma(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	439a98b972	io_uring/zcrx: deduplicate area mapping With a common type for storing dma addresses and io_populate_area_dma(), type-specific area mapping helpers are trivial, so open code them and deduplicate the call to io_populate_area_dma(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	02bb047b5f	io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback() io_zcrx_copy_chunk() doesn't and shouldn't care from which area the buffer is allocated, don't try to resolve the area in it but pass the ifq to io_zcrx_alloc_fallback() and let it handle it. Also rename it for more clarity. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	d7ae46b454	io_uring/zcrx: check all niovs filled with dma addresses Add a warning if io_populate_area_dma() can't fill in all net_iovs, it should never happen. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	01464ea405	io_uring/zcrx: move area reg checks into io_import_area io_import_area() is responsible for importing memory and parsing io_uring_zcrx_area_reg, so move all area reg structure checks into the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	d425f13146	io_uring/zcrx: don't pass slot to io_zcrx_create_area Don't pass a pointer to a pointer where an area should be stored to io_zcrx_create_area(), and let it handle finding the right place for a new area. It's more straightforward and will be needed to support multiple areas. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	c49606fc4b	io_uring/zcrx: remove extra io_zcrx_drop_netdev io_close_queue() already detaches the netdev, don't unnecessary call io_zcrx_drop_netdev() right after. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	d5e31db9a9	io_uring/zcrx: use page_pool_unref_and_test() page_pool_unref_and_test() tries to better follow usuall refcount semantics, use it instead of page_pool_unref_netmem(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	bdc0d478a1	io_uring/zcrx: replace memchar_inv with is_zero memchr_inv() is more ambiguous than mem_is_zero(), so use the latter for zero checks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Pavel Begunkov	9eb3c57178	io_uring/zcrx: improve rqe cache alignment Refill queue entries are 16B structures, but because of the ring header placement, they're 8B aligned but not naturally / 16B aligned, which means some of them span across 2 cache lines. Push rqes to a new cache line. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-16 12:37:20 -06:00
Jens Axboe	1b3aa39007	io_uring/uring_cmd: correct signature for io_uring_mshot_cmd_post_cqe() The !CONFIG_IO_URING signature is wrong, fix that up. The non stub signature got updated for the io_br_sel changes that happened before this patch went in, but the stub one did not. Fixes: `620a50c927` ("io_uring: uring_cmd: add multishot support") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-15 09:17:24 -06:00
Jens Axboe	9adc6669a6	io_uring: correct size of overflow CQE calculation If a 32b CQE is required, don't double the size of the overflow struct, just add the size of the io_uring_cqe addition that is needed. This avoids allocating too much memory, as the io_overflow_cqe size includes the list member required to queue them too. Fixes: `e26dca67fd` ("io_uring: add support for IORING_SETUP_CQE_MIXED") Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-10 17:30:51 -06:00
Marco Crivellari	9f5f69d98e	io_uring: replace use of system_unbound_wq with system_dfl_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. queue_work() / queue_delayed_work() / mod_delayed_work() will now use the new unbound wq: whether the user still use the old wq a warn will be printed along with a wq redirect to the new one. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-10 08:21:03 -06:00
Marco Crivellari	8577441d4a	io_uring: replace use of system_wq with system_percpu_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_wq is a per-CPU worqueue, yet nothing in its name tells about that CPU affinity constraint, which is very often not required by users. Make it clear by adding a system_percpu_wq. queue_work() / queue_delayed_work() mod_delayed_work() will now use the new per-cpu wq: whether the user still stick on the old name a warn will be printed along a wq redirect to the new one. This patch add the new system_percpu_wq except for mm, fs and net subsystem, whom are handled in separated patches. The old wq will be kept for a few release cylces. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-10 08:21:02 -06:00
Caleb Sander Mateos	2f076a453f	io_uring/rsrc: respect submitter_task in io_register_clone_buffers() io_ring_ctx's enabled with IORING_SETUP_SINGLE_ISSUER are only allowed a single task submitting to the ctx. Although the documentation only mentions this restriction applying to io_uring_enter() syscalls, commit `d7cce96c44` ("io_uring: limit registration w/ SINGLE_ISSUER") extends it to io_uring_register(). Ensuring only one task interacts with the io_ring_ctx will be important to allow this task to avoid taking the uring_lock. There is, however, one gap in these checks: io_register_clone_buffers() may take the uring_lock on a second (source) io_ring_ctx, but __io_uring_register() only checks the current thread against the destination io_ring_ctx's submitter_task. Fail the IORING_REGISTER_CLONE_BUFFERS with -EEXIST if the source io_ring_ctx has a registered submitter_task other than the current task. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 13:21:24 -06:00
Caleb Sander Mateos	5d4c52bfa8	io_uring: don't include filetable.h in io_uring.h io_uring/io_uring.h doesn't use anything declared in io_uring/filetable.h, so drop the unnecessary #include. Add filetable.h includes in .c files previously relying on the transitive include from io_uring.h. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 13:20:46 -06:00
Thorsten Blum	7b0604d77a	io_uring: Replace kzalloc() + copy_from_user() with memdup_user() Replace kzalloc() followed by copy_from_user() with memdup_user() to improve and simplify io_probe(). No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 08:21:36 -06:00
Jens Axboe	473efbc3ca	io_uring/uring_cmd: fix __io_uring_cmd_do_in_task !CONFIG_IO_URING typo A manual application of this patch resulted in a typo for the stub function __io_uring_cmd_do_in_task(), for the case where CONFIG_IO_URING isn't true. Fix that up. Reported-by: Klara Modin <klarasmodin@gmail.com> Fixes: `df3a7762ee` ("io_uring/uring_cmd: add io_uring_cmd_tw_t type alias") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 08:18:15 -06:00
Pavel Begunkov	c265ae75f9	io_uring: introduce io_uring querying There are many parameters users might want to query about io_uring like available request types or the ring sizes. This patch introduces an interface for such slow path queries. It was written with several requirements in mind: - Can be used with or without an io_uring instance. Asking for supported setup flags before creating an instance as well as qeurying info about an already created ring are valid use cases. - Should be moderately fast. For example, users might use it to periodically retrieve ring attributes at runtime. As a consequence, it should be able to query multiple attributes in a single syscall. - Backward and forward compatible. - Should be reasobably easy to use. - Reduce the kernel code size for introducing new query types. It's implemented as a new registration opcode IORING_REGISTER_QUERY. The user passes one or more query strutctures linked together, each represented by struct io_uring_query_hdr. The header stores common control fields needed for processing and points to query type specific information. The header contains - The query type - The result field, which on return contains the error code for the query - Pointer to the query type specific information - The size of the query structure. The kernel will only populate up to the size, which helps with backward compatibility. The kernel can also reduce the size, so if the current kernel is older than the inteface the user tries to use, it'll get only the supported bits. - next_entry field is used to chain multiple queries. Apart from common registeration syscall failures, it can only immediately return an error code in case when the headers are incorrect or any other addresses and invalid. That usually mean that the userspace doesn't use the API right and should be corrected. All query type specific errors are returned in the header's result field. As an example, the patch adds a single query type for now, i.e. IO_URING_QUERY_OPCODES, which tells what register / request / etc. opcodes are supported, but there are particular plans to extend it. Note: there is a request probing interface via IORING_REGISTER_PROBE, but it's a mess. It requires the user to create a ring first, it only works for requests, and requires dynamic allocations. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 08:06:37 -06:00
Pavel Begunkov	63805d0a9b	io_uring: add macros for avaliable flags Add constants for supported setup / request / feature flags as well as the feature mask. They'll be used in the next patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 08:06:37 -06:00
Pavel Begunkov	da8bc3c81c	io_uring: add helper for *REGISTER_SEND_MSG_RING Move handling of IORING_REGISTER_SEND_MSG_RING into a separate function in preparation to growing io_uring_register_blind(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 08:06:37 -06:00
Caleb Sander Mateos	c2685729fa	io_uring: remove WRITE_ONCE() in io_uring_create() There's no need to use WRITE_ONCE() to set ctx->submitter_task in io_uring_create() since no other task can access the io_ring_ctx until a file descriptor is associated with it. So use a normal assignment instead of WRITE_ONCE(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250904161223.2600435-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-04 17:22:24 -06:00
Caleb Sander Mateos	9f8608fce9	io_uring/cmd: remove unused io_uring_cmd_iopoll_done() io_uring_cmd_iopoll_done()'s only caller was removed in commit `9ce6c9875f` ("nvme: always punt polled uring_cmd end_io work to task_work"). So remove the unused function too. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250902013328.1517686-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-03 17:35:26 -06:00
Caleb Sander Mateos	dd386b0d5e	io_uring/uring_cmd: correct io_uring_cmd_done() ret type io_uring_cmd_done() takes the result code for the CQE as a ssize_t ret argument. However, the CQE res field is a s32 value, as is the argument to io_req_set_res(). To clarify that only s32 values can be faithfully represented without truncation, change io_uring_cmd_done()'s ret argument type to s32. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250902012609.1513123-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-03 17:34:36 -06:00
Caleb Sander Mateos	df3a7762ee	io_uring/uring_cmd: add io_uring_cmd_tw_t type alias Introduce a function pointer type alias io_uring_cmd_tw_t for the uring_cmd task work callback. This avoids repeating the signature in several places. Also name both arguments to the callback to clarify what they represent. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250902160657.1726828-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-02 19:21:12 -06:00
Caleb Sander Mateos	8b9c9a2e7d	io_uring/register: drop redundant submitter_task check For IORING_SETUP_SINGLE_ISSUER io_ring_ctx's, io_register_resize_rings() checks that the current task is the ctx's submitter_task. However, its caller __io_uring_register() already checks this. Drop the redundant check in io_register_resize_rings(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250902215108.1925105-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-02 19:20:24 -06:00
Jens Axboe	37500634d0	io_uring/net: correct type for min_not_zero() cast The kernel test robot reports that after a recent change, the signedness of a min_not_zero() compare is now incorrect. Fix that up and cast to the right type. Fixes: `429884ff35` ("io_uring/kbuf: use struct io_br_sel for multiple buffers picking") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509020426.WJtrdwOU-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-02 05:19:42 -06:00
Jens Axboe	4c0b26e23c	io_uring: add async data clear/free helpers Futex recently had an issue where it mishandled how ->async_data and REQ_F_ASYNC_DATA is handled. To avoid future issues like that, add a set of helpers that either clear or clear-and-free the async data assigned to a struct io_kiocb. Convert existing manual handling of that to use the helpers. No intended functional changes in this patch. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-27 11:24:25 -06:00
Jens Axboe	c986f7586b	io_uring/zcrx: add support for IORING_SETUP_CQE_MIXED zcrx currently requires the ring to be set up with fixed 32b CQEs, allow it to use IORING_SETUP_CQE_MIXED as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-27 11:24:22 -06:00
Jens Axboe	1e81bf1414	io_uring/uring_cmd: add support for IORING_SETUP_CQE_MIXED Certain users of uring_cmd currently require fixed 32b CQE support, which is propagated through IO_URING_F_CQE32. Allow IORING_SETUP_CQE_MIXED to cover that case as well, so not all CQEs posted need to be 32b in size. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-27 11:24:15 -06:00
Jens Axboe	806ecb209a	io_uring/nop: add support for IORING_SETUP_CQE_MIXED This adds support for setting IORING_NOP_CQE32 as a flag for a NOP command, in which case a 32b CQE will be posted rather than a regular one. This is the default if the ring has been setup with IORING_SETUP_CQE32. If the ring has been setup with IORING_SETUP_CQE_MIXED, then 16b CQEs will be posted without this flag set, and 32b CQEs if this flag is set. For the latter case, sqe->off is what will be posted as cqe->big_cqe[0] and sqe->addr is what will be posted as cqe->big_cqe[1]. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-27 11:24:15 -06:00
Jens Axboe	e26dca67fd	io_uring: add support for IORING_SETUP_CQE_MIXED Normal rings support 16b CQEs for posting completions, while certain features require the ring to be configured with IORING_SETUP_CQE32, as they need to convey more information per completion. This, in turn, makes ALL the CQEs be 32b in size. This is somewhat wasteful and inefficient, particularly when only certain CQEs need to be of the bigger variant. This adds support for setting up a ring with mixed CQE sizes, using IORING_SETUP_CQE_MIXED. When setup in this mode, CQEs posted to the ring may be either 16b or 32b in size. If a CQE is 32b in size, then IORING_CQE_F_32 is set in the CQE flags to indicate that this is the case. If this flag isn't set, the CQE is the normal 16b variant. CQEs on these types of mixed rings may also have IORING_CQE_F_SKIP set. This can happen if the ring is one (small) CQE entry away from wrapping, and an attempt is made to post a 32b CQE. As CQEs must be contigious in the CQ ring, a 32b CQE cannot wrap the ring. For this case, a single dummy CQE is posted with the SKIP flag set. The application should simply ignore those. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-27 11:23:57 -06:00
Jens Axboe	89a8859721	io_uring/trace: support completion tracing of mixed 32b CQEs Check for IORING_CQE_F_32 as well, not just if the ring was setup with IORING_SETUP_CQE32 to only support big CQEs. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-24 11:41:13 -06:00
Jens Axboe	82ceb7fcc5	io_uring/fdinfo: handle mixed sized CQEs Ensure that the CQ ring iteration handles differently sized CQEs, not just a fixed 16b or 32b size per ring. These CQEs aren't possible just yet, but prepare the fdinfo CQ ring dumping for handling them. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-24 11:41:13 -06:00
Jens Axboe	b69458735d	io_uring: add UAPI definitions for mixed CQE postings This adds the CQE flags related to supporting a mixed CQ ring mode, where both normal (16b) and big (32b) CQEs may be posted. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-24 11:41:12 -06:00
Jens Axboe	d0201c4436	io_uring: remove io_ctx_cqe32() helper It's pretty pointless and only used for the tracing helper, get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-24 11:41:12 -06:00

1 2 3 4 5 ...

1382433 Commits