linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 08:35:10 -04:00

Author	SHA1	Message	Date
Pavel Begunkov	c3b490930d	io_uring: don't use complete_post in kbuf Now we're handling IOPOLL completions more generically, get rid uses of _post() and send requests through the normal path. It may have some extra mertis performance wise, but we don't care much as there is a better interface for selected buffers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4deded706587f55b006dc33adf0c13cfc3b2319f.1669310258.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:11:15 -07:00
Dylan Yudaken	10d8bc3541	io_uring: spelling fix s/pushs/pushes/ Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221125103412.1425305-3-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:46 -07:00
Dylan Yudaken	27f35fe909	io_uring: remove io_req_complete_post_tw It's only used in one place. Inline it. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221125103412.1425305-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:46 -07:00
Dylan Yudaken	9a6924519e	io_uring: allow multishot polled reqs to defer completion Until now there was no reason for multishot polled requests to defer completions as there was no functional difference. However now this will actually defer the completions, for a performance win. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-10-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	b529c96a89	io_uring: remove overflow param from io_post_aux_cqe The only call sites which would not allow overflow are also call sites which would use the io_aux_cqe as they care about ordering. So remove this parameter from io_post_aux_cqe. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-9-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	2e2ef4a1da	io_uring: add lockdep assertion in io_fill_cqe_aux Add an assertion for the completion lock to io_fill_cqe_aux Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-8-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	a77ab745f2	io_uring: make io_fill_cqe_aux static This is only used in io_uring.c Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-7-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	9b8c54755a	io_uring: add io_aux_cqe which allows deferred completion Use the just introduced deferred post cqe completion state when possible in io_aux_cqe. If not possible fallback to io_post_aux_cqe. This introduces a complication because of allow_overflow. For deferred completions we cannot know without locking the completion_lock if it will overflow (and even if we locked it, another post could sneak in and cause this cqe to be in overflow). However since overflow protection is mostly a best effort defence in depth to prevent infinite loops of CQEs for poll, just checking the overflow bit is going to be good enough and will result in at most 16 (array size of deferred cqes) overflows. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	931147ddfa	io_uring: allow defer completion for aux posted cqes Multishot ops cannot use the compl_reqs list as the request must stay in the poll list, but that means they need to run each completion without benefiting from batching. Here introduce batching infrastructure for only small (ie 16 byte) CQEs. This restriction is ok because there are no use cases posting 32 byte CQEs. In the ring keep a batch of up to 16 posted results, and flush in the same way as compl_reqs. 16 was chosen through experimentation on a microbenchmark ([1]), as well as trying not to increase the size of the ring too much. This increases the size to 1472 bytes from 1216. [1]: `9ac66b36bc` Run with $ make -j && ./benchmark/reg.b -s 1 -t 2000 -r 10 Gives results: baseline 8309 k/s 8 18807 k/s 16 19338 k/s 32 20134 k/s Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-5-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	973fc83f3a	io_uring: defer all io_req_complete_failed All failures happen under lock now, and can be deferred. To be consistent when the failure has happened after some multishot cqe has been deferred (and keep ordering), always defer failures. To make this obvious at the caller (and to help prevent a future bug) rename io_req_complete_failed to io_req_defer_failed. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-4-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Dylan Yudaken	c06c6c5d27	io_uring: always lock in io_apoll_task_func This is required for the failure case (io_req_complete_failed) and is missing. The alternative would be to only lock in the failure path, however all of the non-error paths in io_poll_check_events that do not do not return IOU_POLL_NO_ACTION end up locking anyway. The only extraneous lock would be for the multishot poll overflowing the CQE ring, however multishot poll would probably benefit from being locked as it will allow completions to be batched. So it seems reasonable to lock always. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-3-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-25 06:10:04 -07:00
Pavel Begunkov	2dac1a1592	io_uring: remove iopoll spinlock This reverts commit `2ccc92f4ef` io_req_complete_post() should now behave well even in case of IOPOLL, we can remove completion_lock locking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e171c8b530656b14a671c59100ca260e46e7f2a.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:47:07 -07:00
Pavel Begunkov	1bec951c38	io_uring: iopoll protect complete_post io_req_complete_post() may be used by iopoll enabled rings, grab locks in this case. That requires to pass issue_flags to propagate the locking state. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cc6d854065c57c838ca8e8806f707a226b70fd2d.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:45:31 -07:00
Pavel Begunkov	fa18fa2272	io_uring: inline __io_req_complete_put() Inline __io_req_complete_put() into io_req_complete_post(), there are no other users. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1923a4dfe80fa877f859a22ed3df2d5fc8ecf02b.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:44:01 -07:00
Pavel Begunkov	833b5dfffc	io_uring: remove io_req_tw_post_queue Remove io_req_tw_post() and io_req_tw_post_queue(), we can use io_req_task_complete() instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b9b73c08022c7f1457023ac841f35c0100e70345.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:44:00 -07:00
Pavel Begunkov	624fd779fd	io_uring: use io_req_task_complete() in timeout Use a more generic io_req_task_complete() in timeout completion task_work instead of io_req_complete_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bda1710b58c07bf06107421c2a65c529ea9cdcac.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:44:00 -07:00
Pavel Begunkov	e276ae344a	io_uring: hold locks for io_req_complete_failed A preparation patch, make sure we always hold uring_lock around io_req_complete_failed(). The only place deviating from the rule is io_cancel_defer_files(), queue a tw instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70760344eadaecf2939287084b9d4ba5c05a6984.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:44:00 -07:00
Pavel Begunkov	2ccc92f4ef	io_uring: add completion locking for iopoll There are pieces of code that may allow iopoll to race filling cqes, temporarily add spinlocking around posting events. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/84d86b5c117feda075471c5c9e65208e0dccf5d0.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-23 10:44:00 -07:00
Jens Axboe	6c16fe3c16	io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post() __io_cq_unlock_post() is identical to io_cq_unlock_post(), and io_cqring_ev_posted() has a single caller so migth as well just inline it there. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-22 06:09:30 -07:00
Jens Axboe	4061f0ef73	Revert "io_uring: disallow self-propelled ring polling" This reverts commit `7fdbc5f014`. This patch dealt with a subset of the real problem, which is a potential circular dependency on the wakup path for io_uring itself. Outside of io_uring, eventfd can also trigger this (see details in `03e02acda8`) and so can epoll (see details in `caf1aeaffc`). Now that we have a generic solution to this problem, get rid of the io_uring specific work-around. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-22 06:08:31 -07:00
Jens Axboe	4464853277	io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related wakups, so that we can check for a circular event dependency between eventfd and epoll. If this flag is set when our wakeup handlers are called, then we know we have a dependency that needs to terminate multishot requests. eventfd and epoll are the only such possible dependencies. Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-22 06:08:31 -07:00
Pavel Begunkov	f9d567c75e	io_uring: inline __io_req_complete_post() There is only one user of __io_req_complete_post(), inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef4c9059950a3da5cf68df00f977f1fd13bd9306.1668597569.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:45:19 -07:00
Pavel Begunkov	d759360620	io_uring: split tw fallback into a function When the target process is dying and so task_work_add() is not allowed we push all task_work item to the fallback workqueue. Move the part responsible for moving tw items out of __io_req_task_work_add() into a separate function. Makes it a bit cleaner and gives the compiler a bit of extra info. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e503dab9d7af95470ca6b214c6de17715ae4e748.1668162751.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:44:21 -07:00
Pavel Begunkov	e52d2e583e	io_uring: inline io_req_task_work_add() __io_req_task_work_add() is huge but marked inline, that makes compilers to generate lots of garbage. Inline the wrapper caller io_req_task_work_add() instead. before and after: text data bss dec hex filename 47347 16248 8 63603 f873 io_uring/io_uring.o text data bss dec hex filename 45303 16248 8 61559 f077 io_uring/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:44:18 -07:00
Lin Ma	23a6c9ac4d	io_uring: update outdated comment of callbacks Previous commit `ebc11b6c6b` ("io_uring: clean io-wq callbacks") rename io_free_work() into io_wq_free_work() for consistency. This patch also updates relevant comment to avoid misunderstanding. Fixes: `ebc11b6c6b` ("io_uring: clean io-wq callbacks") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20221110122103.20120-1-linma@zju.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:44:16 -07:00
Lin Ma	cd42a53d25	io_uring/poll: remove outdated comments of caching Previous commit `13a99017ff` ("io_uring: remove events caching atavisms") entirely removes the events caching optimization introduced by commit `81459350d5` ("io_uring: cache req->apoll->events in req->cflags"). Hence the related comment should also be removed to avoid misunderstanding. Fixes: `13a99017ff` ("io_uring: remove events caching atavisms") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20221110060313.16303-1-linma@zju.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:44:14 -07:00
Dylan Yudaken	e2ad599d1e	io_uring: allow multishot recv CQEs to overflow With commit `aa1df3a360` ("io_uring: fix CQE reordering"), there are stronger guarantees for overflow ordering. Specifically ensuring that userspace will not receive out of order receive CQEs. Therefore this is not needed any more for recv/recvmsg. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:44:09 -07:00
Dylan Yudaken	515e269612	io_uring: revert "io_uring fix multishot accept ordering" This is no longer needed after commit `aa1df3a360` ("io_uring: fix CQE reordering"), since all reordering is now taken care of. This reverts commit `cbd2574854` ("io_uring: fix multishot accept ordering"). Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:43:42 -07:00
Dylan Yudaken	ef67fcb41d	io_uring: do not always force run task_work in io_uring_register Running task work when not needed can unnecessarily delay operations. Specifically IORING_SETUP_DEFER_TASKRUN tries to avoid running task work until the user requests it. Therefore do not run it in io_uring_register any more. The one catch is that io_rsrc_ref_quiesce expects it to have run in order to process all outstanding references, and so reorder it's loop to do this. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107123349.4106213-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:54 -07:00
Xinghui Li	df730ec21f	io_uring: fix two assignments in if conditions Fixes two errors: "ERROR: do not use assignment in if condition 130: FILE: io_uring/net.c:130: + if (!(issue_flags & IO_URING_F_UNLOCKED) && ERROR: do not use assignment in if condition 599: FILE: io_uring/poll.c:599: + } else if (!(issue_flags & IO_URING_F_UNLOCKED) &&" reported by checkpatch.pl in net.c and poll.c . Signed-off-by: Xinghui Li <korantli@tencent.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20221102082503.32236-1-korantwork@gmail.com [axboe: style tweaks] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	42385b02ba	io_uring/net: move mm accounting to a slower path We can also move mm accounting to the extended callbacks. It removes a few cycles from the hot path including skipping one function call and setting io_req_task_complete as a callback directly. For user backed I/O it shouldn't make any difference taking into considering atomic mm accounting and page pinning. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1062f270273ad11c1b7b45ec59a6a317533d5e64.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	40725d1b96	io_uring: move zc reporting from the hot path Add custom tw and notif callbacks on top of usual bits also handling zc reporting. That moves it from the hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40de4a6409042478e1f35adc4912e23226cb1b5c.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	bedd20bcf3	io_uring/net: inline io_notif_flush() io_notif_flush() is pretty simple, we can inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/332359e7bd124138dfe51340bbec829c9b265c18.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	7fa8e84192	io_uring/net: rename io_uring_tx_zerocopy_callback Just a simple renaming patch, io_uring_tx_zerocopy_callback() is too bulky and doesn't follow usual naming style. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/24d78325403ca6dcb1ec4bced1e33cacc9b832a5.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	fc1dd0d4fa	io_uring/net: preset notif tw handler We're going to have multiple notification tw functions. In preparation for future changes default the tw callback in advance so later we can replace it with other versions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7acdbea5e20eadd844513320cd454af14ba50f64.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	5bc8e8884b	io_uring/net: remove extra notif rsrc setup io_send_zc_prep() sets up notification's rsrc_node when needed, don't unconditionally install it on notif alloc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dbe4875ac33e180b9799d8537a5e27935e82aac4.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	3671163beb	io_uring: move kbuf put out of generic tw complete There are multiple users of io_req_task_complete() including zc notifications, but only read requests use selected buffers. As we already have an rw specific tw function, move io_put_kbuf() in there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/94374c7649aaefc3a17808dc4701f25ccd457e25.1667557923.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Stefan Metzmacher	e307e66981	io_uring/net: introduce IORING_SEND_ZC_REPORT_USAGE flag It might be useful for applications to detect if a zero copy transfer with SEND[MSG]_ZC was actually possible or not. The application can fallback to plain SEND[MSG] in order to avoid the overhead of two cqes per request. Or it can generate a log message that could indicate to an administrator that no zero copy was possible and could explain degraded performance. Cc: stable@vger.kernel.org # 6.1 Link: https://lore.kernel.org/io-uring/fb6a7599-8a9b-15e5-9b64-6cd9d01c6ff4@gmail.com/T/#m2b0d9df94ce43b0e69e6c089bdff0ce6babbdfaa Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8945b01756d902f5d5b0667f20b957ad3f742e5e.1666895626.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-21 07:38:31 -07:00
Pavel Begunkov	7fdbc5f014	io_uring: disallow self-propelled ring polling When we post a CQE we wake all ring pollers as it normally should be. However, if a CQE was generated by a multishot poll request targeting its own ring, it'll wake that request up, which will make it to post a new CQE, which will wake the request and so on until it exhausts all CQ entries. Don't allow multishot polling io_uring files but downgrade them to oneshots, which was always stated as a correct behaviour that the userspace should check for. Cc: stable@vger.kernel.org Fixes: `aa43477b04` ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-18 09:29:31 -07:00
Pavel Begunkov	100d6b17c0	io_uring: fix multishot recv request leaks Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 1300ebb20286b ("io_uring: multishot recv") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-17 12:33:33 -07:00
Pavel Begunkov	9148286476	io_uring: fix multishot accept request leaks Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: `390ed29b5e` ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-17 12:33:33 -07:00
Pavel Begunkov	539bcb57da	io_uring: fix tw losing poll events We may never try to process a poll wake and its mask if there was multiple wake ups racing for queueing up a tw. Force io_poll_check_events() to update the mask by vfs_poll(). Cc: stable@vger.kernel.org Fixes: `aa43477b04` ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-17 12:33:33 -07:00
Pavel Begunkov	b98186aee2	io_uring: update res mask in io_poll_check_events When io_poll_check_events() collides with someone attempting to queue a task work, it'll spin for one more time. However, it'll continue to use the mask from the first iteration instead of updating it. For example, if the first wake up was a EPOLLIN and the second EPOLLOUT, the userspace will not get EPOLLOUT in time. Clear the mask for all subsequent iterations to force vfs_poll(). Cc: stable@vger.kernel.org Fixes: `aa43477b04` ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-17 12:33:33 -07:00
Pavel Begunkov	12e4e8c7ab	io_uring/rw: enable bio caches for IRQ rw Now we can use IOCB_ALLOC_CACHE not only for iopoll'ed reads/write but also for normal IRQ driven I/O. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fb8bd092ed5a4a3b037e84e4777074d07aa5639a.1667384020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-16 09:44:27 -07:00
Pavel Begunkov	5576035f15	io_uring/poll: lockdep annote io_poll_req_insert_locked Add a lockdep annotation in io_poll_req_insert_locked(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8115d8e702733754d0aea119e9b5bb63d1eb8b24.1668184658.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-11 09:59:27 -07:00
Pavel Begunkov	30a33669fa	io_uring/poll: fix double poll req->flags races io_poll_double_prepare() \| io_poll_wake() \| poll->head = NULL smp_load(&poll->head); /* NULL */ \| flags = req->flags; \| \| req->flags &= ~SINGLE_POLL; req->flags = flags \| DOUBLE_POLL \| The idea behind io_poll_double_prepare() is to serialise with the first poll entry by taking the wq lock. However, it's not safe to assume that io_poll_wake() is not running when we can't grab the lock and so we may race modifying req->flags. Skip double poll setup if that happens. It's ok because the first poll entry will only be removed when it's definitely completing, e.g. pollfree or oneshot with a valid mask. Fixes: `49f1c68e04` ("io_uring: optimise submission side poll_refs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7fab2d502f6121a7d7b199fe4d914a43ca9cdfd.1668184658.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-11 09:59:27 -07:00
Jens Axboe	3851d25c75	io_uring: check for rollover of buffer ID when providing buffers We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: https://github.com/axboe/liburing/issues/726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-10 11:07:41 -07:00
Dylan Yudaken	0fc8c2acbf	io_uring: calculate CQEs from the user visible value io_cqring_wait (and it's wake function io_has_work) used cached_cq_tail in order to calculate the number of CQEs. cached_cq_tail is set strictly before the user visible rings->cq.tail However as far as userspace is concerned, if io_uring_enter(2) is called with a minimum number of events, they will verify by checking rings->cq.tail. It is therefore possible for io_uring_enter(2) to return early with fewer events visible to the user. Instead make the wait functions read from the user visible value, so there will be no discrepency. This is triggered eventually by the following reproducer: struct io_uring_sqe sqe; struct io_uring_cqe cqe; unsigned int cqe_ready; struct io_uring ring; int ret, i; ret = io_uring_queue_init(N, &ring, 0); assert(!ret); while(true) { for (i = 0; i < N; i++) { sqe = io_uring_get_sqe(&ring); io_uring_prep_nop(sqe); sqe->flags \|= IOSQE_ASYNC; } ret = io_uring_submit(&ring); assert(ret == N); do { ret = io_uring_wait_cqes(&ring, &cqe, N, NULL, NULL); } while(ret == -EINTR); cqe_ready = io_uring_cq_ready(&ring); assert(!ret); assert(cqe_ready == N); io_uring_cq_advance(&ring, N); } Fixes: `ad3eb2c89f` ("io_uring: split overflow state into SQ and CQ side") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221108153016.1854297-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-08 10:36:15 -07:00
Christian Brauner	5a6f52d20c	acl: conver higher-level helpers to rely on mnt_idmap Convert an initial portion to rely on struct mnt_idmap by converting the high level xattr helpers. Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-10-31 17:48:12 +01:00
Dylan Yudaken	b3026767e1	io_uring: unlock if __io_run_local_work locked inside It is possible for tw to lock the ring, and this was not propogated out to io_run_local_work. This can cause an unlock to be missed. Instead pass a pointer to locked into __io_run_local_work. Fixes: `8ac5d85a89` ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-27 09:52:12 -06:00

... 28 29 30 31 32 ...

1825 Commits