Some requests require being run async as they do not support
non-blocking. Instead of trying to issue these requests, getting -EAGAIN
and then queueing them for async issue, rather just force async upfront.
Add WARN_ON_ONCE to make sure surprising code paths do not come up,
however in those cases the bug would end up being a blocking
io_uring_enter(2) which should not be critical.
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127135227.3646353-3-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we
could be running into issues blocking others for too long. Add a
reschedule check in handle_tw_list(), and flush the ctx if we need to
reschedule.
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This generates better code for me, avoiding an extra load on arm64, and
both call sites already have this variable available for easy passing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Every io_uring request is represented by struct io_kiocb, which is
cached locally by io_uring (not SLAB/SLUB) in the list called
submit_state.freelist. This patch simply enabled KASAN for this free
list.
This list is initially created by KMEM_CACHE, but later, managed by
io_uring. This patch basically poisons the objects that are not used
(i.e., they are the free list), and unpoisons it when the object is
allocated/removed from the list.
Touching these poisoned objects while in the freelist will cause a KASAN
warning.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work()
for PF_IO_WORKER threads. They never return to usermode, hence never get
a chance to process any items that are marked by this flag. Most notably
this includes the final put of files, but also any throttling markers set
by block cgroups.
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting
a message from a different thread, then we need to ensure that the
fallback task_work that posts the CQE knwos about the flags passing as
well. If not we'll always be posting 0 as the flags.
Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch removes some "cold" fields from `struct io_issue_def`.
The plan is to keep only highly used fields into `struct io_issue_def`, so,
it may be hot in the cache. The hot fields are basically all the bitfields
and the callback functions for .issue and .prep.
The other less frequently used fields are now located in a secondary and
cold struct, called `io_cold_def`.
This is the size for the structs:
Before: io_issue_def = 56 bytes
After: io_issue_def = 24 bytes; io_cold_def = 40 bytes
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230112144411.2624698-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current io_op_def struct is becoming huge and the name is a bit
generic.
The goal of this patch is to rename this struct to `io_issue_def`. This
struct will contain the hot functions associated with the issue code
path.
For now, this patch only renames the structure, and an upcoming patch
will break up the structure in two, moving the non-issue fields to a
secondary struct.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230112144411.2624698-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We needed fake nodes in __io_run_local_work() and to avoid unecessary wake
ups while the task already running task_works, but we don't need them
anymore since wake ups are protected by cq_waiting, which is always
cleared by the time we're executing deferred task_work items.
Note that because of loose sync around cq_waiting clearing
io_req_local_work_add() may wake the task more than once, but that's
fine and should be rare to not hurt perf.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8839534891f0a2f1076e78554a31ea7e099f7de5.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Flush completions is done either from the submit syscall or by the
task_work, both are in the context of the submitter task, and when it
goes for a single threaded rings like implied by ->task_complete, there
won't be any waiters on ->cq_wait but the master task. That means that
there can be no tasks sleeping on cq_wait while we run
__io_submit_flush_completions() and so waking up can be skipped.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do
for another waitqueue, it's not going to be the case in the future and
so we want to have a fast path for it when the ring has never been
polled.
Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag
called ->poll_activated. The idea behind the flag is to set it when the
ring was polled for the first time. This requires additional sync to not
miss events, which is done here by using task_work for ->task_complete
rings, and by default enabling the flag for all other types of rings.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message
ring operations (IORING_OP_MSG_RING). This new flag enables the sender
to specify custom flags, which will be copied over to cqe->flags in the
receiving ring. These custom flags should be specified using the
sqe->file_index field.
This mechanism provides additional flexibility when sending messages
between rings.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230103160507.617416-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_cqring_wait_schedule() is called after we started waiting on the cq
wq and set the state to TASK_INTERRUPTIBLE, for that reason we have to
constantly worry whether we has returned the state back to running or
not. Leave only quick checks in io_cqring_wait_schedule() and move the
rest including running task work to the callers. Note, we run tw in the
loop after the sched checks because of the fast path in the beginning of
the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2814fabe75e2e019e7ca43ea07daa94564349805.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull irq fix from Borislav Petkov:
- Cleanup the firmware node for the new IRQ MSI domain properly, to
avoid leaking memory
* tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Free the fwnode created by msi_create_device_irq_domain()
Pull x86 fixes from Borislav Petkov:
- Start checking for -mindirect-branch-cs-prefix clang support too now
that LLVM 16 will support it
- Fix a NULL ptr deref when suspending with Xen PV
- Have a SEV-SNP guest check explicitly for features enabled by the
hypervisor and fail gracefully if some are unsupported by the guest
instead of failing in a non-obvious and hard-to-debug way
- Fix a MSI descriptor leakage under Xen
- Mark Xen's MSI domain as supporting MSI-X
- Prevent legacy PIC interrupts from being resent in software by
marking them level triggered, as they should be, which lead to a NULL
ptr deref
* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
acpi: Fix suspend with Xen PV
x86/sev: Add SEV-SNP guest feature negotiation support
x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
Pull input fixes from Dmitry Torokhov:
- touchpads on HP 15-* laptops switched back to PS/2 emulation mode
- a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is
responding after resume
* tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Clevo PCX0DX to i8042 quirk table
Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"