In commit 934447a603 ("io_uring: do not recycle buffer in READV") a
temporary fix was put in io_kbuf_recycle to simply never recycle READV
buffers.
Instead of that, rather treat READV with REQ_F_BUFFER_SELECTED the same as
a READ with REQ_F_BUFFER_SELECTED. Since READV requires iov_len of 1 they
are essentially the same.
In order to do this inside io_prep_rw() add some validation to check that
it is in fact only length 1, and also extract the length of the buffer at
prep time.
This allows removal of the io_iov_buffer_select codepaths as they are only
used from the READV op.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220907165152.994979-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need the poll_flags to know how to poll for the IO, and we should
have the batch structure in preparation for supporting batched
completions with iopoll.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Combine the two checks we have for task_work running and whether or not
we need to shuffle the mutex into one, so we unify how task_work is run
in the iopoll loop. This helps ensure that local task_work is run when
needed, and also optimizes that path to avoid a mutex shuffle if it's
not needed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have a few spots that drop the mutex just to run local task_work,
which immediately tries to grab it again. Add a helper that just passes
in whether we're locked already.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After the addition of iopoll support for passthrough, there's a bit of
a mixup here. Clean it up and get rid of the casting for the passthrough
command type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some workloads rely on a registered eventfd (via
io_uring_register_eventfd(3)) in order to wake up and process the
io_uring.
In the case of a ring setup with IORING_SETUP_DEFER_TASKRUN, that eventfd
also needs to be signalled when there are tasks to run.
This changes an old behaviour which assumed 1 eventfd signal implied at
least 1 CQE, however only when this new flag is set (and so old users will
not notice). This should be expected with the IORING_SETUP_DEFER_TASKRUN
flag as it is not guaranteed that every task will result in a CQE.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-7-dylany@fb.com
[axboe: fold in call_rcu() serialization fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow deferring async tasks until the user calls io_uring_enter(2) with
the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at
io_uring_setup time. This functionality requires that the later
io_uring_enter will be called from the same submission task, and therefore
restrict this flag to work only when IORING_SETUP_SINGLE_ISSUER is also
set.
Being able to hand pick when tasks are run prevents the problem where
there is current work to be done, however task work runs anyway.
For example, a common workload would obtain a batch of CQEs, and process
each one. Interrupting this to additional taskwork would add latency but
not gain anything. If instead task work is deferred to just before more
CQEs are obtained then no additional latency is added.
The way this is implemented is by trying to keep task work local to a
io_ring_ctx, rather than to the submission task. This is required, as the
application will want to wake up only a single io_ring_ctx at a time to
process work, and so the lists of work have to be kept separate.
This has some other benefits like not having to check the task continually
in handle_tw_list (and potentially unlocking/locking those), and reducing
locks in the submit & process completions path.
There are networking cases where using this option can reduce request
latency by 50%. For example a contrived example using [1] where the client
sends 2k data and receives the same data back while doing some system
calls (to trigger task work) shows this reduction. The reason ends up
being that if sending responses is delayed by processing task work, then
the client side sits idle. Whereas reordering the sends first means that
the client runs it's workload in parallel with the local task work.
[1]:
Using https://github.com/DylanZA/netbench/tree/defer_run
Client:
./netbench --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000"
Server:
./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100"
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is not needed, and it is normally better to wait for task work until
after submissions. This will allow greater batching if either work arrives
in the meanwhile, or if the submissions cause task work to be queued up.
For SQPOLL this also no longer runs task work, but this is handled inside
the SQPOLL loop anyway.
For IOPOLL io_iopoll_check will run task work anyway
And otherwise io_cqring_wait will run task work
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guard wakeups that the user can trigger, and that may end up triggering a
call back into eventfd_signal. This is in addition to the current approach
that only guards in eventfd_signal.
Rename in_eventfd_signal -> in_eventfd at the same time to reflect this.
Without this there would be a deadlock in the following code using libaio:
int main()
{
struct io_context *ctx = NULL;
struct iocb iocb;
struct iocb *iocbs[] = { &iocb };
int evfd;
uint64_t val = 1;
evfd = eventfd(0, EFD_CLOEXEC);
assert(!io_setup(2, &ctx));
io_prep_poll(&iocb, evfd, POLLIN);
io_set_eventfd(&iocb, evfd);
assert(1 == io_submit(ctx, 1, iocbs));
write(evfd, &val, 8);
}
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull parisc architecture fixes from Helge Deller:
"Some small parisc architecture fixes for 6.0-rc6:
One patch lightens up a previous commit and thus unbreaks building the
debian kernel, which tries to configure a 64-bit kernel with the
ARCH=parisc environment variable set.
The other patches fixes asm/errno.h includes in the tools directory
and cleans up memory allocation in the iosapic driver.
Summary:
- Allow configuring 64-bit kernel with ARCH=parisc
- Fix asm/errno.h includes in tools directory for parisc and xtensa
- Clean up iosapic memory allocation
- Minor typo and spelling fixes"
* tag 'parisc-for-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Allow CONFIG_64BIT with ARCH=parisc
parisc: remove obsolete manual allocation aligning in iosapic
tools/include/uapi: Fix <asm/errno.h> for parisc and xtensa
Input: hp_sdc: fix spelling typo in comment
parisc: ccio-dma: Add missing iounmap in error path in ccio_probe()
Pull io_uring fixes from Jens Axboe:
"Nothing really major here, but figured it'd be nicer to just get these
flushed out for -rc6 so that the 6.1 branch will have them as well.
That'll make our lives easier going forward in terms of development,
and avoid trivial conflicts in this area.
- Simple trace rename so that the returned opcode name is consistent
with the enum definition (Stefan)
- Send zc rsrc request vs notification lifetime fix (Pavel)"
* tag 'io_uring-6.0-2022-09-18' of git://git.kernel.dk/linux:
io_uring/opdef: rename SENDZC_NOTIF to SEND_ZC
io_uring/net: fix zc fixed buf lifetime
Pull gpio fixes from Bartosz Golaszewski:
- fix the level-low interrupt type support in gpio-mpc8xxx
- convert another two drivers to using immutable irq chips
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mt7621: Make the irqchip immutable
gpio: ixp4xx: Make irqchip immutable
MAINTAINERS: Update HiSilicon GPIO Driver maintainer
gpio: mpc8xxx: Fix support for IRQ_TYPE_LEVEL_LOW flow_type in mpc85xx
Pull pin control fixes from Linus Walleij:
"Nothing special, just driver fixes:
- Fix IRQ wakeup and pins for UFS and SDC2 issues on the Qualcomm
SC8180x
- Fix the Rockchip driver to support interrupt on both rising and
falling edges.
- Name the Allwinner A100 R_PIO properly
- Fix several issues with the Ocelot interrupts"
* tag 'pinctrl-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: ocelot: Fix interrupt controller
pinctrl: sunxi: Fix name for A100 R_PIO
pinctrl: rockchip: Enhance support for IRQ_TYPE_EDGE_BOTH
pinctrl: qcom: sc8180x: Fix wrong pin numbers
pinctrl: qcom: sc8180x: Fix gpio_wakeirq_map
Pull block fixes from Jens Axboe:
"Two fixes for -rc6:
- Fix a mixup of sectors and bytes in the secure erase ioctl
(Mikulas)
- Fix for a bad return value for a non-blocking bio/blk queue enter
call (me)"
* tag 'block-6.0-2022-09-16' of git://git.kernel.dk/linux-block:
blk-lib: fix blkdev_issue_secure_erase
block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait
Pull io_uring fixes from Jens Axboe:
"Two small patches:
- Fix using an unsigned type for the return value, introduced in this
release (Pavel)
- Stable fix for a missing check for a fixed file on put (me)"
* tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block:
io_uring/msg_ring: check file type before putting
io_uring/rw: fix error'ed retry return values
Pull drm fixes from Dave Airlie:
"This is the regular drm fixes pull.
The i915 and misc fixes are fairly regular, but the amdgpu contains
fixes for new hw blocks, the dcn314 specific path hookups and also has
a bunch of fixes for clang stack size warnings which are a bit churny
but fairly straightforward. This means it looks a little larger than
usual.
amdgpu:
- BACO fixes for some RDNA2 boards
- PCI AER fixes uncovered by a core PCI change
- Properly hook up dirtyfb helper
- RAS fixes for GC 11.x
- TMR fix
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- LLVM DML stack size fixes
i915:
- Revert a display patch around max DP source rate now that the
proper WaEdpLinkRateDataReload is in place
- Fix perf limit reasons bit position
- Fix unclaimmed mmio registers on suspend flow with GuC
- A vma_move_to_active fix for a regression with video decoding
- DP DSP fix
gma500:
- Locking and IRQ fixes
meson:
- OSD1 display fixes
panel-edp:
- Fix Innolux timings
rockchip:
- DP/HDMI fixes"
* tag 'drm-fixes-2022-09-16' of git://anongit.freedesktop.org/drm/drm: (42 commits)
drm/amdgpu: make sure to init common IP before gmc
drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
drm/rockchip: Fix return type of cdn_dp_connector_mode_valid
drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage
drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule()
drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()
drm/amd/display: Reduce number of arguments of dml32_CalculatePrefetchSchedule()
drm/amd/display: Reduce number of arguments of dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport()
drm/amd/display: Refactor SubVP calculation to remove FPU
drm/amd/display: Limit user regamma to a valid value
drm/amd/display: add workaround for subvp cursor corruption for DCN32/321
drm/amd/display: SW cursor fallback for SubVP
drm/amd/display: Round cursor width up for MALL allocation
drm/amd/display: Correct dram channel width for dcn314
drm/amd/display: Relax swizzle checks for video non-RGB formats on DCN314
drm/amd/display: Hook up DCN314 specific dml implementation
drm/amd/display: Enable dlg and vba compilation for dcn314
drm/amd/display: Fix compilation errors on DCN314
drm/amd/display: Fix divide by zero in DML
...
Pull cifs fixes from Steve French:
"Four smb3 fixes for stable:
- important fix to revalidate mapping when doing direct writes
- missing spinlock
- two fixes to socket handling
- trivial change to update internal version number for cifs.ko"
* tag '6.0-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: add missing spinlock around tcon refcount
cifs: always initialize struct msghdr smb_msg completely
cifs: don't send down the destination address to sendmsg for a SOCK_STREAM
cifs: revalidate mapping when doing direct writes
- Revert a display patch around max DP source rate now
that the proper WaEdpLinkRateDataReload is in place. (Ville)
- Fix perf limit reasons bit position. (Ashutosh)
- Fix unclaimmed mmio registers on suspend flow with GuC. (Umesh)
- A vma_move_to_active fix for a regression with video decoding. (Nirmoy)
- DP DSP fix. (Ankit)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YyMtmGMXRLsURoM5@intel.com
If we're invoked with a fixed file, follow the normal rules of not
calling io_fput_file(). Fixed files are permanently registered to the
ring, and do not need putting separately.
Cc: stable@vger.kernel.org
Fixes: aa184e8671 ("io_uring: don't attempt to IOPOLL for MSG_RING requests")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's a bug in blkdev_issue_secure_erase. The statement
"unsigned int len = min_t(sector_t, nr_sects, max_sectors);"
sets the variable "len" to the length in sectors, but the statement
"bio->bi_iter.bi_size = len" treats it as if it were in bytes.
The statements "sector += len << SECTOR_SHIFT" and "nr_sects -= len <<
SECTOR_SHIFT" are thinko.
This patch fixes it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v5.19
Fixes: 44abff2c0b ("block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD")
Link: https://lore.kernel.org/r/alpine.LRH.2.02.2209141549480.28100@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The previous patch triggered a build failure for the debian kernel,
which has CONFIG_64BIT enabled, uses the CROSS_COMPILER environment
variable and uses ARCH=parisc to configure the kernel for 64-bit
support.
This patch weakens the previous patch while keeping the recommended way
to configure the kernel with:
ARCH=parisc -> build 32-bit kernel
ARCH=parisc64 -> build 64-bit kernel
while adding the possibility for debian to configure a 64-bit kernel
even if ARCH=parisc is set (PA8X00 CPU has to be selected and
CONFIG_64BIT needs to be enabled).
The downside of this patch is, that we now have a small window open
again where people may get it wrong: if they enable CONFIG_64BIT and try
to compile with a 32-bit compiler.
Fixes: 3dcfb729b5 ("parisc: Make CONFIG_64BIT available for ARCH=parisc64 only")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 5.15+
kmalloc() returns memory with __assume_kmalloc_alignment, which is
__alignof__(unsigned long long) for parisc.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Move common IP init before GMC init so that HDP gets
remapped before GMC init which uses it.
This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Fixes: 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org