Commit Graph

1412393 Commits

Author SHA1 Message Date
Michael S. Tsirkin
d08fda2cf2 virtio_input: use virtqueue_add_inbuf_cache_clean for events
The evts array contains 64 small (8-byte) input events that share
cachelines with each other. When CONFIG_DMA_API_DEBUG is enabled,
this can trigger warnings about overlapping DMA mappings within
the same cacheline.

Previous patch isolated the array in its own cachelines,
so the warnings are now spurious.

Use virtqueue_add_inbuf_cache_clean() to indicate that the CPU does not
write into these cache lines, suppressing these warnings.

Message-ID: <4c885b4046323f68cf5cadc7fbfb00216b11dd20.1767601130.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:27 -05:00
Michael S. Tsirkin
bd2b617c49 virtio-rng: fix DMA alignment for data buffer
The data buffer in struct virtrng_info is used for DMA_FROM_DEVICE via
virtqueue_add_inbuf() and shares cachelines with the adjacent
CPU-written fields (data_avail, data_idx).

The device writing to the DMA buffer and the CPU writing to adjacent
fields could corrupt each other's data on non-cache-coherent platforms.

Add __dma_from_device_group_begin()/end() annotations to place these
in distinct cache lines.

Message-ID: <157a63b6324d1f1307ddd4faa3b62a8b90a79423.1767601130.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:27 -05:00
Michael S. Tsirkin
2678369e8e virtio_scsi: fix DMA cacheline issues for events
Current struct virtio_scsi_event_node layout has two problems:

The event (DMA_FROM_DEVICE) and work (CPU-written via
INIT_WORK/queue_work) fields share a cacheline.
On non-cache-coherent platforms, CPU writes to work can
corrupt device-written event data.

If ARCH_DMA_MINALIGN is large enough, the 8 events in event_list share
cachelines, triggering CONFIG_DMA_API_DEBUG warnings.

Fix the corruption by moving event buffers to a separate array and
aligning using __dma_from_device_group_begin()/end().

Suppress the (now spurious) DMA debug warnings using
virtqueue_add_inbuf_cache_clean().

Message-ID: <8801aeef7576a155299f19b6887682dd3a272aba.1767601130.git.mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:27 -05:00
Michael S. Tsirkin
95c7b0ad6c virtio_input: fix DMA alignment for evts
On non-cache-coherent platforms, when a structure contains a buffer
used for DMA alongside fields that the CPU writes to, cacheline sharing
can cause data corruption.

The evts array is used for DMA_FROM_DEVICE operations via
virtqueue_add_inbuf(). The adjacent lock and ready fields are written
by the CPU during normal operation. If these share cachelines with evts,
CPU writes can corrupt DMA data.

Add __dma_from_device_group_begin()/end() annotations to ensure evts is
isolated in its own cachelines.

Message-ID: <cd328233198a76618809bb5cd9a6ddcaa603a8a1.1767601130.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:26 -05:00
Michael S. Tsirkin
db191ba0c8 vsock/virtio: use virtqueue_add_inbuf_cache_clean for events
The event_list array contains 8 small (4-byte) events that share
cachelines with each other. When CONFIG_DMA_API_DEBUG is enabled,
this can trigger warnings about overlapping DMA mappings within
the same cacheline.

The previous patch isolated event_list in its own cache lines
so the warnings are spurious.

Use virtqueue_add_inbuf_cache_clean() to indicate that the CPU does not
write into these fields, suppressing the warnings.

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Message-ID: <4b5bf63a7ebb782d87f643466b3669df567c9fe1.1767601130.git.mst@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:26 -05:00
Michael S. Tsirkin
63dfad0517 vsock/virtio: fix DMA alignment for event_list
On non-cache-coherent platforms, when a structure contains a buffer
used for DMA alongside fields that the CPU writes to, cacheline sharing
can cause data corruption.

The event_list array is used for DMA_FROM_DEVICE operations via
virtqueue_add_inbuf(). The adjacent event_run and guest_cid fields are
written by the CPU while the buffer is available, so mapped for the
device. If these share cachelines with event_list, CPU writes can
corrupt DMA data.

Add __dma_from_device_group_begin()/end() annotations to ensure event_list
is isolated in its own cachelines.

Message-ID: <f19ebd74f70c91cab4b0178df78cf6a6e107a96b.1767601130.git.mst@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-08 09:54:19 -05:00
Michael S. Tsirkin
5fc6dd158e virtio: add virtqueue_add_inbuf_cache_clean API
Add virtqueue_add_inbuf_cache_clean() for passing DMA_ATTR_CPU_CACHE_CLEAN
to virtqueue operations. This suppresses DMA debug cacheline overlap
warnings for buffers where proper cache management is ensured by the
caller.

Message-ID: <e50d38c974859e731e50bda7a0ee5691debf5bc4.1767601130.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-02 06:22:49 -05:00
Michael S. Tsirkin
d5d8465131 dma-debug: track cache clean flag in entries
If a driver is buggy and has 2 overlapping mappings but only
sets cache clean flag on the 1st one of them, we warn.
But if it only does it for the 2nd one, we don't.

Fix by tracking cache clean flag in the entry.

Message-ID: <0ffb3513d18614539c108b4548cdfbc64274a7d1.1767601130.git.mst@redhat.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2026-01-02 06:22:49 -05:00
Michael S. Tsirkin
e21dd666e4 docs: dma-api: document DMA_ATTR_CPU_CACHE_CLEAN
Document DMA_ATTR_CPU_CACHE_CLEAN as implemented in the
previous patch.

Message-ID: <0720b4be31c1b7a38edca67fd0c97983d2a56936.1767601130.git.mst@redhat.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-31 19:30:17 -05:00
Michael S. Tsirkin
61868dc55a dma-mapping: add DMA_ATTR_CPU_CACHE_CLEAN
When multiple small DMA_FROM_DEVICE or DMA_BIDIRECTIONAL buffers share a
cacheline, and DMA_API_DEBUG is enabled, we get this warning:
	cacheline tracking EEXIST, overlapping mappings aren't supported.

This is because when one of the mappings is removed, while another one
is active, CPU might write into the buffer.

Add an attribute for the driver to promise not to do this, making the
overlapping safe, and suppressing the warning.

Message-ID: <2d5d091f9d84b68ea96abd545b365dd1d00bbf48.1767601130.git.mst@redhat.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-31 19:30:02 -05:00
Michael S. Tsirkin
1e8b5d8555 docs: dma-api: document __dma_from_device_group_begin()/end()
Document the __dma_from_device_group_begin()/end() annotations.

Message-ID: <01ea88055ded4d70cac70ba557680fd5fa7d9ff5.1767601130.git.mst@redhat.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-31 19:28:12 -05:00
Michael S. Tsirkin
ca085faabb dma-mapping: add __dma_from_device_group_begin()/end()
When a structure contains a buffer that DMA writes to alongside fields
that the CPU writes to, cache line sharing between the DMA buffer and
CPU-written fields can cause data corruption on non-cache-coherent
platforms.

Add __dma_from_device_group_begin()/end() annotations to ensure proper
alignment to prevent this:

struct my_device {
	spinlock_t lock1;
	__dma_from_device_group_begin();
	char dma_buffer1[16];
	char dma_buffer2[16];
	__dma_from_device_group_end();
	spinlock_t lock2;
};

Message-ID: <19163086d5e4704c316f18f6da06bc1c72968904.1767601130.git.mst@redhat.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-31 19:27:45 -05:00
Jason Wang
f6a15d8549 virtio_ring: add in order support
This patch implements in order support for both split virtqueue and
packed virtqueue. Performance could be gained for the device where the
memory access could be expensive (e.g vhost-net or a real PCI device):

Benchmark with KVM guest:

Vhost-net on the host: (pktgen + XDP_DROP):

         in_order=off | in_order=on | +%
    TX:  4.51Mpps     | 5.30Mpps    | +17%
    RX:  3.47Mpps     | 3.61Mpps    | + 4%

Vhost-user(testpmd) on the host: (pktgen/XDP_DROP):

For split virtqueue:

         in_order=off | in_order=on | +%
    TX:  5.60Mpps     | 5.60Mpps    | +0.0%
    RX:  9.16Mpps     | 9.61Mpps    | +4.9%

For packed virtqueue:

         in_order=off | in_order=on | +%
    TX:  5.60Mpps     | 5.70Mpps    | +1.7%
    RX:  10.6Mpps     | 10.8Mpps    | +1.8%

Benchmark also shows no performance impact for in_order=off for queue
size with 256 and 1024.

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-20-jasowang@redhat.com>
2025-12-31 05:47:50 -05:00
Jason Wang
519b206e30 virtio_ring: factor out split detaching logic
This patch factors out the split core detaching logic that could be
reused by in order feature into a dedicated function.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-19-jasowang@redhat.com>
2025-12-31 05:47:50 -05:00
Jason Wang
9dc6b944f1 virtio_ring: factor out split indirect detaching logic
Factor out the split indirect descriptor detaching logic in order to
allow it to be reused by the in order support.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-18-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
fa56d17b92 virtio_ring: factor out core logic for updating last_used_idx
Factor out the core logic for updating last_used_idx to be reused by
the packed in order implementation.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-17-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
c623106c79 virtio_ring: factor out core logic of buffer detaching
Factor out core logic of buffer detaching and leave the free list
management to the caller so in_order can just call the core logic.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-16-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
03f05c4eeb virtio_ring: determine descriptor flags at one time
Let's determine the last descriptor by counting the number of sg. This
would be consistent with packed virtqueue implementation and ease the
future in-order implementation.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-15-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
1208473f9b virtio_ring: introduce virtqueue ops
This patch introduces virtqueue ops which is a set of callbacks
that will be called for different queue layout or features. This would
help to avoid branches for split/packed and will ease the future
implementation like in order.

Note that in order to eliminate the indirect calls this patch uses
global array of const ops to allow compiler to avoid indirect
branches.

Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
were noticed.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-14-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
eff8b47d28 virtio_ring: switch to use unsigned int for virtqueue_poll_packed()
Switch to use unsigned int for virtqueue_poll_packed() to match
virtqueue_poll() and virtqueue_poll_split() and to ease the
abstraction of the virtqueue ops.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-13-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
f2ad9d6b4e virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-12-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
7e81017673 virtio_ring: switch to use vring_virtqueue for disable_cb variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-11-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
62fa22cdab virtio_ring: use vring_virtqueue for enable_cb_delayed variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-10-jasowang@redhat.com>
2025-12-31 05:39:18 -05:00
Jason Wang
74847cb573 virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-9-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
ceea1cd0ae virtio: switch to use vring_virtqueue for virtqueue_get variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-8-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
4a0fa90b10 virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-7-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
8b8590b708 virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-6-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
9552bc0581 virtio_ring: switch to use vring_virtqueue for virtqueue resize variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-5-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
40da006f13 virtio_ring: unify logic of virtqueue_poll() and more_used()
This patch unifies the logic of virtqueue_poll() and more_used() for
better code reusing and ease the future in order implementation.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-4-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
79f6d68293 virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-3-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jason Wang
8ce8e3e558 virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
To be consistent with virtqueue_reset().

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-2-jasowang@redhat.com>
2025-12-31 05:39:17 -05:00
Jon Kohler
3b34d6324d vhost: use "checked" versions of get_user() and put_user()
vhost_get_user and vhost_put_user leverage __get_user and __put_user,
respectively, which were both added in 2016 by commit 6b1e6cc785
("vhost: new device IOTLB API"). In a heavy UDP transmit workload on a
vhost-net backed tap device, these functions showed up as ~11.6% of
samples in a flamegraph of the underlying vhost worker thread.

Quoting Linus from [1]:
    Anyway, every single __get_user() call I looked at looked like
    historical garbage. [...] End result: I get the feeling that we
    should just do a global search-and-replace of the __get_user/
    __put_user users, replace them with plain get_user/put_user instead,
    and then fix up any fallout (eg the coco code).

Switch to plain get_user/put_user in vhost, which results in a slight
throughput speedup. get_user now about ~8.4% of samples in flamegraph.

Basic iperf3 test on a Intel 5416S CPU with Ubuntu 25.10 guest:
TX: taskset -c 2 iperf3 -c <rx_ip> -t 60 -p 5200 -b 0 -u -i 5
RX: taskset -c 2 iperf3 -s -p 5200 -D
Before: 6.08 Gbits/sec
After:  6.32 Gbits/sec

As to what drives the speedup, Sean's patch [2] explains:
	Use the normal, checked versions for get_user() and put_user() instead of
	the double-underscore versions that omit range checks, as the checked
	versions are actually measurably faster on modern CPUs (12%+ on Intel,
	25%+ on AMD).

	The performance hit on the unchecked versions is almost entirely due to
	the added LFENCE on CPUs where LFENCE is serializing (which is effectively
	all modern CPUs), which was added by commit 304ec1b050 ("x86/uaccess:
	Use __uaccess_begin_nospec() and uaccess_try_nospec").  The small
	optimizations done by commit b19b74bc99 ("x86/mm: Rework address range
	check in get_user() and put_user()") likely shave a few cycles off, but
	the bulk of the extra latency comes from the LFENCE.

[1] https://lore.kernel.org/all/CAHk-=wiJiDSPZJTV7z3Q-u4DfLgQTNWqUqqrwSBHp0+Dh016FA@mail.gmail.com/
[2] https://lore.kernel.org/all/20251106210206.221558-1-seanjc@google.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-Id: <20251113005529.2494066-1-jon@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-26 15:00:01 -05:00
zhangdongchuan@eswincomputing.com
4b7bf8d550 virtio_ring: code cleanup in detach_buf_split
Since the return value of vring_unmap_one_split() is exactly
vq->split.desc_extra[i].next, 'i = vq->split.desc_extra[i].next' is
redundant. Assign vring_unmap_one_split() to i instead.

Since vq->split.desc_extra is assigned to extra, use extra[i].next
instead of vq->split.desc_extra[i].next to improve readability.

No change in functionality.

Signed-off-by: zhangdongchuan <zhangdongchuan@eswincomputing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <202511261140162936986@eswincomputing.com>
2025-12-26 15:00:00 -05:00
Thomas Weißschuh
3c4629b68d virtio: uapi: avoid usage of libc types
Using libc types and headers from the UAPI headers is problematic as it
introduces a dependency on a full C toolchain.

On Linux 'unsigned long' works as a replacement for 'uintptr_t' and does
not depend on libc.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251222-uapi-virtio-v1-1-29390f87bcad@linutronix.de>
2025-12-26 15:00:00 -05:00
Stefano Garzarella
d8ee3cfdc8 vhost/vsock: improve RCU read sections around vhost_vsock_get()
vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:

/**
 * hlist_for_each_entry_rcu - iterate over rcu list of given type
 * @pos:	the type * to use as a loop cursor.
 * @head:	the head for your list.
 * @member:	the name of the hlist_node within the struct.
 * @cond:	optional lockdep expression if called from non-RCU protection.
 *
 * This list-traversal primitive may safely run concurrently with
 * the _rcu list-mutation primitives such as hlist_add_head_rcu()
 * as long as the traversal is guarded by rcu_read_lock().
 */

Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.

About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :

  WARNING: suspicious RCU usage
  6.18.0-rc7 #62 Not tainted
  -----------------------------
  drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by rpc-libvirtd/3443:
   #0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]

  stack backtrace:
  CPU: 2 UID: 0 PID: 3443 Comm: rpc-libvirtd Not tainted 6.18.0-rc7 #62 PREEMPT(none)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-7.fc42 06/10/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x75/0xb0
   dump_stack+0x14/0x1a
   lockdep_rcu_suspicious.cold+0x4e/0x97
   vhost_vsock_get+0x8f/0xa0 [vhost_vsock]
   vhost_vsock_dev_ioctl+0x307/0x530 [vhost_vsock]
   __x64_sys_ioctl+0x4f2/0xa00
   x64_sys_call+0xed0/0x1da0
   do_syscall_64+0x73/0xfa0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   ...
   </TASK>

This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.

About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.

Fixes: 834e772c8d ("vhost/vsock: fix use-after-free in network stack callers")
Cc: stefanha@redhat.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:57 -05:00
Michael S. Tsirkin
7f81878b04 tools/virtio: add device, device_driver stubs
Add stubs needed by virtio.h

Message-ID: <0fabf13f6ea812ebc73b1c919fb17d4dec1545db.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
39cfe193f3 tools/virtio: fix up oot build
oot build tends to help uncover bugs so it's worth keeping around,
as long as it's low effort.
add stubs for a couple of macros virtio gained recently,
and disable vdpa in the test build.

Message-ID: <33968faa7994b86d1f78057358a50b8f460c7a23.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
e88dfb9331 virtio_features: make it self-contained
virtio_features.h uses WARN_ON_ONCE and memset so it must
include linux/bug.h and linux/string.h

Message-ID: <579986aa9b8d023844990d2a0e267382f8ad85d5.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
cec9c5e385 tools/virtio: switch to kernel's virtio_config.h
Drops stubs in virtio_config.h, use the kernel's version instead - we
are now activly developing it, so the stub became too hard to maintain.

Message-ID: <8e5c85dc8aad001f161f7e2d8799ffbccfc31381.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
b0fe545b3c tools/virtio: stub might_sleep and synchronize_rcu
Add might_sleep() and synchronize_rcu() stubs needed by virtio_config.h.

might_sleep() is a no-op, synchronize_rcu doesn't work but we don't
need it to.

Created using Cursor CLI.

Message-ID: <5557e026335d808acd7b890693ee1382e73dd33a.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
a2f964c45b tools/virtio: add struct cpumask to cpumask.h
Add struct cpumask stub used by virtio_config.h.

Created using Cursor CLI.

Message-ID: <eacf56399ba220513ebcd610f4a5115dc768db80.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
4e949e77fa tools/virtio: pass KCFLAGS to module build
Update the mod target to pass KCFLAGS with the in-tree vhost driver
include path. This way vhost_test can find vhost headers.

Created using Cursor CLI.

Message-ID: <5473e5a5dfd2fcd261a778f2017cac669c031f23.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
b6600eff05 tools/virtio: add ucopysize.h stub
Add ucopysize.h with stub implementations of check_object_size,
copy_overflow, and check_copy_size.

Created using Cursor CLI.

Message-ID: <5046df90002bb744609248404b81d33b559fe813.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
c53ad75c62 tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs
Add dev_WARN_ONCE and is_vmalloc_addr stubs needed by virtio_ring.c.
is_vmalloc_addr stub always returns false - that's fine since it's
merely a sanity check.

Created using Cursor CLI.

Message-ID: <749e7a03b7cd56baf50a27efc3b05e50cf8f36b6.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
03d768a38c tools/virtio: stub DMA mapping functions
Add dma_map_page_attrs and dma_unmap_page_attrs stubs.
Follow the same pattern as existing DMA mapping stubs.

Created using Cursor CLI.

Message-ID: <3512df1fe0e2129ea493434a21c940c50381cc93.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
42059e68ea tools/virtio: add struct module forward declaration
Declarate struct module in our linux/module.h stub.

Created using Cursor CLI.

Message-ID: <c01b8d24159664cc8c49354088efa342ae9e7321.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:56 -05:00
Michael S. Tsirkin
16fe720f1d tools/virtio: use kernel's virtio.h
Replace virtio stubs with an include of the kernel header.

Message-ID: <33daf1033fc447eb8e3e54d21013ccfd99550e37.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:55 -05:00
Michael S. Tsirkin
f059588c55 virtio: make it self-contained
virtio.h uses struct module, add a forward declaration to
make the header self-contained.

Message-ID: <9171b5cac60793eb59ab044c96ee038bf1363bee.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:55 -05:00
Michael S. Tsirkin
94fb5e796a tools/virtio: fix up compiler.h stub
Add #undef __user before and after including compiler_types.h to avoid
redefinition warnings when compiling with system headers that also
define __user. This allows tools/virtio to build without warnings.

Additionally, stub out __must_check

Created using Cursor CLI.

Message-ID: <56424ce95c72cb4957070a7cd3c3c40ad5addaee.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-12-24 08:02:55 -05:00
Linus Torvalds
9448598b22 Linux 6.19-rc2 v6.19-rc2 2025-12-21 15:52:04 -08:00