Commit Graph

1155532 Commits

Author SHA1 Message Date
Kangjie Xu
313389be06 vhost-net: support VIRTIO_F_RING_RESET
Add VIRTIO_F_RING_RESET, which indicates that the driver can reset a
queue individually.

VIRTIO_F_RING_RESET feature is added to virtio-spec 1.2. The relevant
information is in
    oasis-tcs/virtio-spec#124
    oasis-tcs/virtio-spec#139

The implementation only adds the feature bit in supported features. It
does not require any other changes because we reuse the existing vhost
protocol.

The virtqueue reset process can be summarized in two parts:
1. The driver can reset a virtqueue. When it is triggered, we use the
set_backend to disable the virtqueue.
2. After the virtqueue is disabled, the driver may optionally re-enable
it. The process is basically similar to when the device is started,
except that the restart process does not need to set features and set
mem table since they do not change. QEMU will send messages containing
size, base, addr, kickfd and callfd of the virtqueue in order.
Specifically, the host kernel will receive these messages in order:
    a. VHOST_SET_VRING_NUM
    b. VHOST_SET_VRING_BASE
    c. VHOST_SET_VRING_ADDR
    d. VHOST_SET_VRING_KICK
    e. VHOST_SET_VRING_CALL
    f. VHOST_NET_SET_BACKEND
Finally, after we use set_backend to attach the virtqueue, the virtqueue
will be enabled and start to work.
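
The two-part flow above can be sketched as a small model. This is only an illustration: the message names are taken from the list above, while ModelVirtqueue and its methods are hypothetical and do not issue real vhost ioctls.

```python
# Illustrative model only: message names come from the commit message;
# ModelVirtqueue is hypothetical and does not call the real vhost ioctls.
from dataclasses import dataclass, field

REENABLE_ORDER = [
    "VHOST_SET_VRING_NUM",
    "VHOST_SET_VRING_BASE",
    "VHOST_SET_VRING_ADDR",
    "VHOST_SET_VRING_KICK",
    "VHOST_SET_VRING_CALL",
    "VHOST_NET_SET_BACKEND",
]


@dataclass
class ModelVirtqueue:
    enabled: bool = True
    log: list = field(default_factory=list)

    def reset(self):
        # Part 1: set_backend is used to disable the virtqueue.
        self.log.append("VHOST_NET_SET_BACKEND(disable)")
        self.enabled = False

    def reenable(self):
        # Part 2: replay the per-queue setup in order. Features and the
        # mem table are not sent again because they did not change.
        for msg in REENABLE_ORDER:
            self.log.append(msg)
        # The final set_backend attaches the queue, which enables it.
        self.enabled = True


vq = ModelVirtqueue()
vq.reset()
vq.reenable()
print(vq.log)
```

The point of the model is that re-enabling reuses the existing per-queue messages, which is why the patch only needs to advertise the feature bit.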

Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220825085610.80315-1-kangjie.xu@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:59 -05:00
Bo Liu
b3d4f02ee7 vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit
Follow the advice of Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at()
when formatting the value to be returned to user space.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Message-Id: <20230129091145.2837-1-liubo03@inspur.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
36871fb92b vdpa: mlx5: support per virtqueue dma device
This patch implements per virtqueue dma device for mlx5_vdpa. This is
needed for virtio_vdpa to work for CVQ which is backed by vringh but
not DMA. We simply advertise the vDPA device itself as the DMA device
for CVQ then DMA API can simply use PA so the identical mapping for
CVQ can still be used. Otherwise the identical (1:1) mapping won't
work when platform IOMMU is enabled since the IOVA is allocated on
demand which is not necessarily the PA.

This fixes the following crash when mlx5 vDPA device is bound to
virtio-vdpa with platform IOMMU enabled but not in passthrough mode:

BUG: unable to handle page fault for address: ff2fb3063deb1002
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1393001067 P4D 1393002067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7
Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021
Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa]
RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh]
Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00
RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282
RAX: ff2fb3063deb1002 RBX: 0000000000000a20 RCX: 0000000000000002
RDX: ff2fb318d2f94380 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000000d R12: ff2fb3065e832488
R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8
FS:  0000000000000000(0000) GS:ff2fb3257fac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff2fb3063deb1002 CR3: 0000001392010006 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
  mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa]
  process_one_work+0x1e2/0x3b0
  ? rescuer_thread+0x390/0x390
  worker_thread+0x50/0x3a0
  ? rescuer_thread+0x390/0x390
  kthread+0xd6/0x100
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-6-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
99fb2b838f vdpa: set dma mask for vDPA device
Set the DMA mask for the vDPA device in case there are virtqueues that
are not backed by DMA, so that the vDPA device can be advertised as the
DMA device used by the DMA API for software-emulated virtqueues.

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-5-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
a1baedb11e virtio-vdpa: support per vq dma device
This patch adds support for a per-vq DMA device to virtio-vDPA. vDPA
parents then are allowed to use different DMA devices. This is useful
for the parents that have software or emulated virtqueues.

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-4-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
25da258fa6 vdpa: introduce get_vq_dma_device()
This patch introduces a new method to query the dma device that is used
for a specific virtqueue.

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
2713ea3c7d virtio_ring: per virtqueue dma device
This patch introduces a per virtqueue dma device. This will be used
for virtio devices whose virtqueues are backed by different underlying
devices.

One example is vDPA, where the control virtqueue could be
implemented through software mediation.

Some of the work was already done earlier, with helpers such as
vring_dma_device(). The remaining work is:

- Let vring_dma_device() return the per virtqueue dma device instead
  of the vdev's parent.
- Allow passing a dma_device when creating the virtqueue through a new
  helper, old vring creation helper will keep using vdev's parent.
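
The two remaining items can be modeled roughly as follows. This is a sketch under stated assumptions: vring_dma_device() is the helper named above, but Device, Virtqueue and the two create_* functions are hypothetical stand-ins, not the kernel's virtio_ring API.

```python
# Illustrative model only; Device, Virtqueue and the create_* helpers are
# hypothetical, not the kernel's virtio_ring API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str


@dataclass
class Virtqueue:
    index: int
    dma_dev: Device  # per-virtqueue DMA device


def create_virtqueue_dma(index, dma_dev):
    """New-style helper: the caller passes the DMA device explicitly."""
    return Virtqueue(index=index, dma_dev=dma_dev)


def create_virtqueue(index, vdev_parent):
    """Old-style helper: keeps using the vdev's parent as the DMA device."""
    return create_virtqueue_dma(index, vdev_parent)


def vring_dma_device(vq):
    """Return the per-virtqueue DMA device instead of the vdev's parent."""
    return vq.dma_dev


parent = Device("vdev-parent")
software_cvq_dev = Device("vdpa-device")
data_vq = create_virtqueue(0, parent)
ctrl_vq = create_virtqueue_dma(2, software_cvq_dev)
print(vring_dma_device(data_vq).name, vring_dma_device(ctrl_vq).name)
```

Existing callers of the old helper see no change, while a software-mediated control virtqueue can be given a different DMA device.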

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Liming Wu
759aba1e6e vhost: remove unused parameter
"enabled" is defined in vhost_init_device_iotlb,
but it is never used. Let's remove it.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Message-Id: <20230110024445.303-1-liming.wu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2023-02-20 19:26:58 -05:00
Liming Wu
62b763ad76 vhost-test: remove meaningless debug info
remove printk as it is meaningless.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Message-Id: <20230105070357.274-1-liming.wu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
6c3d329e64 vdpa_sim: get rid of DMA ops
We used to (ab)use the DMA ops for setting up identical mappings in
the IOTLB. This patch tries to get rid of those unnecessary DMA ops
by maintaining a simple identical/passthrough mapping by default.
When bound to the virtio_vdpa driver, the DMA API will simply use PA
as the IOVA and we will be all fine. When the vDPA bus tries to set up
a customized mapping (e.g. when bound to vhost-vDPA), the
identical/passthrough mapping will be removed.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223060021.28011-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2023-02-20 19:26:58 -05:00
Jason Wang
0899774cb3 vdpa_sim_net: vendor statistics
This patch adds support for basic vendor stats that include counters
for tx, rx and cvq.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-5-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
5dbb063a3e vdpa_sim: support vendor statistics
This patch adds a new config ops callback to allow individual
simulator to implement the vendor stats callback.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-4-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
bb105d514a vdpasim: customize allocation size
Allow individual simulator to customize the allocation size.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
0497f23e73 vdpa_sim: switch to use __vdpa_alloc_device()
This allows us to control the allocation size of the structure.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
2f8200efe7 vdpa_sim: use weak barriers
vDPA simulators are software-emulated devices, so let's switch to use
weak barriers to avoid extra overhead in the driver.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221221062146.15356-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
2023-02-20 19:26:57 -05:00
Suwan Kim
07b679f70d virtio-blk: support completion batching for the IRQ path
This patch adds completion batching to the IRQ path. It reuses the batch
completion code of virtblk_poll(). It collects requests into io_comp_batch
and processes them all at once. It can improve performance by about 2%.

To validate the performance improvement and stability, I ran fio tests on
a 4 vCPU VM and a 12 vCPU VM. Both VMs have 8GB of RAM and the same
number of HW queues as vCPUs.
The fio command is as follows; I ran fio 5 times and took the average IOPS.
(io_uring, randread, direct=1, bs=512, iodepth=64 numjobs=2,4)

Test result shows about 2% improvement.

           4 vcpu VM       |   numjobs=2   |   numjobs=4
      -----------------------------------------------------------
        fio without patch  |  367.2K IOPS  |   397.6K IOPS
      -----------------------------------------------------------
        fio with patch     |  372.8K IOPS  |   407.7K IOPS

           12 vcpu VM      |   numjobs=2   |   numjobs=4
      -----------------------------------------------------------
        fio without patch  |  363.6K IOPS  |   374.8K IOPS
      -----------------------------------------------------------
        fio with patch     |  373.8K IOPS  |   385.3K IOPS
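
As a worked check of the figures in the two tables, the relative gain per configuration can be computed directly. The numbers are copied from the tables above; the helper name is only illustrative.

```python
# IOPS in thousands, copied from the tables above: (without patch, with patch).
results = {
    ("4 vCPU", 2): (367.2, 372.8),
    ("4 vCPU", 4): (397.6, 407.7),
    ("12 vCPU", 2): (363.6, 373.8),
    ("12 vCPU", 4): (374.8, 385.3),
}


def improvement_pct(before, after):
    """Relative IOPS gain in percent."""
    return (after - before) / before * 100.0


for (vm, numjobs), (before, after) in results.items():
    print(f"{vm} VM, numjobs={numjobs}: {improvement_pct(before, after):.1f}%")
```

The gains come out between roughly 1.5% and 2.8%, consistent with the "about 2%" summary.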

Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-3-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Suwan Kim
489e18f3d7 virtio-blk: set req->state to MQ_RQ_COMPLETE after polling I/O is finished
The driver should set req->state to MQ_RQ_COMPLETE after it finishes
processing req. But virtio-blk doesn't set MQ_RQ_COMPLETE after
virtblk_poll() handles req, and req->state still remains MQ_RQ_IN_FLIGHT.
Fortunately, this has caused no issue so far because
blk_mq_end_request_batch() sets req->state to MQ_RQ_IDLE.

In this patch, virtblk_poll() calls blk_mq_complete_request_remote() to set
req->state to MQ_RQ_COMPLETE before it adds req to a batch completion list.
So it properly sets req->state after polling I/O is finished.

Fixes: 4e04005256 ("virtio-blk: support polling I/O")
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-2-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Bagas Sanjaya
2b034e82ff docs: driver-api: virtio: commentize spec version checking
A sentence that checks for a later spec version is meant for developers
hacking on the documentation source. Make it a comment block (hidden from
the actual output).

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-4-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Bagas Sanjaya
ae8d2247af docs: driver-api: virtio: slightly reword virtqueues allocation paragraph
"It's at this stage that" means "At this point", so use the latter as it
is more effective.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-3-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Bagas Sanjaya
fb25d45694 docs: driver-api: virtio: parenthesize external reference targets
Parenthesize targets to links in "References" section to distinguish
them from remaining texts.

While at it, describe the second target.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-2-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
f9d9f57ef0 vdpa_sim: Implement resume vdpa op
Implement resume operation for vdpa_sim devices, so vhost-vdpa will
offer that backend feature and userspace can effectively resume the
device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <15a4566826033c5dd9a2167e5cfb0ef4d90cea49.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
3b688d7a08 vhost-vdpa: uAPI to resume the device
This new ioctl adds support for resuming the device from userspace.

This is required when trying to restore the device in a functioning
state after it's been suspended. It is already possible to reset a
suspended device, but that means the device must be reconfigured and
all the IOMMU/IOTLB mappings must be recreated. This new operation
allows the device to be resumed without going through a full reset.

This is particularly useful when trying to perform offline migration of
a virtual machine (also known as snapshot/restore) as it allows the VMM
to resume the virtual machine back to a running state after the snapshot
is performed.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <73b75fb87d25cff59768b4955a81fe7ffe5b4770.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
69106b6fb3 vhost-vdpa: Introduce RESUME backend feature bit
Userspace knows if the device can be resumed or not by checking this
feature bit.

It's only exposed if the vdpa driver backend implements the resume()
operation callback. Userspace trying to negotiate this feature when it
hasn't been exposed will result in an error.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <b18db236ba3d990cdb41278eb4703be9201d9514.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
1538a8a49e vdpa: Add resume operation
Add a new operation to allow a vDPA device to be resumed after it has
been suspended. Trying to resume a device that wasn't suspended will
result in a no-op.

This operation is optional. If it's not implemented, the associated
backend feature bit will not be exposed. And if the feature bit is not
exposed, invoking this operation will return an error.
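
The rules stated above (optional operation, feature bit tied to its presence, no-op when not suspended) can be sketched as a small model. ModelVdpaDevice is hypothetical, and the specific error value is an assumption chosen for illustration; the commit message only says that an error is returned.

```python
import errno


class ModelVdpaDevice:
    """Hypothetical stand-in for a vDPA device; not the kernel's struct."""

    def __init__(self, implements_resume):
        self.implements_resume = implements_resume
        self.suspended = False

    def backend_features(self):
        # The RESUME bit is exposed only if the operation is implemented.
        return {"RESUME"} if self.implements_resume else set()

    def suspend(self):
        self.suspended = True

    def resume(self):
        if "RESUME" not in self.backend_features():
            # Assumption: the exact error code is illustrative only.
            return -errno.EOPNOTSUPP
        # Resuming a device that is not suspended is a no-op.
        self.suspended = False
        return 0


dev = ModelVdpaDevice(implements_resume=True)
dev.suspend()
print(dev.resume(), dev.suspended)
```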

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <6e05c4b31b47f3e29cb2bd7ebd56c81f84b8f48a.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Alvaro Karsz
51a8f9d7f5 virtio: vdpa: new SolidNET DPU driver.
This commit includes:
 1) The driver to manage the controlplane over vDPA bus.
 2) A HW monitor device to read health values from the DPU.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230110165638.123745-4-alvaro.karsz@solid-run.com>
Message-Id: <20230209075128.78915-1-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Alvaro Karsz
d089d69cc1 PCI: Avoid FLR for SolidRun SNET DPU rev 1
This patch fixes a FLR bug on the SNET DPU rev 1 by setting the
PCI_DEV_FLAGS_NO_FLR_RESET flag.

As there is a quirk to avoid FLR (quirk_no_flr), I added a new quirk
to check the rev ID before calling to quirk_no_flr.

Without this patch, a SNET DPU rev 1 may hang when FLR is applied.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-3-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Alvaro Karsz
db6c4dee4c PCI: Add SolidRun vendor ID
Add SolidRun vendor ID to pci_ids.h

The vendor ID is used in 2 different source files, the SNET vDPA driver
and PCI quirks.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Eugenio Pérez
d8b3832a78 vdpa_sim_net: Offer VIRTIO_NET_F_STATUS
VIRTIO_NET_S_LINK_UP is already returned in config reads since vdpasim
creation, but the feature bit was not offered to the driver.

Tested modifying VIRTIO_NET_S_LINK_UP and different values of "status"
in qemu virtio-net options, using vhost_vdpa.

Not considered a fix, because no driver should be trusting this config
read before the feature flag is offered.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20221117155502.1394700-1-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
46fc0917bb vDPA/ifcvf: implement features provisioning
This commit implements features provisioning for ifcvf, which means:
1)check whether the provisioned features are supported by
the management device
2)the vDPA device only presents the selected feature bits
Examples:
a)The management device supported features:
$ vdpa mgmtdev show pci/0000:01:00.5
pci/0000:01:00.5:
  supported_classes net
  max_supported_vqs 9
  dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM

b)Provision a vDPA device with all supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5
$ vdpa/vdpa dev config show vdpa0
vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500
  negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM

c)Provision a vDPA device with a subset of the supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5 device_features 0x300020020
$ vdpa dev config show vdpa0
mac 00:e8:ca:11:be:05 link up link_announce false
  negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
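
To read the device_features mask used in example c), the bits can be decoded as below. This is a sketch: the bit positions follow the virtio feature-bit numbering, only the bits relevant to these examples are listed, and the function names are hypothetical.

```python
# Feature-bit positions per the virtio specification; only the bits that
# appear in the examples above are listed.
FEATURE_BITS = {
    3: "MTU",
    5: "MAC",
    15: "MRG_RXBUF",
    17: "CTRL_VQ",
    22: "MQ",
    27: "ANY_LAYOUT",
    32: "VERSION_1",
    33: "ACCESS_PLATFORM",
}


def decode_features(mask):
    """Return the names of all bits set in mask, lowest bit first."""
    return [
        FEATURE_BITS.get(bit, f"bit{bit}")
        for bit in range(mask.bit_length())
        if (mask >> bit) & 1
    ]


def is_provisionable(requested, supported):
    """Check 1) above: every requested bit must be supported by the mgmtdev."""
    return requested & ~supported == 0


# dev_features from example a), rebuilt from the names shown there.
supported = sum(1 << bit for bit in FEATURE_BITS)
requested = 0x300020020

print(decode_features(requested))
print(is_provisionable(requested, supported))
```

The mask decodes to MAC, CTRL_VQ, VERSION_1 and ACCESS_PLATFORM. The negotiated_features line can list fewer bits than were provisioned, since negotiation also depends on what the driver accepts.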

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-13-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
267000e980 vDPA/ifcvf: retire ifcvf_private_to_vf
This commit retires ifcvf_private_to_vf, because
the vf is already a member of the adapter,
so it could be easily addressed by adapter->vf.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-12-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
93139037b5 vDPA/ifcvf: allocate the adapter in dev_add()
The adapter is the container of the vdpa_device.
This commit allocates the adapter in dev_add()
rather than in probe(), so that the vdpa_device
can be re-created when userspace creates
the vdpa device, and freed in dev_del().

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
6a3b2f179b vDPA/ifcvf: manage ifcvf_hw in the mgmt_dev
This commit allocates the hw structure in the
management device structure. So the hardware
can be initialized once the management device
is allocated in probe.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
7cfd36b7e8 vDPA/ifcvf: ifcvf_request_irq works on ifcvf_hw
All ifcvf_request_irq's callees are refactored
to work on ifcvf_hw, so it should be decoupled
from the adapter as well

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
a70d833e69 vDPA/ifcvf: decouple config/dev IRQ requester and vectors allocator from the adapter
This commit decouples the config irq requester, the device
shared irq requester and the MSI vectors allocator from
the adapter, so they can be safely invoked once the device is probed,
before the adapter is allocated.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
f9a9ffb2e4 vDPA/ifcvf: decouple vq irq requester from the adapter
This commit decouples the vq irq requester from the adapter,
so that these functions can be invoked once the device is probed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
23dac55cec vDPA/ifcvf: decouple config IRQ releaser from the adapter
This commit decouples config IRQ releaser from the adapter,
so that it can be invoked once the device is probed or in error handlers.
ifcvf_free_irq() works on ifcvf_hw in this commit

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
004cbcabab vDPA/ifcvf: decouple vq IRQ releasers from the adapter
This commit decouples the IRQ releasers from the
adapter, so that these functions can be
safely invoked once the device is probed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
66e3970b16 vDPA/ifcvf: alloc the mgmt_dev before the adapter
This commit reverses the order of allocating the
management device and the adapter. So that it would
be possible to move the allocation of the adapter
to dev_add().

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
af8eb69a62 vDPA/ifcvf: decouple config space ops from the adapter
This commit decouples the config space ops from the
adapter layer, so these functions can be invoked
once the device is probed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
d59f633dd0 vDPA/ifcvf: decouple hw features manipulators from the adapter
This commit gets rid of ifcvf_adapter in hw features related
functions in ifcvf_base. These functions are then more robust
and decoupled from the ifcvf_adapter layer, so they
can be invoked once the device is probed, even
before the adapter is allocated.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
0a59975088 vdpa/mlx5: Add RX counters to debugfs
For each interface, either VLAN tagged or untagged, add two hardware
counters: one for unicast and another for multicast. The counters count
RX packets and bytes and can be read through debugfs:

$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes

This feature is controlled via the config option
MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some
impact on performance.

includes a fixup by Yang Yingliang <yangyingliang@huawei.com>:

vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules()

The local variable 'rule' is not used anymore, fix return value
check after calling mlx5_add_flow_rules().

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-9-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
2942210043 vdpa/mlx5: Add debugfs subtree
Add debugfs subtree and expose flow table ID and TIR number. This
information can be used by external tools to do extended
troubleshooting.

The information can be retrieved like so:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-8-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
72c67e9b90 vdpa/mlx5: Move some definitions to a new header file
Move some definitions from mlx5_vnet.c to newly added header file
mlx5_vnet.h. We need these definitions for the following patches that
add debugfs tree to expose information vital for debug.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Michael S. Tsirkin
b16a1756c7 virtio_blk: mark all zone fields LE
zone is a virtio 1.x feature so all fields are LE,
they are handled as such, but have mistakenly been labeled
__virtioXX in the header.  This results in a bunch of sparse warnings.

Use the __leXX tags to make sparse happy.
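
As a small illustration of why the tag matters: an __leXX field is always little-endian, whereas a __virtioXX field may be read in guest-native order for legacy devices. The byte string below is made up for the example.

```python
import struct

# Eight bytes as a device might return them: 0x1000 encoded little-endian.
# The value is made up for illustration.
raw = bytes.fromhex("0010000000000000")

# Always little-endian, which is what an __le64 tag promises.
le_value = struct.unpack("<Q", raw)[0]

# What a big-endian reading of the same bytes would produce.
be_value = struct.unpack(">Q", raw)[0]

print(hex(le_value))
print(hex(be_value))
```

On a little-endian host both readings of a native field agree with the LE one, which is why the generated code was already fine and only sparse complained.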

Message-Id: <20221222193214.55146-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael S. Tsirkin
2a9c844e89 virtio_blk: zone append in header type tweak
virtio blk returns a 64 bit append_sector in an input buffer,
in LE format. This field is not tagged as LE correctly, so
even though the generated code is ok, we get warnings from sparse:

drivers/block/virtio_blk.c:332:33: sparse: sparse: cast to restricted __le64

Make sparse happy by using the correct type.

Message-Id: <20221220125154.564265-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael S. Tsirkin
04e5421e6f virtio_blk: temporary variable type tweak
virtblk_result returns blk_status_t which is a bitwise restricted type,
so we are not supposed to stuff it in a plain int temporary variable.
All we do with it is pass it on to a function expecting blk_status_t so
the generated code is ok, but we get warnings from sparse:

drivers/block/virtio_blk.c:326:36: sparse: sparse: incorrect type in initializer (different base types) @@     expected int status @@
+got restricted blk_status_t @@
drivers/block/virtio_blk.c:334:33: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected restricted
+blk_status_t [usertype] error @@     got int status @@

Make sparse happy by using the correct type.

Message-Id: <20221220124152.523531-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-02-15 06:46:22 -05:00
Dmitry Fomichev
95bfec41bd virtio-blk: add support for zoned block devices
This patch adds support for Zoned Block Devices (ZBDs) to the kernel
virtio-blk driver.

The patch accompanies the virtio-blk ZBD support draft that is now
being proposed for standardization. The latest version of the draft is
linked at

https://github.com/oasis-tcs/virtio-spec/issues/143 .

The QEMU zoned device code that implements these protocol extensions
has been developed by Sam Li and it is currently in review at the QEMU
mailing list.

A number of virtblk request structure changes have been introduced to
accommodate the functionality that is specific to zoned block devices
and, most importantly, make room for carrying the Zoned Append sector
value from the device back to the driver along with the request status.

The zone-specific code in the patch is heavily influenced by NVMe ZNS
code in drivers/nvme/host/zns.c, but it is simpler because the proposed
virtio ZBD draft only covers the zoned device features that are
relevant to the zoned functionality provided by Linux block layer.

includes the following fixup:

virtio-blk: fix probe without CONFIG_BLK_DEV_ZONED

When building without CONFIG_BLK_DEV_ZONED, VIRTIO_BLK_F_ZONED
is excluded from array of driver features.
As a result virtio_has_feature panics in virtio_check_driver_offered_feature
since that by design verifies that a feature we are checking for
is listed in the feature array.

To fix, replace the call to virtio_has_feature with a stub.

Message-Id: <20221016034127.330942-3-dmitry.fomichev@wdc.com>
Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20221220112340.518841-1-mst@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Debugged-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-15 06:46:22 -05:00
Ricardo Cañuelo
d16c0cd273 docs: driver-api: virtio: virtio on Linux
Basic doc about Virtio on Linux and a short tutorial on Virtio drivers.

includes the following fixup:

virtio: fix virtio_config_ops kerneldocs

Fixes two warning messages when building htmldocs:

    warning: duplicate section name 'Note'
    warning: expecting prototype for virtio_config_ops().
             Prototype was for vq_callback_t() instead

Message-Id: <20221010064359.1324353-2-ricardo.canuelo@collabora.com>
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221220100035.2712449-1-ricardo.canuelo@collabora.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael Sammler
d5ff73bbb0 virtio_pmem: populate numa information
Compute the numa information for a virtio_pmem device from the memory
range of the device. Previously, the target_node was always 0 since
the ndr_desc.target_node field was never explicitly set. The code for
computing the numa node is taken from cxl_pmem_region_probe in
drivers/cxl/pmem.c.

Signed-off-by: Michael Sammler <sammler@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Message-Id: <20221115214036.1571015-1-sammler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Eugenio Pérez
0e84f918fa vdpa_sim: not reset state in vdpasim_queue_ready
vdpasim_queue_ready calls vringh_init_iotlb, which resets split indexes.
But it can be called after setting a ring base with
vdpasim_set_vq_state.

Fix it by stashing them. They're still reset in vdpasim_vq_reset.

This was discovered and tested live migrating the vdpa_sim_net device.
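
The stash-and-restore idea can be sketched with a toy model. ModelVring and both queue_ready functions are hypothetical; the choice of last_avail_idx and last_used_idx as the stashed fields is an assumption about which split indexes are meant.

```python
class ModelVring:
    """Hypothetical stand-in for the vring state kept by the simulator."""

    def __init__(self):
        self.last_avail_idx = 0
        self.last_used_idx = 0

    def init(self):
        # Models the side effect described above: init resets the indexes.
        self.last_avail_idx = 0
        self.last_used_idx = 0


def queue_ready_before_fix(vring):
    vring.init()  # any base set earlier is lost here


def queue_ready_after_fix(vring):
    # Stash the indexes, re-initialize, then restore them.
    stashed = (vring.last_avail_idx, vring.last_used_idx)
    vring.init()
    vring.last_avail_idx, vring.last_used_idx = stashed


vring = ModelVring()
vring.last_avail_idx = vring.last_used_idx = 42  # e.g. restored after migration
queue_ready_after_fix(vring)
print(vring.last_avail_idx, vring.last_used_idx)
```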

Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230118164359.1523760-2-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-13 07:25:00 -05:00