linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-29 02:19:54 -04:00

Author	SHA1	Message	Date
Jiri Pirko	de1177e560	virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() In order to prepare for conversion of virtio_find_vqs*() arguments, make virtio_find_vqs() to call virtio_find_vqs_ctx() instead of op directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-3-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-17 05:20:56 -04:00
Jiri Pirko	87bb477c39	caif_virtio: use virtio_find_single_vq() for single virtqueue finding Since caif uses only one queue, convert to virtio_find_single_vq() helper which is made for this purpose. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-2-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-17 05:20:56 -04:00
Dragos Tatulea	8e0751af1b	vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() VQ indices in the range [cur_num_qps, max_vqs) represent queues that have not yet been activated. .set_vq_ready should not activate these VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-24-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:51 -04:00
Dragos Tatulea	2638134f71	vdpa/mlx5: Don't reset VQs more than necessary The vdpa device can be reset many times in sequence without any significant state changes in between. Previously this was not a problem: VQs were torn down only on first reset. But after VQ pre-creation was introduced, each reset will delete and re-create the hardware VQs and their associated resources. To solve this problem, avoid resetting hardware VQs if the VQs are still in a blank state. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-23-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:50 -04:00
Dragos Tatulea	0fe963d6fc	vdpa/mlx5: Re-create HW VQs under certain conditions There are a few conditions under which the hardware VQs need a full teardown and setup: - VQ size changed to something else than default value. Hardware VQ size modification is not supported. - User turns off certain device features: mergeable buffers, checksum virtio 1.0 compliance. In these cases, the TIR and RQT need to be re-created. Add a needs_teardown configuration variable and set it when detecting the above scenarios. On next DRIVER_OK, the resources will be torn down first. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-22-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:50 -04:00
Dragos Tatulea	ffb1aae43e	vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Currently, hardware VQs are created right when the vdpa device gets into DRIVER_OK state. That is easier because most of the VQ state is known by then. This patch switches to creating all VQs and their associated resources at device creation time. The motivation is to reduce the vdpa device live migration downtime by moving the expensive operation of creating all the hardware VQs and their associated resources out of downtime on the destination VM. The VQs are now created in a blank state. The VQ configuration will happen later, on DRIVER_OK. Then the configuration will be applied when the VQs are moved to the Ready state. When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is needed: now that the VQ is already created a resume_vq() will be triggered too early when no mr has been configured yet. Skip calling resume_vq() in this case, let it be handled during DRIVER_OK. For virtio-vdpa, the device configuration is done earlier during .vdpa_dev_add() by vdpa_register_device(). Avoid calling setup_vq_resources() a second time in that case. On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ resource creation + resume time was ~370ms. Now it's down to 60 ms (only VQ config and resume). The measurements were done on a ConnectX6DX based vDPA device. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-21-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:49 -04:00
Dragos Tatulea	3b3adb3bbf	vdpa/mlx5: Use suspend/resume during VQP change Resume a VQ if it is already created when the number of VQ pairs increases. This is done in preparation for VQ pre-creation which is coming in a later patch. It is necessary because calling setup_vq() on an already created VQ will return early and will not enable the queue. For symmetry, suspend a VQ instead of tearing it down when the number of VQ pairs decreases. But only if the resume operation is supported. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-20-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:49 -04:00
Dragos Tatulea	ac85cd904d	vdpa/mlx5: Forward error in suspend/resume device Start using the suspend/resume_vq() error return codes previously added. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-19-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>	2024-07-09 08:42:49 -04:00
Dragos Tatulea	843250271b	vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq() There are a few more places modifying the VQ to Ready directly. Let's consolidate them into resume_vq(). The redundant warnings for resume_vq() errors can also be dropped. There is one special case that needs to be handled for virtio-vdpa: the initialized flag must be set to true earlier in setup_vq() so that resume_vq() doesn't return early. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-18-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:48 -04:00
Dragos Tatulea	b89bb349f2	vdpa/mlx5: Add error code for suspend/resume VQ Instead of blindly calling suspend/resume_vqs(), make then return error codes. To keep compatibility, keep suspending or resuming VQs on error and return the last error code. The assumption here is that the error code would be the same. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-17-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:48 -04:00
Dragos Tatulea	fc9af25d04	vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq() Until now resume_vq() was used only for the suspend/resume scenario. This change also allows calling resume_vq() to bring it from Init to Ready state (VQ initialization). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-16-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>	2024-07-09 08:42:48 -04:00
Dragos Tatulea	e60e9eeb36	vdpa/mlx5: Allow creation of blank VQs Based on the filled flag, create VQs that are filled or blank. Blank VQs will be filled in later through VQ modify. Downstream patches will make use of this to pre-create blank VQs at vdpa device creation. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-15-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>	2024-07-09 08:42:47 -04:00
Dragos Tatulea	ebebaf45e8	vdpa/mlx5: Set mkey modified flags on all VQs Otherwise, when virtqueues are moved from INIT to READY the latest mkey will not be set appropriately. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-14-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:47 -04:00
Dragos Tatulea	1e8dac7bb6	vdpa/mlx5: Start off rqt_size with max VQPs Currently rqt_size is initialized during device flag configuration. That's because it is the earliest moment when device knows if MQ (multi queue) is on or off. Shift this configuration earlier to device creation time. This implies that non-MQ devices will have a larger RQT size. But the configuration will still be correct. This is done in preparation for the pre-creation of hardware virtqueues at device add time. When that change will be added, RQT will be created at device creation time so it needs to be initialized to its max size. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-13-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:46 -04:00
Dragos Tatulea	ad9758fdaf	vdpa/mlx5: Set an initial size on the VQ The virtqueue size is a pre-requisite for setting up any virtqueue resources. For the upcoming optimization of creating virtqueues at device add, the virtqueue size has to be configured. The queue size check in setup_vq() will always be false. So remove it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-12-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:46 -04:00
Dragos Tatulea	cdc3c7eaae	vdpa/mlx5: Add support for modifying the VQ features field This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-11-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:46 -04:00
Dragos Tatulea	f70080c5bc	vdpa/mlx5: Add support for modifying the virtio_version VQ field This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-10-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:45 -04:00
Dragos Tatulea	4a19f2942a	vdpa/mlx5: Rename init_mvqs Function is used to set default values, so name it accordingly. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-9-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>	2024-07-09 08:42:45 -04:00
Dragos Tatulea	e5bcbd1de6	vdpa/mlx5: Clear and reinitialize software VQ data on reset The hardware VQ configuration is mirrored by data in struct mlx5_vdpa_virtqueue . Instead of clearing just a few fields at reset, fully clear the struct and initialize with the appropriate default values. As clear_vqs_ready() is used only during reset, get rid of it. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-8-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:45 -04:00
Dragos Tatulea	1835ed4a5d	vdpa/mlx5: Initialize and reset device with one queue pair The virtio spec says that a vdpa device should start off with one queue pair. The driver is already compliant. This patch moves the initialization to device add and reset times. This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-7-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>	2024-07-09 08:42:44 -04:00
Dragos Tatulea	a366465b48	vdpa/mlx5: Remove duplicate suspend code Use the dedicated suspend_vqs() function instead. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-6-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:44 -04:00
Dragos Tatulea	34bd86c720	vdpa/mlx5: Iterate over active VQs during suspend/resume No need to iterate over max number of VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-5-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:44 -04:00
Dragos Tatulea	ad80739262	vdpa/mlx5: Drop redundant check in teardown_virtqueues() The check is done inside teardown_vq(). Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-4-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:43 -04:00
Dragos Tatulea	4c90a60ac2	vdpa/mlx5: Drop redundant code Originally, the second loop initialized the CVQ. But (`acde392949` ("vdpa/mlx5: Use consistent RQT size") initialized all the queues in the first loop, so the second iteration in init_mvqs() is never called because the first one will iterate up to max_vqs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-3-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:43 -04:00
Dragos Tatulea	63f0cbad97	vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical ... by changing the setup_vq_resources() parameter type. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-2-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:43 -04:00
Dragos Tatulea	1f5d6476f1	vdpa/mlx5: Clarify meaning thorough function rename setup_driver()/teardown_driver() are a bit vague. These functions are used for virtqueue resources. Same for alloc_resources()/teardown_resources(): they represent fixed resources that are meant to exist during the device lifetime. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-1-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:42 -04:00
Peter-Jan Gootzen	106e4df120	virtio-fs: improved request latencies when Virtio queue is full Currently, when the Virtio queue is full, a work item is scheduled to execute in 1ms that retries adding the request to the queue. This is a large amount of time on the scale on which a virtio-fs device can operate. When using a DPU this is around 30-40us baseline without going to a remote server (4k, QD=1). This patch changes the retrying behavior to immediately filling the Virtio queue up again when a completion has been received. This reduces the 99.9th percentile latencies in our tests by 60x and slightly increases the overall throughput, when using a workload IO depth 2x the size of the Virtio queue and a DPU-powered virtio-fs device (NVIDIA BlueField DPU). Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Yoray Zack <yorayz@nvidia.com> Message-Id: <20240517190435.152096-3-pgootzen@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-07-09 08:42:42 -04:00
Peter-Jan Gootzen	2106e1f444	virtio-fs: let -ENOMEM bubble up or burst gently Currently, when the enqueueing of a request or forget operation fails with -ENOMEM, the enqueueing is retried after a timeout. This patch removes this behavior and treats -ENOMEM in these scenarios like any other error. By bubbling up the error to user space in the case of a request, and by dropping the operation in case of a forget. This behavior matches that of the FUSE layer above, and also simplifies the error handling. The latter will come in handy for upcoming patches that optimize the retrying of operations in case of -ENOSPC. Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Yoray Zack <yorayz@nvidia.com> Message-Id: <20240517190435.152096-2-pgootzen@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-07-09 08:42:41 -04:00
Jeff Johnson	e7909ad6cb	vDPA: add missing MODULE_DESCRIPTION() macros With ARCH=x86, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/ifcvf/ifcvf.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Message-Id: <20240611-md-drivers-vdpa-v1-1-efaf2de15152@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:41 -04:00
Jeff Johnson	ab0727f3dd	virtio: add missing MODULE_DESCRIPTION() macros With ARCH=sh, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio_ring.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Message-Id: <20240702-md-sh-drivers-virtio-v1-1-cf7325ab6ccc@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:41 -04:00
Jeff Johnson	e400ddf0fb	vringh: add MODULE_DESCRIPTION() Fix the allmodconfig 'make w=1' issue: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vhost/vringh.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Message-Id: <20240516-md-vringh-v1-1-31bf37779a5a@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2024-07-09 08:42:40 -04:00
Zhu Lingshan	9be237df09	MAINTAINERS: Change lingshan's email to kernel.org This commit changes lingshan's email from intel.com to kernel.org. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240514165125.74802-1-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:40 -04:00
Michael S. Tsirkin	7ad4723976	vhost: move smp_rmb() into vhost_get_avail_idx() All callers of vhost_get_avail_idx() use smp_rmb() to order the available ring entry read and avail_idx read. Make vhost_get_avail_idx() call smp_rmb() itself whenever the avail_idx is accessed. This way, the callers don't need to worry about the memory barrier. As a side benefit, we also validate the index on all paths now, which will hopefully help prevent/catch earlier future bugs. Note that current code is inconsistent in how the errors are handled. They are treated as an empty ring in some places, but as non-empty ring in other places. This patch doesn't attempt to change the existing behaviour. No functional change intended. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Will Deacon <will@kernel.org> Message-Id: <20240429232748.642356-1-gshan@redhat.com>	2024-07-09 08:42:40 -04:00
zhenwei pi	fdba68d2ad	virtio_balloon: separate vm events into a function All the VM events related statistics have dependence on 'CONFIG_VM_EVENT_COUNTERS', separate these events into a function to make code clean. Then we can remove 'CONFIG_VM_EVENT_COUNTERS' from 'update_balloon_stats'. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20240423034109.1552866-2-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2024-07-09 08:42:39 -04:00
Srujana Challa	8b6c724cda	virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices This commit introduces a new vDPA driver specifically designed for managing the virtio control plane over the vDPA bus for OCTEON DPU devices. The driver consists of two layers: 1. Octep HW Layer (Octeon Endpoint): Responsible for handling hardware operations and configurations related to the DPU device. 2. Octep Main Layer: Compliant with the vDPA bus framework, this layer implements device operations for the vDPA bus. It handles device probing, bus attachment, vring operations, and other relevant tasks. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Signed-off-by: Shijith Thotton <sthotton@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240614144659.1776067-1-schalla@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-09 08:42:39 -04:00
Denis Arefev	e269d79c7d	net: missing check virtio Two missing check in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again 1. After the skb_segment function the buffer may become non-linear (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere the __skb_linearize function will not be executed, then the buffer will remain non-linear. Then the condition (offset >= skb_headlen(skb)) becomes true, which causes WARN_ON_ONCE in skb_checksum_help. 2. The struct sk_buff and struct virtio_net_hdr members must be mathematically related. (gso_size) must be greater than (needed) otherwise WARN_ON_ONCE. (remainder) must be greater than (needed) otherwise WARN_ON_ONCE. (remainder) may be 0 if division is without remainder. offset+2 (4191) > skb_headlen() (1116) WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 Modules linked in: CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef RSP: 0018:ffffc90003a9f338 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888025125780 RCX: ffffffff814db209 RDX: ffff888015393b80 RSI: ffffffff814db216 RDI: 0000000000000001 RBP: ffff8880251257f4 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000045c R13: 000000000000105f R14: ffff8880251257f0 R15: 000000000000105d FS: 0000555555c24380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000f000 CR3: 0000000023151000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777 ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584 ip_finish_output_gso net/ipv4/ip_output.c:286 [inline] __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:451 [inline] ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline] sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3545 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346 dev_queue_xmit include/linux/netdevice.h:3134 [inline] packet_xmit+0x257/0x380 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xd5/0x180 net/socket.c:745 __sys_sendto+0x255/0x340 net/socket.c:2190 __do_sys_sendto net/socket.c:2202 [inline] __se_sys_sendto net/socket.c:2198 [inline] __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Found by Linux Verification Center (linuxtesting.org) with Syzkaller Fixes: `0f6925b3e8` ("virtio_net: Do not pull payload in skb->head") Signed-off-by: Denis Arefev <arefev@swemel.ru> Message-Id: <20240613095448.27118-1-arefev@swemel.ru> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-04 11:00:31 -04:00
Yunseong Kim	ede9c33ec5	tools/virtio: creating pipe assertion in vringh_test parallel_test() function in vringh_test needs to verify the creation of the guest/host pipe. Signed-off-by: Yunseong Kim <yskelg@gmail.com> Message-Id: <20240624174905.27980-2-yskelg@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-04 11:00:31 -04:00
Xuan Zhuo	840b2d39a2	virtio_ring: fix KMSAN error for premapped mode Add kmsan for virtqueue_dma_map_single_attrs to fix: BUG: KMSAN: uninit-value in receive_buf+0x45ca/0x6990 receive_buf+0x45ca/0x6990 virtnet_poll+0x17e0/0x3130 net_rx_action+0x832/0x26e0 handle_softirqs+0x330/0x10f0 [...] Uninit was created at: __alloc_pages_noprof+0x62a/0xe60 alloc_pages_noprof+0x392/0x830 skb_page_frag_refill+0x21a/0x5c0 virtnet_rq_alloc+0x50/0x1500 try_fill_recv+0x372/0x54c0 virtnet_open+0x210/0xbe0 __dev_open+0x56e/0x920 __dev_change_flags+0x39c/0x2000 dev_change_flags+0xaa/0x200 do_setlink+0x197a/0x7420 rtnl_setlink+0x77c/0x860 [...] Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Alexander Potapenko <glider@google.com> Message-Id: <20240606111345.93600-1-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x Acked-by: Jason Wang <jasowang@redhat.com>	2024-07-04 11:00:31 -04:00
Michael S. Tsirkin	1e1fdcbdde	vhost/vsock: always initialize seqpacket_allow There are two issues around seqpacket_allow: 1. seqpacket_allow is not initialized when socket is created. Thus if features are never set, it will be read uninitialized. 2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared, then seqpacket_allow will not be cleared appropriately (existing apps I know about don't usually do this but it's legal and there's no way to be sure no one relies on this). To fix: - initialize seqpacket_allow after allocation - set it unconditionally in set_features Reported-by: syzbot+6c21aeb59d0e82eb2782@syzkaller.appspotmail.com Reported-by: Jeongjun Park <aha310510@gmail.com> Fixes: `ced7b71371` ("vhost/vsock: support SEQPACKET for transport"). Tested-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Cc: David S. Miller <davem@davemloft.net> Cc: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240422100010-mutt-send-email-mst@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jakub Kicinski <kuba@kernel.org>	2024-07-04 11:00:31 -04:00
Linus Torvalds	e9d22f7a66	Merge tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "One single patch to fix the non-contiguous CBM resctrl: - AMD supports non-contiguous CBM but does not report it via CPUID. This test should not use CPUID on AMD to detect non-contiguous CBM support. Fix the problem so the test uses CPUID to discover non-contiguous CBM support only on Intel" * tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/resctrl: Fix non-contiguous CBM for AMD	2024-07-02 13:53:24 -07:00
Linus Torvalds	dbd8132ace	Merge tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - Improve handling of deep ancestor chains in is_subdir() - Release locks cleanly when fctnl_setlk() races with close(). When setting a file lock fails the VFS tries to cleanup the already created lock. The helper used for this calls back into the LSM layer which may cause it to fail, leaving the stale lock accessible via /proc/locks. AFS: - Fix a comma/semicolon typo" * tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Convert comma to semicolon fs: better handle deep ancestor chains in is_subdir() filelock: Remove locks reliably when fcntl/close race is detected	2024-07-02 13:43:02 -07:00
Chen Ni	655593a40e	afs: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn/ Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-02 21:23:00 +02:00
Christian Brauner	391b59b045	fs: better handle deep ancestor chains in is_subdir() Jan reported that 'cd ..' may take a long time in deep directory hierarchies under a bind-mount. If concurrent renames happen it is possible to livelock in is_subdir() because it will keep retrying. Change is_subdir() from simply retrying over and over to retry once and then acquire the rename lock to handle deep ancestor chains better. The list of alternatives to this approach were less then pleasant. Change the scope of rcu lock to cover the whole walk while at it. A big thanks to Jan and Linus. Both Jan and Linus had proposed effectively the same thing just that one version ended up being slightly more elegant. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-02 21:18:32 +02:00
Linus Torvalds	734610514c	Merge tag 'erofs-for-6.10-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "The most important one fixes possible infinite loops reported by a smartphone vendor OPPO recently due to some unexpected zero-sized compressed pcluster out of interrupted I/Os, storage failures, etc. Another patch fixes global buffer memory leak on unloading, and the remaining one switches to use super_set_uuid() to keep with the other filesystems. Summary: - Fix possible global buffer memory leak when unloading EROFS module - Fix FS_IOC_GETFSUUID ioctl by using super_set_uuid() - Reset m_llen to 0 so then it can retry if metadata is invalid" * tag 'erofs-for-6.10-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: ensure m_llen is reset to 0 if metadata is invalid erofs: convert to use super_set_uuid to support for FS_IOC_GETFSUUID erofs: fix possible memory leak in z_erofs_gbuf_exit()	2024-07-02 11:59:34 -07:00
Jann Horn	3cad1bc010	filelock: Remove locks reliably when fcntl/close race is detected When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: `c293621bbf` ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-02 20:48:14 +02:00
Linus Torvalds	1dfe225e9a	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "A couple of error leg problems, one affecting scsi_debug and the other affecting pure SAS (i.e. not SATA) SCSI expanders" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed scsi: scsi_debug: Fix create target debugfs failure	2024-07-01 22:57:03 -07:00
Linus Torvalds	73e931504f	Merge tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Fix no cxl_nvd during pmem region auto-assemble - Avoid NULLL pointer dereference in region lookup - Add missing checks to interleave capability - Add cxl kdoc fix to address document compilation error * tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl: documentation: add missing files to cxl driver-api cxl/region: check interleave capability cxl/region: Avoid null pointer dereference in region lookup cxl/mem: Fix no cxl_nvd during pmem region auto-assembling	2024-07-01 13:03:30 -07:00
Linus Torvalds	cfbc0ffea8	Merge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fixup for a recent fix that prevents an infinite loop during block group reclaim. Unfortunately it introduced an unsafe way of updating block group list and could race with relocation. This could be hit on fast devices when relocation/balance does not have enough space" * tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix adding block group to a reclaim list and the unused list during reclaim	2024-07-01 12:48:28 -07:00
Linus Torvalds	9903efbddb	Merge tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "This fixes up a last minute build regression from the previous set of bug fixes" * tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: syscalls: fix sys_fanotify_mark prototype	2024-07-01 09:41:58 -07:00
Linus Torvalds	651ab78190	Merge tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "A number of devicetree fixes came in for the rockchip platforms, correcting some of the address information, and reverting a change to the MMC controller configuration that caused regressions. Four drivers have one code change each, addressing minor build issues for the optee firmware driver, the litex SoC platform driver and two reset drivers. The riscv fixes as also simple, mainly turning off device nodes in the canaan dts files unless they are actually usable on a particular board. Finally, Drew takes over maintaining the THEAD RISC-V SoC platform" * tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: drivers/soc/litex: drop obsolete dependency on COMPILE_TEST tee: optee: ffa: Fix missing-field-initializers warning arm64: dts: rockchip: Add sound-dai-cells for RK3368 arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi 4B reset: hisilicon: hi6220: add missing MODULE_DESCRIPTION() macro reset: gpio: Fix missing gpiolib dependency for GPIO reset controller MAINTAINERS: thead: update Maintainer arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E riscv: dts: starfive: Set EMMC vqmmc maximum voltage to 3.3V on JH7110 boards arm64: dts: rockchip: make poweroff(8) work on Radxa ROCK 5A Revert "arm64: dts: rockchip: remove redundant cd-gpios from rk3588 sdmmc nodes" ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node arm64: dts: rockchip: Fix the value of `dlg,jack-det-rate` mismatch on rk3399-gru arm64: dts: rockchip: set correct pwm0 pinctrl on rk3588-tiger riscv: dts: canaan: Disable I/O devices unless used riscv: dts: canaan: Clean up serial aliases arm64: dts: rockchip: Rename LED related pinctrl nodes on rk3308-rock-pi-s arm64: dts: rockchip: Fix SD NAND and eMMC init on rk3308-rock-pi-s arm64: dts: rockchip: Fix rk3308 codec@ff560000 reset-names arm64: dts: rockchip: Fix the DCDC_REG2 minimum voltage on Quartz64 Model B	2024-07-01 09:36:20 -07:00

1 2 3 4 5 ...

1281117 Commits