Commit Graph

1262975 Commits

Author SHA1 Message Date
Linus Torvalds
f9c035492f Merge tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:

 - Various virtual vs physical address usage fixes

 - Add new bitwise types and helper functions and use them in s390
   specific drivers and code to make it easier to find virtual vs
   physical address usage bugs.

   Right now virtual and physical addresses are identical for s390,
   except for module, vmalloc, and similar areas. This will be changed,
   hopefully with the next merge window, so that e.g. the kernel image
   and modules will be located close to each other, allowing for direct
   branches and also for some other simplifications.

   As a prerequisite this requires to fix all misuses of virtual and
   physical addresses. As it turned out people are so used to the
   concept that virtual and physical addresses are the same, that new
   bugs got added to code which was already fixed. In order to avoid
   that even more code gets merged which adds such bugs add and use new
   bitwise types, so that sparse can be used to find such usage bugs.

   Most likely the new types can go away again after some time

 - Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation

 - Fix kprobe branch handling: if an out-of-line single stepped relative
   branch instruction has a target address within a certain address area
   in the entry code, the program check handler may incorrectly execute
   cleanup code as if KVM code was executed, leading to crashes

 - Fix reference counting of zcrypt card objects

 - Various other small fixes and cleanups

* tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
  s390/entry: compare gmap asce to determine guest/host fault
  s390/entry: remove OUTSIDE macro
  s390/entry: add CIF_SIE flag and remove sie64a() address check
  s390/cio: use while (i--) pattern to clean up
  s390/raw3270: make class3270 constant
  s390/raw3270: improve raw3270_init() readability
  s390/tape: make tape_class constant
  s390/vmlogrdr: make vmlogrdr_class constant
  s390/vmur: make vmur_class constant
  s390/zcrypt: make zcrypt_class constant
  s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support
  s390/vfio_ccw_cp: use new address translation helpers
  s390/iucv: use new address translation helpers
  s390/ctcm: use new address translation helpers
  s390/lcs: use new address translation helpers
  s390/qeth: use new address translation helpers
  s390/zfcp: use new address translation helpers
  s390/tape: fix virtual vs physical address confusion
  s390/3270: use new address translation helpers
  s390/3215: use new address translation helpers
  ...
2024-03-19 11:38:27 -07:00
Steven Rostedt (Google)
24f5bb9f24 tracing: Just use strcmp() for testing __string() and __assign_str() match
As __assign_str() no longer uses its "src" parameter, there's a check to
make sure nothing depends on it being different than what was passed to
__string(). It originally just compared the pointer passed to __string()
with the pointer passed into __assign_str() via the "src" parameter. But
there's a couple of outliers that just pass in a quoted string constant,
where comparing the pointers is UB to the compiler, as the compiler is
free to create multiple copies of the same string constant.

Instead, just use strcmp(). It may slow down the trace event, but this
will eventually be removed.

Also, fix the issue of passing NULL to strcmp() by adding a WARN_ON() to
make sure that both "src" and the pointer saved in __string() are either
both NULL or have content, and then checking if "src" is not NULL before
performing the strcmp().

Link: https://lore.kernel.org/all/CAHk-=wjxX16kWd=uxG5wzqt=aXoYDf1BgWOKk+qVmAO0zh7sjA@mail.gmail.com/

Fixes: b1afefa62c ("tracing: Use strcmp() in __assign_str() WARN_ON() check")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-19 11:23:30 -07:00
Linus Torvalds
fbd88dd057 Merge tag 'pm-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These update the Energy Model to make it prevent errors due to power
  unit mismatches, fix a typo in power management documentation, convert
  one driver to using a platform remove callback returning void, address
  two cpufreq issues (one in the core and one in the DT driver), and
  enable boost support in the SCMI cpufreq driver.

  Specifics:

   - Modify the Energy Model code to bail out and complain if the unit
     of power is not uW to prevent errors due to unit mismatches (Lukasz
     Luba)

   - Make the intel_rapl platform driver use a remove callback returning
     void (Uwe Kleine-König)

   - Fix typo in the suspend and interrupts document (Saravana Kannan)

   - Make per-policy boost flags actually take effect on platforms using
     cpufreq_boost_set_sw() (Sibi Sankar)

   - Enable boost support in the SCMI cpufreq driver (Sibi Sankar)

   - Make the DT cpufreq driver use zalloc_cpumask_var() for allocating
     cpumasks to avoid using unitinialized memory (Marek Szyprowski)"

* tag 'pm-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: scmi: Enable boost support
  firmware: arm_scmi: Add support for marking certain frequencies as turbo
  cpufreq: dt: always allocate zeroed cpumask
  cpufreq: Fix per-policy boost behavior on SoCs using cpufreq_boost_set_sw()
  Documentation: power: Fix typo in suspend and interrupts doc
  PM: EM: Force device drivers to provide power in uW
  powercap: intel_rapl: Convert to platform remove callback returning void
2024-03-19 11:19:36 -07:00
Linus Torvalds
6d37f7e7d1 Merge tag 'acpi-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
 "These update ACPI documentation and kerneldoc comments.

  Specifics:

   - Add markup to generate links from footnotes in the ACPI enumeration
     document (Chris Packham)

   - Update the handle_eject_request() kerneldoc comment to document the
     arguments of the function and improve kerneldoc comments for ACPI
     suspend and hibernation functions (Yang Li)"

* tag 'acpi-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: Improve kerneldoc comments for suspend and hibernation functions
  ACPI: docs: enumeration: Make footnotes links
  ACPI: Document handle_eject_request() arguments
2024-03-19 11:15:14 -07:00
Linus Torvalds
ed302ad52b Merge tag 'thermal-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
 "These update thermal drivers for ARM platforms by adding new hardware
  support (r8a779h0, H616 THS), addressing issues (Mediatek LVTS,
  Mediatek MT7896, thermal-of) and cleaning up code.

  Specifics:

   - Fix memory leak in the error path at probe time in the Mediatek
     LVTS driver (Christophe Jaillet)

   - Fix control buffer enablement regression on Meditek MT7896 (Frank
     Wunderlich)

   - Drop spaces before TABs in different places: thermal-of, ST drivers
     and Makefile (Geert Uytterhoeven)

   - Adjust DT binding for NXP as fsl,tmu-range min/maxItems can vary
     among several SoC versions (Fabio Estevam)

   - Add support for the H616 THS controller on Sun8i platforms (Martin
     Botka)

   - Don't fail probe due to zone registration failure because there is
     no trip points defined in the DT (Mark Brown)

   - Support variable TMU array size for new platforms (Peng Fan)

   - Adjust the DT binding for thermal-of and make the polling time not
     required and assume it is zero when not found in the DT (Konrad
     Dybcio)

   - Add r8a779h0 support in both the DT and the rcar_gen3 driver (Geert
     Uytterhoeven)"

* tag 'thermal-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/drivers/rcar_gen3: Add support for R-Car V4M
  dt-bindings: thermal: rcar-gen3-thermal: Add r8a779h0 support
  thermal/of: Assume polling-delay(-passive) 0 when absent
  dt-bindings: thermal-zones: Don't require polling-delay(-passive)
  thermal/drivers/qoriq: Fix getting tmu range
  thermal/drivers/sun8i: Don't fail probe due to zone registration failure
  thermal/drivers/sun8i: Add support for H616 THS controller
  thermal/drivers/sun8i: Add SRAM register access code
  thermal/drivers/sun8i: Extend H6 calibration to support 4 sensors
  thermal/drivers/sun8i: Explain unknown H6 register value
  dt-bindings: thermal: sun8i: Add H616 THS controller
  soc: sunxi: sram: export register 0 for THS on H616
  dt-bindings: thermal: qoriq-thermal: Adjust fsl,tmu-range min/maxItems
  thermal: Drop spaces before TABs
  thermal/drivers/mediatek: Fix control buffer enablement on MT7896
  thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error handling path
2024-03-19 11:11:01 -07:00
Linus Torvalds
2f3c2b3976 Merge tag 'ata-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
 "A single fix for ASMedia HBAs.

  These HBAs do not indicate that they support SATA Port Multipliers
  CAP.SPM (Supports Port Multiplier) is not set.

  Likewise, they do not allow you to probe the devices behind an
  attached PMP, as defined according to the SATA-IO PMP specification.

  Instead, they have decided to implement their own version of PMP,
  and because of this, plugging in a PMP actually works, even if the
  HBA claims that it does not support PMP.

  Revert a recent quirk for these HBAs, as that breaks ASMedia's own
  implementation of PMP.

  Unfortunately, this will once again give some users of these HBAs
  significantly increased boot time. However, a longer boot time for
  some, is the lesser evil compared to some other users not being able
  to detect their drives at all"

* tag 'ata-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ahci: asm1064: asm1166: don't limit reported ports
2024-03-19 11:05:34 -07:00
Linus Torvalds
d95fcdf496 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - Per vq sizes in vdpa

 - Info query for block devices support in vdpa

 - DMA sync callbacks in vduse

 - Fixes, cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (35 commits)
  virtio_net: rename free_old_xmit_skbs to free_old_xmit
  virtio_net: unify the code for recycling the xmit ptr
  virtio-net: add cond_resched() to the command waiting loop
  virtio-net: convert rx mode setting to use workqueue
  virtio: packed: fix unmap leak for indirect desc table
  vDPA: report virtio-blk flush info to user space
  vDPA: report virtio-block read-only info to user space
  vDPA: report virtio-block write zeroes configuration to user space
  vDPA: report virtio-block discarding configuration to user space
  vDPA: report virtio-block topology info to user space
  vDPA: report virtio-block MQ info to user space
  vDPA: report virtio-block max segments in a request to user space
  vDPA: report virtio-block block-size to user space
  vDPA: report virtio-block max segment size to user space
  vDPA: report virtio-block capacity to user space
  virtio: make virtio_bus const
  vdpa: make vdpa_bus const
  vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min
  vDPA/ifcvf: get_max_vq_size to return max size
  virtio_vdpa: create vqs with the actual size
  ...
2024-03-19 08:57:39 -07:00
Linus Torvalds
0815d5cc7d Merge tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:

 - Xen event channel handling fix for a regression with a rare kernel
   config and some added hardening

 - better support of running Xen dom0 in PVH mode

 - a cleanup for the xen grant-dma-iommu driver

* tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: increment refcnt only if event channel is refcounted
  xen/evtchn: avoid WARN() when unbinding an event channel
  x86/xen: attempt to inflate the memory balloon on PVH
  xen/grant-dma-iommu: Convert to platform remove callback returning void
2024-03-19 08:48:09 -07:00
Rafael J. Wysocki
a6d6590917 Merge branches 'pm-em', 'pm-powercap' and 'pm-sleep'
Merge additional updates related to the Energy Model, power capping
and system-wide power management for 6.9-rc1:

 - Modify the Energy Model code to bail out and complain if the unit of
   power is not uW to prevent errors due to unit mismatches (Lukasz
   Luba).

 - Make the intel_rapl platform driver use a remove callback returning
   void (Uwe Kleine-König).

 - Fix typo in the suspend and interrupts document (Saravana Kannan).

* pm-em:
  PM: EM: Force device drivers to provide power in uW

* pm-powercap:
  powercap: intel_rapl: Convert to platform remove callback returning void

* pm-sleep:
  Documentation: power: Fix typo in suspend and interrupts doc
2024-03-19 13:25:49 +01:00
Rafael J. Wysocki
a873add22a Merge branch 'acpi-docs'
Merge an ACPI documentation update for 6.9-rc1 which adds markup to
generate links from footnotes in the enumeration document.

* acpi-docs:
  ACPI: docs: enumeration: Make footnotes links
2024-03-19 13:16:15 +01:00
Conrad Kostecki
6cd8adc3e1 ahci: asm1064: asm1166: don't limit reported ports
Previously, patches have been added to limit the reported count of SATA
ports for asm1064 and asm1166 SATA controllers, as those controllers do
report more ports than physically having.

While it is allowed to report more ports than physically having in CAP.NP,
it is not allowed to report more ports than physically having in the PI
(Ports Implemented) register, which is what these HBAs do.
(This is a AHCI spec violation.)

Unfortunately, it seems that the PMP implementation in these ASMedia HBAs
is also violating the AHCI and SATA-IO PMP specification.

What these HBAs do is that they do not report that they support PMP
(CAP.SPM (Supports Port Multiplier) is not set).

Instead, they have decided to add extra "virtual" ports in the PI register
that is used if a port multiplier is connected to any of the physical
ports of the HBA.

Enumerating the devices behind the PMP as specified in the AHCI and
SATA-IO specifications, by using PMP READ and PMP WRITE commands to the
physical ports of the HBA is not possible, you have to use the "virtual"
ports.

This is of course bad, because this gives us no way to detect the device
and vendor ID of the PMP actually connected to the HBA, which means that
we can not apply the proper PMP quirks for the PMP that is connected to
the HBA.

Limiting the port map will thus stop these controllers from working with
SATA Port Multipliers.

This patch reverts both patches for asm1064 and asm1166, so old behavior
is restored and SATA PMP will work again, but it will also reintroduce the
(minutes long) extra boot time for the ASMedia controllers that do not
have a PMP connected (either on the PCIe card itself, or an external PMP).

However, a longer boot time for some, is the lesser evil compared to some
other users not being able to detect their drives at all.

Fixes: 0077a504e1 ("ahci: asm1166: correct count of reported ports")
Fixes: 9815e39617 ("ahci: asm1064: correct count of reported ports")
Cc: stable@vger.kernel.org
Reported-by: Matt <cryptearth@googlemail.com>
Signed-off-by: Conrad Kostecki <conikost@gentoo.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
[cassel: rewrote commit message]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-03-19 12:06:54 +01:00
Xuan Zhuo
5da7137de7 virtio_net: rename free_old_xmit_skbs to free_old_xmit
Since free_old_xmit_skbs not only deals with skb, but also xdp frame and
subsequent added xsk, so change the name of this function to
free_old_xmit.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Xuan Zhuo
b1dc24aba7 virtio_net: unify the code for recycling the xmit ptr
There are two completely similar and independent implementations. This
is inconvenient for the subsequent addition of new types. So extract a
function from this piece of code and call this function uniformly to
recover old xmit ptr.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Jason Wang
0d197a1471 virtio-net: add cond_resched() to the command waiting loop
Adding cond_resched() to the command waiting loop for a better
co-operation with the scheduler. This allows to give CPU a breath to
run other task(workqueue) instead of busy looping when preemption is
not allowed on a device whose CVQ might be slow.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
2024-03-19 03:19:22 -04:00
Jason Wang
b9f7425239 virtio-net: convert rx mode setting to use workqueue
This patch convert rx mode setting to be done in a workqueue, this is
a must for allow to sleep when waiting for the cvq command to
response since current code is executed under addr spin lock.

Note that we need to disable and flush the workqueue during freeze,
this means the rx mode setting is lost after resuming. This is not the
bug of this patch as we never try to restore rx mode setting during
resume.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
2024-03-19 03:19:22 -04:00
Xuan Zhuo
d5c0ed17fe virtio: packed: fix unmap leak for indirect desc table
When use_dma_api and premapped are true, then the do_unmap is false.

Because the do_unmap is false, vring_unmap_extra_packed is not called by
detach_buf_packed.

  if (unlikely(vq->do_unmap)) {
                curr = id;
                for (i = 0; i < state->num; i++) {
                        vring_unmap_extra_packed(vq,
                                                 &vq->packed.desc_extra[curr]);
                        curr = vq->packed.desc_extra[curr].next;
                }
  }

So the indirect desc table is not unmapped. This causes the unmap leak.

So here, we check vq->use_dma_api instead. Synchronously, dma info is
updated based on use_dma_api judgment

This bug does not occur, because no driver use the premapped with
indirect.

Fixes: b319940f83 ("virtio_ring: skip unmap for premapped")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Zhu Lingshan
1ac61ddfee vDPA: report virtio-blk flush info to user space
This commit reports whether a virtio-blk device
support cache flush command to user space

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
ae1374b7f7 vDPA: report virtio-block read-only info to user space
This commit report read-only information of
virtio-blk devices to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
6bdc7846e6 vDPA: report virtio-block write zeroes configuration to user space
This commits reports write zeroes configuration of
virtio-block devices to user space, includes:
1)maximum write zeroes sectors size
2)maximum write zeroes segment number

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
65848f46e1 vDPA: report virtio-block discarding configuration to user space
This commit reports virtio-blk discarding configuration
to user space,includes:
1) the maximum discard sectors
2) maximum number of discard segments for the block driver to use
3) the alignment for splitting a discarding request

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
c9d989b4ab vDPA: report virtio-block topology info to user space
This commit allows vDPA reporting topology information of
virtio-blk devices to user space, includes:
1) the number of logical blocks per physical block
2) offset of first aligned logical block
3) suggested minimum I/O size in blocks
4) optimal (suggested maximum) I/O size in blocks

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
54fb04b02e vDPA: report virtio-block MQ info to user space
This commits allows vDPA reporting virtio-block multi-queue
configuration to user sapce.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
81f64e1d91 vDPA: report virtio-block max segments in a request to user space
This commit allows vDPA reporting the maximum number of
segments in a request of virtio-block devices to
user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
3a1d33fb7e vDPA: report virtio-block block-size to user space
This commit allows reporting the block size of a
virtio-block device to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
330b8aea69 vDPA: report virtio-block max segment size to user space
This commit allows reporting the max size of any
single segment of virtio-block devices to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
c2475a9a78 vDPA: report virtio-block capacity to user space
This commit allows userspace to query capacity of
a virtio-block device.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Ricardo B. Marliere
2b666ee296 virtio: make virtio_bus const
Now that the driver core can properly handle constant struct bus_type,
move the virtio_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-virtio-v1-1-3bcb2212aaa0@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jason Wang <jasowang@redhat.com>
2024-03-19 02:45:50 -04:00
Ricardo B. Marliere
8169ed6220 vdpa: make vdpa_bus const
Now that the driver core can properly handle constant struct bus_type,
move the vdpa_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-vdpa-v1-1-1745eccb0a5c@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
56d61ae558 vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min
IFCVF HW supports operation with vq size less than the max size,
as the spec required.

This commit implements vdpa_config_ops.get_vq_num_min to report
the minimal size of the virtqueues, which gives vDPA framework
a chance to reduce the vring size.

We need at least one descriptor to be functional, but it is better
no less than 64 to meet ceratin performance requirements.
Actually the framework would allocate at least a PAGE for the vq.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
cd21470674 vDPA/ifcvf: get_max_vq_size to return max size
Since we already implemented vdpa_config_ops.get_vq_size,
so get_max_vq_size can return the acutal max size of the
virtqueues other than the max allowed safe size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
273ae08fa0 virtio_vdpa: create vqs with the actual size
The size of a virtqueue is a per vq configuration,
this commit allows virtio_vdpa to create
virtqueues with the actual size of a specific
vq size that supported by the backend device.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
47e62e6d62 vduse: implement vdpa_config_ops.get_vq_size for vduse
This commit implements get_vq_size for vdpa_config_ops. This
new interface is used to report per vq size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
f6fa2f7ea0 vdpa_sim: implement vdpa_config_ops.get_vq_size for vDPA simulator
This commit implements vdpa_config_ops.get_vq_size for vDPA
simulator, this new interface can help report per vq size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
1da13e6470 eni_vdpa: implement vdpa_config_ops.get_vq_size
This commit implements get_vq_size which report
per vq size in vdpa_config_ops

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
a97f9c8fcc vp_vdpa: implement vdpa_config_ops.get_vq_size
This commit implements vdpa_config_ops.get_vq_size in
vp_vdpa, which reports per virtqueue size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
36503e5e06 vDPA/ifcvf: implement vdpa_config_ops.get_vq_size
This commit implements vdpa_ops.get_vq_size to report
the size of a specific virtqueue.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
0a926fc972 vDPA: introduce get_vq_size to vdpa_config_ops
This commit introduces a new interface get_vq_size to
vDPA config ops, this new interface intends to report
the size of a specific virtqueue

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
1496c47065 vhost-vdpa: uapi to support reporting per vq size
The size of a virtqueue is a per vq configuration.
This commit introduce a new ioctl uAPI to support this flexibility.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Shannon Nelson
ba6faaa68d vdpa/pds: fixes for VF vdpa flr-aer handling
This addresses a couple of things found while testing the FLR and AER
handling with the VFs.
 - release irqs before calling vp_modern_remove()
 - make sure we have a valid struct pointer before using it to release irqs
 - make sure the FW is alive before trying to add a new device

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20240220011050.30913-1-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Maxime Coquelin
d7b4e3287c vduse: implement DMA sync callbacks
Since commit 295525e29a ("virtio_net: merge dma
operations when filling mergeable buffers"), VDUSE device
require support for DMA's .sync_single_for_cpu() operation
as the memory is non-coherent between the device and CPU
because of the use of a bounce buffer.

This patch implements both .sync_single_for_cpu() and
.sync_single_for_device() callbacks, and also skip bounce
buffer copies during DMA map and unmap operations if the
DMA_ATTR_SKIP_CPU_SYNC attribute is set to avoid extra
copies of the same buffer.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20240219170606.587290-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Jonah Palmer
749a401683 vdpa/mlx5: Allow CVQ size changes
The MLX driver was not updating its control virtqueue size at set_vq_num
and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
setup_cvq_vring.

Qemu would try to set the size to 64 by default, however, because the
CVQ size always was initialized to 16, an error would be thrown when
sending >16 control messages (as used-ring entry 17 is initialized to 0).
For example, starting a guest with x-svq=on and then executing the
following command would produce the error below:

 # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done

 qemu-system-x86_64: Insufficient written data (0)
 [  435.331223] virtio_net virtio0: Failed to set mac address by vq command.
 SIOCSIFHWADDR: Invalid argument

Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
2024-03-19 02:45:49 -04:00
Steve Sistare
c4e8b5aed7 vdpa: skip suspend/resume ops if not DRIVER_OK
If a vdpa device is not in state DRIVER_OK, then there is no driver state
to preserve, so no need to call the suspend and resume driver ops.

Suggested-by: Eugenio Perez Martin <eperezma@redhat.com>"
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-Id: <1707834358-165470-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-03-19 02:45:49 -04:00
David Hildenbrand
310227f428 virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.

For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.

Let's fix this by re-enabling the config if freezing fails.

Fixes: 22b7050a02 ("virtio: defer config changed notifications")
Cc: <stable@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240213135425.795001-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Steve Sistare
9588e7fc51 vdpa_sim: reset must not run
vdpasim_do_reset sets running to true, which is wrong, as it allows
vdpasim_kick_vq to post work requests before the device has been
configured.  To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK
is set.

Fixes: 0c89e2a3a9 ("vdpa_sim: Implement suspend vdpa op")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1707517807-137331-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Suzuki K Poulose
ec6ecb844d virtio: uapi: Drop __packed attribute in linux/virtio_pci.h
Commit 92792ac752 ("virtio-pci: Introduce admin command sending function")
added "__packed" structures to UAPI header linux/virtio_pci.h. This triggers
build failures in the consumer userspace applications without proper "definition"
of __packed (e.g., kvmtool build fails).

Moreover, the structures are already packed well, and doesn't need explicit
packing, similar to the rest of the structures in all virtio_* headers. Remove
the __packed attribute.

Fixes: 92792ac752 ("virtio-pci: Introduce admin command sending function")
Cc: Feng Liu <feliu@nvidia.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Message-Id: <20240125232039.913606-1-suzuki.poulose@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Andrew Melnychenko
f6baca2d32 vhost: Added pad cleanup if vnet_hdr is not present.
When the Qemu launched with vhost but without tap vnet_hdr,
vhost tries to copy vnet_hdr from socket iter with size 0
to the page that may contain some trash.
That trash can be interpreted as unpredictable values for
vnet_hdr.
That leads to dropping some packets and in some cases to
stalling vhost routine when the vhost_net tries to process
packets and fails in a loop.

Qemu options:
  -netdev tap,vhost=on,vnet_hdr=off,...

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20240115194840.1183077-1-andrew@daynix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Linus Torvalds
b3603fcb79 Merge tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:

 - Fix mistaken variable assignment that caused a refcounting problem

 - Revert a recent change that began using atomic counters where they
   were not needed (for lkb wait_count)

 - Add comments around forced state reset for waiting lock operations
   during recovery

* tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: add comments about forced waiters reset
  dlm: revert atomic_t lkb_wait_count
  dlm: fix user space lkb refcounting
2024-03-18 15:39:48 -07:00
Linus Torvalds
6207b37eb5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "Very small update this cycle:

   - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5,
     irdma, rxe, rtrs, mana

   - Simplify the hns hem mechanism

   - Fix EFA's MSI-X allocation in resource constrained configurations

   - Fix a KASN splat in srpt

   - Narrow hns's congestion control selection to QPs granularity and
     allow userspace to select it

   - Solve a parallel module loading race between the CM module and a
     driver module

   - Flexible array cleanup

   - Dump hns's SCC Conext to 'rdma res' for debugging

   - Make mana build page lists for HW objects that require a 0 offset
     correctly

   - Stuck CM ID debugging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits)
  RDMA/cm: add timeout to cm_destroy_id wait
  RDMA/mana_ib: Use virtual address in dma regions for MRs
  RDMA/mana_ib: Fix bug in creation of dma regions
  RDMA/hns: Append SCC context to the raw dump of QPC
  RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings
  RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity
  RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store()
  RDMA/uverbs: Remove flexible arrays from struct *_filter
  RDMA/device: Fix a race between mad_client and cm_client init
  RDMA/hns: Fix mis-modifying default congestion control algorithm
  RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user
  RDMA/srpt: Do not register event handler until srpt device is fully setup
  RDMA/irdma: Remove duplicate assignment
  RDMA/efa: Limit EQs to available MSI-X vectors
  RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype
  RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype
  RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function
  RDMA/mana_ib: Introduce mana_ib_get_netdev helper function
  RDMA/mana_ib: Introduce mdev_to_gc helper function
  RDMA/hns: Simplify 'struct hns_roce_hem' allocation
  ...
2024-03-18 15:34:03 -07:00
Linus Torvalds
65b64246f2 Merge tag 'ktest-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:

 - Allow variables to contain variables. This makes the shell commands
   have a bit more flexibility to reuse existing variables.

 - Have make_warnings_file in build-only mode require limited variables

   The make_warnings_file test will create a file with all existing
   warnings (which can be used to compare against in builds with new
   commits). Add it to the build-only list that doesn't require other
   variables (like how to reset a machine), as the make_warnings_file
   makes the most sense on build only tests.

* tag 'ktest-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest: force $buildonly = 1 for 'make_warnings_file' test type
  ktest.pl: Process variables within variables
2024-03-18 15:27:03 -07:00
Linus Torvalds
ad584d73a2 Merge tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
 "Main user visible change:

   - User events can now have "multi formats"

     The current user events have a single format. If another event is
     created with a different format, it will fail to be created. That
     is, once an event name is used, it cannot be used again with a
     different format. This can cause issues if a library is using an
     event and updates its format. An application using the older format
     will prevent an application using the new library from registering
     its event.

     A task could also DOS another application if it knows the event
     names, and it creates events with different formats.

     The multi-format event is in a different name space from the single
     format. Both the event name and its format are the unique
     identifier. This will allow two different applications to use the
     same user event name but with different payloads.

   - Added support to have ftrace_dump_on_oops dump out instances and
     not just the main top level tracing buffer.

  Other changes:

   - Add eventfs_root_inode

     Only the root inode has a dentry that is static (never goes away)
     and stores it upon creation. There's no reason that the thousands
     of other eventfs inodes should have a pointer that never gets set
     in its descriptor. Create a eventfs_root_inode desciptor that has a
     eventfs_inode descriptor and a dentry pointer, and only the root
     inode will use this.

   - Added WARN_ON()s in eventfs

     There's some conditionals remaining in eventfs that should never be
     hit, but instead of removing them, add WARN_ON() around them to
     make sure that they are never hit.

   - Have saved_cmdlines allocation also include the map_cmdline_to_pid
     array

     The saved_cmdlines structure allocates a large amount of data to
     hold its mappings. Within it, it has three arrays. Two are already
     apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
     can be saved by also including the map_cmdline_to_pid[] array as
     well.

   - Restructure __string() and __assign_str() macros used in
     TRACE_EVENT()

     Dynamic strings in TRACE_EVENT() are declared with:

         __string(name, source)

     And assigned with:

        __assign_str(name, source)

     In the tracepoint callback of the event, the __string() is used to
     get the size needed to allocate on the ring buffer and
     __assign_str() is used to copy the string into the ring buffer.
     There's a helper structure that is created in the TRACE_EVENT()
     macro logic that will hold the string length and its position in
     the ring buffer which is created by __string().

     There are several trace events that have a function to create the
     string to save. This function is executed twice. Once for
     __string() and again for __assign_str(). There's no reason for
     this. The helper structure could also save the string it used in
     __string() and simply copy that into __assign_str() (it also
     already has its length).

     By using the structure to store the source string for the
     assignment, it means that the second argument to __assign_str() is
     no longer needed.

     It will be removed in the next merge window, but for now add a
     warning if the source string given to __string() is different than
     the source string given to __assign_str(), as the source to
     __assign_str() isn't even used and will be going away.

   - Added checks to make sure that the source of __string() is also the
     source of __assign_str() so that it can be safely removed in the
     next merge window.

     Included fixes that the above check found.

   - Other minor clean ups and fixes"

* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
  tracing: Add __string_src() helper to help compilers not to get confused
  tracing: Use strcmp() in __assign_str() WARN_ON() check
  tracepoints: Use WARN() and not WARN_ON() for warnings
  tracing: Use div64_u64() instead of do_div()
  tracing: Support to dump instance traces by ftrace_dump_on_oops
  tracing: Remove second parameter to __assign_rel_str()
  tracing: Add warning if string in __assign_str() does not match __string()
  tracing: Add __string_len() example
  tracing: Remove __assign_str_len()
  ftrace: Fix most kernel-doc warnings
  tracing: Decrement the snapshot if the snapshot trigger fails to register
  tracing: Fix snapshot counter going between two tracers that use it
  tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
  tracing: Use ? : shortcut in trace macros
  tracing: Do not calculate strlen() twice for __string() fields
  tracing: Rework __assign_str() and __string() to not duplicate getting the string
  cxl/trace: Properly initialize cxl_poison region name
  net: hns3: tracing: fix hclgevf trace event strings
  drm/i915: Add missing ; to __assign_str() macros in tracepoint code
  NFSD: Fix nfsd_clid_class use of __string_len() macro
  ...
2024-03-18 15:11:44 -07:00