linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 15:13:44 -04:00

Author	SHA1	Message	Date
David Hildenbrand	ddf4098514	virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY Just like we do with alloc_contig_range(), let's convert all unknown errors to -EBUSY, but WARN so we can look into the issue. For example, offline_pages() could fail with -EINTR, which would be unexpected in our case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-3-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
David Hildenbrand	f504e15b94	virtio-mem: remove unsafe unplug in Big Block Mode (BBM) When "unsafe unplug" is enabled, we don't fake-offline all memory ahead of actual memory offlining using alloc_contig_range(). Instead, we rely on offline_pages() to also perform actual page migration, which might fail or take a very long time. In that case, it's possible to easily run into endless loops that cannot be aborted anymore (as offlining is triggered by a workqueue then): For example, a single (accidentally) permanently unmovable page in ZONE_MOVABLE results in an endless loop. For ZONE_NORMAL, races between isolating the pageblock (and checking for unmovable pages) and concurrent page allocation are possible and similarly result in endless loops. The idea of the unsafe unplug mode was to make it possible to more reliably unplug large memory blocks. However, (a) we really should be tackling that differently, by extending the alloc_contig_range()-based mechanism; and (b) this mode is not the default and as far as I know, it's unused either way. So let's simply get rid of it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-2-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
Shannon Nelson	8efc365b20	pds_vdpa: fix up debugfs feature bit printing Make clearer in debugfs output the difference between the hw feature bits, the features supported through the driver, and the features that have been negotiated. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-6-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:51:46 -04:00
Allen Hubbe	c0a6c5cbf1	pds_vdpa: alloc irq vectors on DRIVER_OK We were allocating irq vectors at the time the aux dev was probed, but that is before the PCI VF is assigned to a separate iommu domain by vhost_vdpa. Because vhost_vdpa later changes the iommu domain the interrupts do not work. Instead, we can allocate the irq vectors later when we see DRIVER_OK and know that the reassignment of the PCI VF to an iommu domain has already happened. Fixes: `151cc834f3` ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230711042437.69381-5-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:45 -04:00
Shannon Nelson	ed88863040	pds_vdpa: clean and reset vqs entries Make sure that we initialize the vqs[] entries the same way both for initial setup and after a vq reset. Fixes: `151cc834f3` ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:51:45 -04:00
Shannon Nelson	abdf31bd91	pds_vdpa: always allow offering VIRTIO_NET_F_MAC Our driver sets a mac if the HW is 00:..:00 so we need to be sure to advertise VIRTIO_NET_F_MAC even if the HW doesn't. We also need to be sure that virtio_net sees the VIRTIO_NET_F_MAC and doesn't rewrite the mac address that a user may have set with the vdpa utility. After reading the hw_feature bits, add the VIRTIO_NET_F_MAC to the driver's supported_features and use that for reporting what is available. If the HW is not advertising it, be sure to strip the VIRTIO_NET_F_MAC before finishing the feature negotiation. If the user specifies a device_features bitpattern in the vdpa utility without the VIRTIO_NET_F_MAC set, then don't set the mac. Fixes: `151cc834f3` ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:51:45 -04:00
Allen Hubbe	0cd2c13b1c	pds_vdpa: reset to vdpa specified mac When the vdpa device is reset, also reinitialize it with the mac address that was assigned when the device was added. Fixes: `151cc834f3` ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:51:45 -04:00
Hawkins Jiawei	2c507ce90e	virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands. According to the VirtIO standard, "Field reserved MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure, defined in 5.1.6.5.7.". Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq` field in struct virtio_net_ctrl_rss, which corresponds to the `reserved` field in struct virtio_net_hash_config, is not zeroed, thereby violating the VirtIO standard. This patch solves this problem by zeroing this field in virtnet_init_default_rss(). Cc: Andrew Melnychenko <andrew@daynix.com> Cc: stable@vger.kernel.org Fixes: `c7114b1249` ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20230810110405.25558-1-yin31149@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:51:45 -04:00
Dragos Tatulea	810b0cc1c2	vdpa/mlx5: Fix crash on shutdown for when no ndev exists The ndev was accessed on shutdown without a check if it actually exists. This triggered the crash pasted below. Instead of doing the ndev check, delete the shutdown handler altogether. The irqs will be released at the parent VF level (mlx5_core). BUG: kernel NULL pointer dereference, address: 0000000000000300 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286 RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017 RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000 RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000 R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] device_shutdown+0x13e/0x1e0 kernel_restart+0x36/0x90 __do_sys_reboot+0x141/0x210 ? vfs_writev+0xcd/0x140 ? handle_mm_fault+0x161/0x260 ? do_writev+0x6b/0x110 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f496990fb56 RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56 RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8 </TASK> CR2: 0000000000000300 ---[ end trace 0000000000000000 ]--- Fixes: `bc9a2b3e68` ("vdpa/mlx5: Support interrupt bypassing") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230803152648.199297-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:29 -04:00
Eugenio Pérez	ad03a0f44c	vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after the control virtqueue ASID iotlb has been populated. The control vq iotlb must not be cleared, since it will not be populated again. So call the ASID aware destroy function which makes sure that the right vq resource is destroyed. Fixes: `8fcd20c307` ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:24:29 -04:00
Dragos Tatulea	9ee811009a	vdpa/mlx5: Fix mr->initialized semantics The mr->initialized flag is shared between the control vq and data vq part of the mr init/uninit. But if the control vq and data vq get placed in different ASIDs, it can happen that initializing the control vq will prevent the data vq mr from being initialized. This patch consolidates the control and data vq init parts into their own init functions. The mr->initialized will now be used for the data vq only. The control vq currently doesn't need a flag. The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got split into data and control vq functions which are now also ASID aware. Fixes: `8fcd20c307` ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:24:28 -04:00
Dragos Tatulea	3fe0241933	vdpa/mlx5: Correct default number of queues when MQ is on The standard specifies that the initial number of queues is the default, which is 1 (1 tx, 1 rx). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230727172354.68243-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2023-08-10 15:24:28 -04:00
Gal Pressman	df95570464	virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs() Free the cpumask allocated by create_affinity_masks() before returning from the function. Fixes: `3dad56823b` ("virtio-vdpa: Support interrupt affinity spreading mechanism") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230726191036.14324-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>	2023-08-10 15:24:28 -04:00
Maxime Coquelin	7ca26efb09	vduse: Use proper spinlock for IRQ injection The IRQ injection work used spin_lock_irq() to protect the scheduling of the softirq, but spin_lock_bh() should be used. With spin_lock_irq(), we noticed delay of more than 6 seconds between the time a NAPI polling work is scheduled and the time it is executed. Fixes: `c8a6153b6c` ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyongji@bytedance.com Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>	2023-08-10 15:24:28 -04:00
Dragos Tatulea	f46c1e1620	vdpa: Enable strict validation for netlinks ops The previous patches added the missing nla policies that were required for validation to work. Now strict validation on netlink ops can be enabled. This patch does it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:28 -04:00
Lin Ma	5d6ba607d6	vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa max vqp attr to avoid such bugs. Fixes: `ad69dd0bf2` ("vdpa: Introduce query of device config layout") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:28 -04:00
Lin Ma	b3003e1b54	vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa queue index attr to avoid such bugs. Fixes: `13b00b1356` ("vdpa: Add support for querying vendor statistics") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernelorg Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:28 -04:00
Lin Ma	79c8651587	vdpa: Add features attr to vdpa_nl_policy for nlattr length check The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa features attr to avoid such bugs. Fixes: `90fea5a800` ("vdpa: device feature provisioning") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:28 -04:00
Feng Liu	13f3efaca0	virtio-pci: Fix legacy device flag setting error in probe The 'is_legacy' flag is used to differentiate between legacy vs modern device. Currently, it is based on the value of vp_dev->ldev.ioaddr. However, due to the shared memory of the union between struct virtio_pci_legacy_device and struct virtio_pci_modern_device, when virtio_pci_modern_probe modifies the content of struct virtio_pci_modern_device, it affects the content of struct virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set as true. To resolve issue, when legacy device is probed, mark 'is_legacy' as true, when modern device is probed, keep 'is_legacy' as false. Fixes: `4f0fc22534` ("virtio_pci: Optimize virtio_pci_device structure size") Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230719154550.79536-1-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:24:28 -04:00
Mike Christie	8d4bdf11f0	MAINTAINERS: add vhost-scsi entry and myself as a co-maintainer I've been doing a lot of the development on vhost-scsi the last couple of years, so per Michael T's suggestion this adds me as co-maintainer. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230715142027.5572-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-08-10 15:24:28 -04:00
Mike Christie	c5ace19efb	vhost-scsi: Rename vhost_scsi_iov_to_sgl Rename vhost_scsi_iov_to_sgl to vhost_scsi_map_iov_to_sgl so it matches matches the naming style used for vhost_scsi_copy_iov_to_sgl. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230709202859.138387-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-08-10 15:24:28 -04:00
Mike Christie	5ced58bfa1	vhost-scsi: Fix alignment handling with windows The linux block layer requires bios/requests to have lengths with a 512 byte alignment. Some drivers/layers like dm-crypt and the directi IO code will test for it and just fail. Other drivers like SCSI just assume the requirement is met and will end up in infinte retry loops. The problem for drivers like SCSI is that it uses functions like blk_rq_cur_sectors and blk_rq_sectors which divide the request's length by 512. If there's lefovers then it just gets dropped. But other code in the block/scsi layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is still data left and try to retry the cmd. We can then end up getting stuck in retry loops where part of the block/scsi thinks there is data left, but other parts think we want to do IOs of zero length. Linux will always check for alignment, but windows will not. When vhost-scsi then translates the iovec it gets from a windows guest to a scatterlist, we can end up with sg items where the sg->length is not divisible by 512 due to the misaligned offset: sg[0].offset = 255; sg[0].length = 3841; sg... sg[N].offset = 0; sg[N].length = 255; When the lio backends then convert the SG to bios or other iovecs, we end up sending them with the same misaligned values and can hit the issues above. This just has us drop down to allocating a temp page and copying the data when we detect a misaligned buffer and the IO is large enough that it will get split into multiple bad IOs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230709202859.138387-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-08-10 15:24:27 -04:00
Shannon Nelson	9ad1a29cb0	pds_vdpa: protect Makefile from unconfigured debugfs debugfs.h protects itself from an undefined DEBUG_FS, so it is not necessary to check it in the driver code or the Makefile. The driver code had been updated for this, but the Makefile had missed the update. Link: https://lore.kernel.org/linux-next/fec68c3c-8249-7af4-5390-0495386a76f9@infradead.org/ Fixes: `a16291b5bc` ("pds_vdpa: Add new vDPA driver for AMD/Pensando DSC") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230706231718.54198-1-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:24:27 -04:00
Wolfram Sang	55c91fedd0	virtio-mmio: don't break lifecycle of vm_dev vm_dev has a separate lifecycle because it has a 'struct device' embedded. Thus, having a release callback for it is correct. Allocating the vm_dev struct with devres totally breaks this protection, though. Instead of waiting for the vm_dev release callback, the memory is freed when the platform_device is removed. Resulting in a use-after-free when finally the callback is to be called. To easily see the problem, compile the kernel with CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs. The fix is easy, don't use devres in this case. Found during my research about object lifetime problems. Fixes: `7eb781b1bb` ("virtio_mmio: add cleanup for virtio_mmio_probe") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:27 -04:00
Linus Torvalds	52a93d39b1	Linux 6.5-rc5 v6.5-rc5	2023-08-06 15:07:51 -07:00
Linus Torvalds	0108963f14	Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup - Clean up directory iterators and clarify file_needs_f_pos_lock() * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: rely on ->iterate_shared to determine f_pos locking vfs: get rid of old '->iterate' directory operation proc: fix missing conversion to 'iterate_shared' open: make RESOLVE_CACHED correctly test for O_TMPFILE	2023-08-06 10:43:52 -07:00
Christian Brauner	7d84d1b9af	fs: rely on ->iterate_shared to determine f_pos locking Now that we removed ->iterate we don't need to check for either ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check for ->iterate_shared instead. This will tell us whether we need to unconditionally take the lock. Not just does it allow us to avoid checking f_inode's mode it also actually clearly shows that we're locking because of readdir. Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-08-06 15:08:36 +02:00
Linus Torvalds	3e32715496	vfs: get rid of old '->iterate' directory operation All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-08-06 15:08:35 +02:00
Linus Torvalds	0a2c2baafa	proc: fix missing conversion to 'iterate_shared' I'm looking at the directory handling due to the discussion about f_pos locking (see commit `797964253d`: "file: reinstate f_pos locking optimization for regular files"), and wanting to clean that up. And one source of ugliness is how we were supposed to move filesystems over to the '->iterate_shared()' function that only takes the inode lock for reading many many years ago, but several filesystems still use the bad old '->iterate()' that takes the inode lock for exclusive access. See commit `6192269444` ("introduce a parallel variant of ->iterate()") that also added some documentation stating Old method is only used if the new one is absent; eventually it will be removed. Switch while you still can; the old one won't stay. and that was back in April 2016. Here we are, many years later, and the old version is still clearly sadly alive and well. Now, some of those old style iterators are probably just because the filesystem may end up having per-inode mutable data that it uses for iterating a directory, but at least one case is just a mistake. Al switched over most filesystems to use '->iterate_shared()' back when it was introduced. In particular, the /proc filesystem was converted as one of the first ones in commit `f50752eaa0` ("switch all procfs directories ->iterate_shared()"). But then later one new user of '->iterate()' was then re-introduced by commit `6d9c939dbe` ("procfs: add smack subdir to attrs"). And that's clearly not what we wanted, since that new case just uses the same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions that other /proc pident directories use, and they are most definitely safe to use with the inode lock held shared. So just fix it. This still leaves a fair number of oddball filesystems using the old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf), but at least we don't have any remaining in the core filesystems. I'm going to add a wrapper function that just drops the read-lock and takes it as a write lock, so that we can clean up the core vfs layer and make all the ugly 'this filesystem needs exclusive inode locking' be just filesystem-internal warts. I just didn't want to make that conversion when we still had a core user left. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-08-06 15:08:35 +02:00
Aleksa Sarai	a0fc452a5d	open: make RESOLVE_CACHED correctly test for O_TMPFILE O_TMPFILE is actually __O_TMPFILE\|O_DIRECTORY. This means that the old fast-path check for RESOLVE_CACHED would reject all users passing O_DIRECTORY with -EAGAIN, when in fact the intended test was to check for __O_TMPFILE. Cc: stable@vger.kernel.org # v5.12+ Fixes: `99668f6180` ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-08-06 15:08:35 +02:00
Linus Torvalds	f0ab9f34e5	Merge tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux Pull rust fixes from Miguel Ojeda: - Allocator: prevent mis-aligned allocation - Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is planned for the merge window - Build: fix bindgen error with UBSAN_BOUNDS_STRICT * tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux: rust: fix bindgen build error with UBSAN_BOUNDS_STRICT rust: delete `ForeignOwnable::borrow_mut` rust: allocator: Prevent mis-aligned allocation	2023-08-05 19:28:02 -07:00
Linus Torvalds	fb0d91991c	Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata fix from Damien Le Moal: - Prevent the scsi disk driver from issuing a START STOP UNIT command for ATA devices during system resume as this causes various issues reported by multiple users. * tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata,scsi: do not issue START STOP UNIT on resume	2023-08-05 18:45:18 -07:00
Linus Torvalds	f6a6916859	Merge tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fix from Steve French: - Fix DFS interlink problem (different namespace) * tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix dfs link mount against w2k8	2023-08-05 13:44:06 -07:00
Linus Torvalds	251a94f1f6	Merge tag 'powerpc-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix vmemmap altmap boundary check which could cause memory hotunplug failure - Create a dummy stackframe to fix ftrace stack unwind - Fix secondary thread bringup for Book3E ELFv2 kernels - Use early_ioremap/unmap() in via_calibrate_decr() Thanks to Aneesh Kumar K.V, Benjamin Gray, Christophe Leroy, David Hildenbrand, and Naveen N Rao. * tag 'powerpc-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powermac: Use early_* IO variants in via_calibrate_decr() powerpc/64e: Fix secondary thread bringup for ELFv2 kernels powerpc/ftrace: Create a dummy stackframe to fix stack unwind powerpc/mm/altmap: Fix altmap boundary check	2023-08-05 13:16:17 -07:00
Linus Torvalds	947c2a8358	Merge tag 'parisc-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: - early fixmap preallocation to fix boot failures on kernel >= 6.4 - remove DMA leftover code in parport_gsc - drop old comments and code style fixes * tag 'parisc-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: unaligned: Add required spaces after ',' parport: gsc: remove DMA leftover code parisc: pci-dma: remove unused and dead EISA code and comment parisc/mm: preallocate fixmap page tables at init	2023-08-05 13:09:05 -07:00
Linus Torvalds	c9d26d8de1	Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A few clk driver fixes for some SoC clk drivers: - Change a usleep() to udelay() to avoid scheduling while atomic in the Amlogic PLL code - Revert a patch to the Mediatek MT8183 driver that caused an out-of-bounds write - Return the right error value when devm_of_iomap() fails in imx93_clocks_probe() - Constrain the Kconfig for the fixed mmio clk so that it depends on HAS_IOMEM and can't be compiled on architectures such as s390" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM clk: imx93: Propagate correct error in imx93_clocks_probe() clk: mediatek: mt8183: Add back SSPM related clocks clk: meson: change usleep_range() to udelay() for atomic context	2023-08-04 19:35:09 -07:00
Linus Torvalds	024ff300db	Merge tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix a bug in a python script for Hyper-V (Ani Sinha) - Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley) - Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh Sengar) - Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing, ZhiHu) * tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer() x86/hyperv: add noop functions to x86_init mpparse functions vmbus_testing: fix wrong python syntax for integer value comparison x86/hyperv: fix a warning in mshyperv.h x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg Drivers: hv: Change hv_free_hyperv_page() to take void * argument	2023-08-04 17:16:14 -07:00
Linus Torvalds	e661f98c82	Merge tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A pair of fixes for build-related failures in the selftests - A fix for a sparse warning in acpi_os_ioremap() - A fix to restore the kernel PA offset in vmcoreinfo, to fix crash handling * tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Documentation: kdump: Add va_kernel_pa_offset for RISCV64 riscv: Export va_kernel_pa_offset in vmcoreinfo RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address selftests: riscv: Fix compilation error with vstate_exec_nolibc.c selftests/riscv: fix potential build failure during the "emit_tests" step	2023-08-04 16:04:37 -07:00
Linus Torvalds	ea4f142ffa	Merge tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a sparse warning triggered by the TPMI interface recently added to the Intel RAPL power capping driver (Zhang Rui)" * tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: powercap: intel_rapl: Fix a sparse warning in TPMI interface	2023-08-04 15:54:03 -07:00
Linus Torvalds	e6fda526d9	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "More SVE/SME fixes for ptrace() and for the (potentially future) case where SME is implemented in hardware without SVE support" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems arm64/ptrace: Don't enable SVE when setting streaming SVE arm64/ptrace: Flush FP state when setting ZT0 arm64/fpsimd: Clear SME state in the target task when setting the VL	2023-08-04 12:11:40 -07:00
Linus Torvalds	c8273a2586	Merge tag 'mtd/fixes-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Raw NAND fixes: - fsl_upm: Fix an off-by one test in fun_exec_op() - Rockchip: - Align hwecc vs. raw page helper layouts - Fix oobfree offset and description - Meson: Fix OOB available bytes for ECC - Omap ELM: Fix incorrect type in assignment SPI-NOR fix: - Avoid holes in struct spi_mem_op Hyperbus fix: - Add Tudor as reviewer in MAINTAINERS SPI-NAND fixes: - Winbond and Toshiba: Fix ecc_get_status" * tag 'mtd/fixes-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op() mtd: spi-nor: avoid holes in struct spi_mem_op MAINTAINERS: Add myself as reviewer for HYPERBUS mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts mtd: rawnand: rockchip: fix oobfree offset and description mtd: rawnand: meson: fix OOB available bytes for ECC mtd: rawnand: omap_elm: Fix incorrect type in assignment mtd: spinand: winbond: Fix ecc_get_status mtd: spinand: toshiba: Fix ecc_get_status	2023-08-04 12:01:26 -07:00
Linus Torvalds	4142fc6743	Merge tag 'drm-fixes-2023-08-04' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Small set of fixes this week, i915 and a few misc ones. I didn't see an amd pull so maybe next week it'll have a few more on that driver. ttm: - NULL ptr deref fix panel: - add missing MODULE_DEVICE_TABLE imx/ipuv3: - timing fix i915: - Fix bug in getting msg length in AUX CH registers handler - Gen12 AUX invalidation fixes - Fix premature release of request's reusable memory" * tag 'drm-fixes-2023-08-04' of git://anongit.freedesktop.org/drm/drm: drm/panel: samsung-s6d7aa0: Add MODULE_DEVICE_TABLE drm/i915: Fix premature release of request's reusable memory drm/i915/gt: Support aux invalidation on all engines drm/i915/gt: Poll aux invalidation register bit on invalidation drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the CS drm/i915/gt: Rename flags with bit_group_X according to the datasheet drm/i915/gt: Ensure memory quiesced before invalidation drm/i915: Add the gen12_needs_ccs_aux_inv helper drm/i915/gt: Cleanup aux invalidation registers drm/i915/gvt: Fix bug in getting msg length in AUX CH registers handler drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning drm/ttm: check null pointer before accessing when swapping	2023-08-04 11:50:22 -07:00
Linus Torvalds	4593f3c2c6	Merge tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two patches to improve RBD exclusive lock interaction with osd_request_timeout option and another fix to reduce the potential for erroneous blocklisting -- this time in CephFS. All going to stable" * tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client: libceph: fix potential hang in ceph_osdc_notify() rbd: prevent busy loop when requesting exclusive lock ceph: defer stopping mdsc delayed_work	2023-08-04 11:29:38 -07:00
Linus Torvalds	797964253d	file: reinstate f_pos locking optimization for regular files In commit `20ea1e7d13` ("file: always lock position for FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because pidfd_getfd() could get a reference to the file even when it didn't have an elevated file count due to threading of other sharing cases. But Mateusz Guzik reports that the extra locking is actually measurable, so let's re-introduce the optimization, and only force the locking for directory traversal. Directories need the lock for correctness reasons, while regular files only need it for "POSIX semantics". Since pidfd_getfd() is about debuggers etc special things that are _way_ outside of POSIX, we can relax the rules for that case. Reported-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-08-04 11:22:14 -07:00
Mark Brown	69af56ae56	arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE We have a function sve_sync_from_fpsimd_zeropad() which is used by the ptrace code to update the SVE state when the user writes to the the FPSIMD register set. Currently this checks that the task has SVE enabled but this will miss updates for tasks which have streaming SVE enabled if SVE has not been enabled for the thread, also do the conversion if the task has streaming SVE enabled. Fixes: `e12310a0d3` ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-3-49df214bfb3e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-08-04 16:18:32 +01:00
Mark Brown	507ea5dd92	arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems Currently we guard FPSIMD/SVE state conversions with a check for the system supporting SVE but SME only systems may need to sync streaming mode SVE state so add a check for SME support too. These functions are only used by the ptrace code. Fixes: `e12310a0d3` ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-2-49df214bfb3e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-08-04 16:18:31 +01:00
Mark Brown	045aecdfcb	arm64/ptrace: Don't enable SVE when setting streaming SVE Systems which implement SME without also implementing SVE are architecturally valid but were not initially supported by the kernel, unfortunately we missed one issue in the ptrace code. The SVE register setting code is shared between SVE and streaming mode SVE. When we set full SVE register state we currently enable TIF_SVE unconditionally, in the case where streaming SVE is being configured on a system that supports vanilla SVE this is not an issue since we always initialise enough state for both vector lengths but on a system which only support SME it will result in us attempting to restore the SVE vector length after having set streaming SVE registers. Fix this by making the enabling of SVE conditional on setting SVE vector state. If we set streaming SVE state and SVE was not already enabled this will result in a SVE access trap on next use of normal SVE, this will cause us to flush our register state but this is fine since the only way to trigger a SVE access trap would be to exit streaming mode which will cause the in register state to be flushed anyway. Fixes: `e12310a0d3` ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214bfb3e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-08-04 16:18:31 +01:00
Andrea Righi	b055448843	rust: fix bindgen build error with UBSAN_BOUNDS_STRICT With commit `2d47c6956a` ("ubsan: Tighten UBSAN_BOUNDS on GCC") if CONFIG_UBSAN is enabled and gcc supports -fsanitize=bounds-strict, we can trigger the following build error due to bindgen lacking support for this additional build option: BINDGEN rust/bindings/bindings_generated.rs error: unsupported argument 'bounds-strict' to option '-fsanitize=' Fix by adding -fsanitize=bounds-strict to the list of skipped gcc flags for bindgen. Fixes: `2d47c6956a` ("ubsan: Tighten UBSAN_BOUNDS on GCC") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Link: https://lore.kernel.org/r/20230711071914.133946-1-andrea.righi@canonical.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>	2023-08-04 17:10:50 +02:00
Alice Ryhl	1d24eb2d53	rust: delete `ForeignOwnable::borrow_mut` We discovered that the current design of `borrow_mut` is problematic. This patch removes it until a better solution can be found. Specifically, the current design gives you access to a `&mut T`, which lets you change where the `ForeignOwnable` points (e.g., with `core::mem::swap`). No upcoming user of this API intended to make that possible, making all of them unsound. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Fixes: `0fc4424d24` ("rust: types: introduce `ForeignOwnable`") Link: https://lore.kernel.org/r/20230706094615.3080784-1-aliceryhl@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>	2023-08-04 17:10:50 +02:00
Boqun Feng	b3d8aa84bb	rust: allocator: Prevent mis-aligned allocation Currently the rust allocator simply passes the size of the type Layout to krealloc(), and in theory the alignment requirement from the type Layout may be larger than the guarantee provided by SLAB, which means the allocated object is mis-aligned. Fix this by adjusting the allocation size to the nearest power of two, which SLAB always guarantees a size-aligned allocation. And because Rust guarantees that the original size must be a multiple of alignment and the alignment must be a power of two, then the alignment requirement is satisfied. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Co-developed-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> Signed-off-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org # v6.1+ Acked-by: Vlastimil Babka <vbabka@suse.cz> Fixes: `247b365dc8` ("rust: add `kernel` crate") Link: https://github.com/Rust-for-Linux/linux/issues/974 Link: https://lore.kernel.org/r/20230730012905.643822-2-boqun.feng@gmail.com [ Applied rewording of comment as discussed in the mailing list. ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>	2023-08-04 17:10:31 +02:00

1 2 3 4 5 ...

1201130 Commits