In newer versions of virtio-scsi we just reset the timer when an a
command times out, so TMFs are never sent for the cmd time out case.
However, in older kernels and for the TMF inject cases, we can still get
resets and we end up just failing immediately so the guest might see the
device get offlined and IO errors.
For the older kernel cases, we want the same end result as the
modern virtio-scsi driver where we let the lower levels fire their error
handling and handle the problem. And at the upper levels we want to
wait. This patch ties the LUN reset handling into the LIO TMF code which
will just wait for outstanding commands to complete like we are doing in
the modern virtio-scsi case.
Note: I did not handle the ABORT case to keep this simple. For ABORTs
LIO just waits on the cmd like how it does for the RESET case. If
an ABORT fails, the guest OS ends up escalating to LUN RESET, so in
the end we get the same behavior where we wait on the outstanding
cmds.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-6-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
When the last put happens a little later then we could race where
vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
This patch has us delay completing the cmd until the last lio core ref
is dropped. We then know that once we signal to the guest that the cmd
is completed that if it queues a new command it will find a free cmd.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/1604986403-4931-4-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
We currently are limited to 256 cmds per session. This leads to problems
where if the user has increased virtqueue_size to more than 2 or
cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest
will get IO errors.
This patch moves the cmd allocation to per vq so we can easily match
whatever the user has specified for num_queues and
virtqueue_size/cmd_per_lun. It also makes it easier to control how much
memory we preallocate. For cases, where perf is not as important and
we can use the current defaults (1 vq and 128 cmds per vq) memory use
from preallocate cmds is cut in half. For cases, where we are willing
to use more memory for higher perf, cmd mem use will now increase as
the num queues and queue depth increases.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-3-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
After merging the drm-misc tree, linux-next build (arm
multi_v7_defconfig) failed like this:
In file included from drivers/gpu/drm/nouveau/nouveau_ttm.c:26:
include/linux/swiotlb.h: In function 'swiotlb_max_mapping_size':
include/linux/swiotlb.h:99:9: error: 'SIZE_MAX' undeclared (first use in this function)
99 | return SIZE_MAX;
| ^~~~~~~~
include/linux/swiotlb.h:7:1: note: 'SIZE_MAX' is defined in header '<stdint.h>'; did you forget to '#include <stdint.h>'?
6 | #include <linux/init.h>
+++ |+#include <stdint.h>
7 | #include <linux/types.h>
include/linux/swiotlb.h:99:9: note: each undeclared identifier is reported only once for each function it appears in
99 | return SIZE_MAX;
| ^~~~~~~~
Caused by commit
abe420bfae ("swiotlb: Introduce swiotlb_max_mapping_size()")
but only exposed by commit "drm/nouveu: fix swiotlb include"
Fix it by including linux/limits.h as appropriate.
Fixes: abe420bfae ("swiotlb: Introduce swiotlb_max_mapping_size()")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20201102124327.2f82b2a7@canb.auug.org.au
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vdpa_sim generates a ramdom MAC address but it is never used by upper
layers because the VIRTIO_NET_F_MAC bit is not set in the features list.
Because of that, virtio-net always regenerates a random MAC address each
time it is loaded whereas the address should only change on vdpa_sim
load/unload.
Fix that by adding VIRTIO_NET_F_MAC in the features list of vdpa_sim.
Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Cc: jasowang@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20201029122050.776445-2-lvivier@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
LKP considered variable 'ret' in vhost_vdpa_setup_vq_irq() as
a unused variable, so suggest we remove it. Actually it stores
return value of irq_bypass_register_producer(), but we did not
check it, we should handle the failure case.
This commit will print a message if irq bypass register producer
fail, in this case, vqs still remain functional.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This reverts commit 7ed9e3d97c.
The patch creates a DoS risk since it can result in a high order memory
allocation.
Fixes: 7ed9e3d97c ("vhost-vdpa: fix page pinning leakage in error path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch introduces a new ioctl for vhost-vdpa device that can
report the iova range by the device.
For device that implements get_iova_range() method, we fetch it from
the vDPA device. If device doesn't implement get_iova_range() but
depends on platform IOMMU, we will query via DOMAIN_ATTR_GEOMETRY,
otherwise [0, ULLONG_MAX] is assumed.
For safety, this patch also rules out the map request which is not in
the valid range.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201023090043.14430-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If riov and wiov are both defined and they point to different
objects, only riov is initialized. If the wiov is not initialized
by the caller, the function fails returning -EINVAL and printing
"Readable desc 0x... after writable" error message.
This issue happens when descriptors have both readable and writable
buffers (eg. virtio-blk devices has virtio_blk_outhdr in the readable
buffer and status as last byte of writable buffer) and we call
__vringh_iov() to get both type of buffers in two different iovecs.
Let's replace the 'else if' clause with 'if' to initialize both
riov and wiov if they are not NULL.
As checkpatch pointed out, we also avoid crashing the kernel
when riov and wiov are both NULL, replacing BUG() with WARN_ON()
and returning -EINVAL.
Fixes: f87d0fbb57 ("vringh: host-side implementation of virtio rings.")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201008204256.162292-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
set_map() is used by mlx5 vdpa to create a memory region based on the
address map passed by the iotlb argument. If we get successive calls, we
will destroy the current memory region and build another one based on
the new address mapping. We also need to setup the hardware resources
since they depend on the memory region.
If these calls happen before DRIVER_OK, It means that driver VQs may
also not been setup and we may not create them yet. In this case we want
to avoid setting up the other resources and defer this till we get
DRIVER OK.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20200908123346.GA169007@mtl-vdi-166.wap.labs.mlnx
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
An architecture may restrict host access to guest memory,
e.g. IBM s390 Secure Execution or AMD SEV.
Provide a new Kconfig entry the architecture can select,
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides
the arch_has_restricted_virtio_memory_access callback to advertise
to VIRTIO common code when the architecture restricts memory access
from the host.
The common code can then fail the probe for any device where
VIRTIO_F_ACCESS_PLATFORM is required, but not set.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/1599728030-17085-2-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Merge misc fixes from Andrew Morton:
"Five fixes.
Subsystems affected by this patch series: MAINTAINERS, mm/pagemap,
mm/swap, and mm/hugetlb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
mm: validate inode in mapping_set_error()
mm: mmap: Fix general protection fault in unlink_file_vma()
MAINTAINERS: Antoine Tenart's email address
MAINTAINERS: change hardening mailing list
Pull vfs fix from Al Viro:
"Fixes an obvious bug (memory leak introduced in 5.8)"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
pipe: Fix memory leaks in create_pipe_files()
Pull x86 fixes from Ingo Molnar:
"Two fixes:
- Fix a (hopefully final) IRQ state tracking bug vs MCE handling
- Fix a documentation link"
* tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Fix incorrect references to zero-page.txt
x86/mce: Use idtentry_nmi_enter/exit()
Pull perf fix from Ingo Molnar:
"Fix an error handling bug that can cause a lockup if a CPU is offline
(doh ...)"
* tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task_function_call() error handling
The syzbot reported the below general protection fault:
general protection fault, probably for non-canonical address
0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
Call Trace:
free_pgtables+0x1b3/0x2f0 mm/memory.c:415
exit_mmap+0x2c0/0x530 mm/mmap.c:3184
__mmput+0x122/0x470 kernel/fork.c:1076
mmput+0x53/0x60 kernel/fork.c:1097
exit_mm kernel/exit.c:483 [inline]
do_exit+0xa8b/0x29f0 kernel/exit.c:793
do_group_exit+0x125/0x310 kernel/exit.c:903
get_signal+0x428/0x1f00 kernel/signal.c:2757
arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It's because the ->mmap() callback can change vma->vm_file and fput the
original file. But the commit d70cec8983 ("mm: mmap: merge vma after
call_mmap() if possible") failed to catch this case and always fput()
the original file, hence add an extra fput().
[ Thanks Hillf for pointing this extra fput() out. ]
Fixes: d70cec8983 ("mm: mmap: merge vma after call_mmap() if possible")
Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c fixes from Wolfram Sang:
"Some more driver bugfixes for I2C. Including a revert - the updated
series for it will come during the next merge window"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: owl: Clear NACK and BUS error bits
Revert "i2c: imx: Fix reset of I2SR_IAL flag"
i2c: meson: fixup rate calculation with filter delay
i2c: meson: keep peripheral clock enabled
i2c: meson: fix clock setting overwrite
i2c: imx: Fix reset of I2SR_IAL flag
On setxattr() syscall path due to an apprent typo the size of a dynamically
allocated memory chunk for storing struct smb2_file_full_ea_info object is
computed incorrectly, to be more precise the first addend is the size of
a pointer instead of the wanted object size. Coincidentally it makes no
difference on 64-bit platforms, however on 32-bit targets the following
memcpy() writes 4 bytes of data outside of the dynamically allocated memory.
=============================================================================
BUG kmalloc-16 (Not tainted): Redzone overwritten
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi
Redzone 79e69a6f: 73 68 32 0a sh2.
Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
dump_stack+0x54/0x6e
print_trailer+0x12c/0x134
check_bytes_and_report.cold+0x3e/0x69
check_object+0x18c/0x250
free_debug_processing+0xfe/0x230
__slab_free+0x1c0/0x300
kfree+0x1d3/0x220
smb2_set_ea+0x27d/0x540
cifs_xattr_set+0x57f/0x620
__vfs_setxattr+0x4e/0x60
__vfs_setxattr_noperm+0x4e/0x100
__vfs_setxattr_locked+0xae/0xd0
vfs_setxattr+0x4e/0xe0
setxattr+0x12c/0x1a0
path_setxattr+0xa4/0xc0
__ia32_sys_lsetxattr+0x1d/0x20
__do_fast_syscall_32+0x40/0x70
do_fast_syscall_32+0x29/0x60
do_SYSENTER_32+0x15/0x20
entry_SYSENTER_32+0x9f/0xf2
Fixes: 5517554e43 ("cifs: Add support for writing attributes on SMB2+")
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.
collapse_file(old start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=old offset
new_page->mapping = mapping
xas_store(&xas, new_page)
filemap_fault
page = find_get_page(mapping, offset)
// if offset falls inside hpage then
// compound_head(page) == hpage
lock_page_maybe_drop_mmap()
__lock_page(page)
// collapse fails
xas_store(&xas, old page)
new_page->mapping = NULL
unlock_page(new_page)
collapse_file(new start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=new offset
new_page->mapping = mapping // mapping becomes valid again
// since compound_head(page) == hpage
// page_to_pgoff(page) got changed
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above. But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bd ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the NACK and BUS error bits are set by the hardware, the driver is
responsible for clearing them by writing "1" into the corresponding
status registers.
Hence perform the necessary operations in owl_i2c_interrupt().
Fixes: d211e62af4 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver")
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
This reverts commit fa4d305568. An updated
version was sent. So, revert this version and give the new version more
time for testing.
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Pull spi fix from Mark Brown:
"One last minute fix for v5.9 which has been causing crashes in test
systems with the fsl-dspi driver when they hit deferred probe (and
which I probably let cook in next a bit longer than is ideal).
And an update to MAINTAINERS reflecting Serge's extensive and
detailed recent work on the DesignWare driver"
* tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: Add maintainer of DW APB SSI driver
spi: fsl-dspi: fix NULL pointer dereference
Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes this week:
- A fix to actually reserve the device tree's memory. Without this
the device tree can be overwritten on systems that don't otherwise
reserve it. This issue should only manifest on !MMU systems.
- A workaround for a BUG() that triggers when the memory that
originally contained initdata is freed and later repurposed. This
triggers a BUG() on builds that had HARDENED_USERCOPY enabled"
* tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup bootup failure with HARDENED_USERCOPY
RISC-V: Make sure memblock reserves the memory containing DT
Pull power supply fix from Sebastian Reichel:
"Just a single change to revert enablement of packet error checking for
battery data on Chromebooks, since some of their embedded controllers
do not handle it correctly"
* tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sbs-battery: chromebook workaround for PEC
Pull GPIO fixes from Linus Walleij:
"Some late fixes: one IRQ issue and one compilation issue for UML.
- Fix a compilation issue with User Mode Linux
- Handle spurious interrupts properly in the PCA953x driver"
* tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: Survive spurious interrupts
gpiolib: Disable compat ->read() code in UML case
Pull MMC fix from Ulf Hansson:
"Assign a proper discard granularity rather than incorrectly set it to
zero"
* tag 'mmc-v5.9-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: don't set limits.discard_granularity as 0