Mikulas asked in "Do we still need commit a0ee5ec520 ('tmpfs: allocate
on read when stacked')?" in [1]
Lukas noticed this unusual behavior of loop device backed by tmpfs in [2].
Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading
holes; but if it looks like it might be a read for "a stacking
filesystem", it allocates actual pages to the page cache, and even marks
them as dirty. And reads from the loop device do satisfy the test that
is used.
This oddity was added for an old version of unionfs, to help to limit
its usage to the limited size of the tmpfs mount involved; but about the
same time as the tmpfs mod went in (2.6.25), unionfs was reworked to
proceed differently; and the mod kept just in case others needed it.
Do we still need it? I cannot answer with more certainty than "Probably
not". It's nasty enough that we really should try to delete it; but if
a regression is reported somewhere, then we might have to revert later.
It's not quite as simple as just removing the test (as Mikulas did):
xfstests generic/013 hung because splice from tmpfs failed on page not
up-to-date and page mapping unset. That can be fixed just by marking
the ZERO_PAGE as Uptodate, which of course it is: do so in
pagecache_init() - it might be useful to others than tmpfs.
My intention, though, was to stop using the ZERO_PAGE here altogether:
surely iov_iter_zero() is better for this case? Sadly not: it relies on
clear_user(), and the x86 clear_user() is slower than its copy_user() [3].
But while we are still using the ZERO_PAGE, let's stop dirtying its
struct page cacheline with unnecessary get_page() and put_page().
Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/ [1]
Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2]
Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/ [3]
Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I added page_mapped() resilience in __delete_from_page_cache() for
the mapping_exiting() case, I missed that mapping_set_exiting() is done
in truncate_inode_pages_final(), which is not actually called for shmem.
(Today, it is folio_mapped() resilience in filemap_unaccount_folio().)
So the fixup to avoid a memory leak in this case never worked on shmem:
add a mapping_set_exiting() in shmem_evict_inode() at last. But this is
hardly a candidate for stable, since it's only useful if "Bad page".
Link: https://lkml.kernel.org/r/beefffda-6326-e36d-2d41-ed15b51af872@google.com
Fixes: 06b241f32c ("mm: __delete_from_page_cache show Bad page if mapped")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/gup: some cleanups", v5.
This patch (of 5):
Alex reported invalid page pointer returned with pin_user_pages_remote()
from vfio after upstream commit 4b6c33b322 ("vfio/type1: Prepare for
batched pinning with struct vfio_batch").
It turns out that it's not the fault of the vfio commit; however after
vfio switches to a full page buffer to store the page pointers it starts
to expose the problem easier.
The problem is for VM_PFNMAP vmas we should normally fail with an
-EFAULT then vfio will carry on to handle the MMIO regions. However
when the bug triggered, follow_page_mask() returned -EEXIST for such a
page, which will jump over the current page, leaving that entry in
**pages untouched. However the caller is not aware of it, hence the
caller will reference the page as usual even if the pointer data can be
anything.
We had that -EEXIST logic since commit 1027e4436b ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested") which seems very
reasonable. It could be that when we reworked GUP with FOLL_PIN we
could have overlooked that special path in commit 3faa52c03f ("mm/gup:
track FOLL_PIN pages"), even if that commit rightfully touched up
follow_devmap_pud() on checking FOLL_PIN when it needs to return an
-EEXIST.
Attaching the Fixes to the FOLL_PIN rework commit, as it happened later
than 1027e4436b.
[jhubbard@nvidia.com: added some tags, removed a reference to an out of tree module.]
Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com
Fixes: 3faa52c03f ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Debugged-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ever since these macros were introduced in commit b56c0d8937
("kthread: implement kthread_worker"), there has been precisely one user
(commit 4d11542070, "NVMe: Async IO queue deletion"), and that user
went away in 2016 with db3cbfff5b ("NVMe: IO queue deletion
re-write").
Apart from being unused, these macros are also awkward to use (which may
contribute to them not being used): Having a way to statically (or
on-stack) allocating the storage for the struct kthread_worker itself
doesn't help much, since obviously one needs to have some code for
actually _spawning_ the worker thread, which must have error checking.
And these days we have the kthread_create_worker() interface which both
allocates the struct kthread_worker and spawns the kthread.
Link: https://lkml.kernel.org/r/20220314145343.494694-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kvm fix from Paolo Bonzini:
"Fix for the SLS mitigation, which makes a 'SETcc/RET' pair grow
to 'SETcc/RET/INT3'.
This doesn't fit in 4 bytes any more, so the alignment has to
change to 8 for this case"
* tag 'for-linus-5.17' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/emulate: Fix SETcc emulation function offsets with SLS
Pull input fixes from Dmitry Torokhov:
"Two driver fixes:
- a fix for zinitix touchscreen to properly report contacts
- a fix for aiptek tablet driver to be more resilient to devices with
incorrect descriptors"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: aiptek - properly check endpoint type
Input: zinitix - do not report shadow fingers
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.
The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.
However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:
15: 0f 90 c0 seto %al
18: c3 ret
19: cc int3
1a: 0f 1f 00 nopl (%rax)
1d: 0f 91 c0 setno %al
20: c3 ret
21: cc int3
22: 0f 1f 00 nopl (%rax)
25: 0f 92 c0 setb %al
28: c3 ret
29: cc int3
and this explodes like this:
int3: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012
RIP: 0010:setc+0x5/0x8 [kvm]
Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
Call Trace:
<TASK>
? x86_emulate_insn [kvm]
? x86_emulate_instruction [kvm]
? vmx_handle_exit [kvm_intel]
? kvm_arch_vcpu_ioctl_run [kvm]
? kvm_vcpu_ioctl [kvm]
? __x64_sys_ioctl
? do_syscall_64
? entry_SYSCALL_64_after_hwframe
</TASK>
Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.
Fixes: e463a09af2 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull ARM SoC fix from Arnd Bergmann:
"Here is one last regression fix for 5.17, reverting a patch that went
into 5.16 as a cleanup that ended up breaking external interrupts on
Layerscape chips.
The revert makes it work again, but also reintroduces a build time
warning about the nonstandard DT binding that will have to be dealt
with in the future"
* tag 'soc-fixes-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address cells"
Pull SCSI fixes from James Bottomley:
"Two small(ish) fixes, both in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Finish scsi_cmnd before dropping the spinlock
scsi: mpt3sas: Page fault in reply q processing
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Avoid iterating empty evlist, fixing a segfault with 'perf stat --null'
- Ignore case in topdown.slots check, fixing issue with Intel Icelake
JSON metrics.
- Fix symbol size calculation condition for fixing up corner case
symbol end address obtained from Kallsyms.
* tag 'perf-tools-fixes-for-v5.17-2022-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf parse-events: Ignore case in topdown.slots check
perf evlist: Avoid iteration for empty evlist.
perf symbols: Fix symbol size calculation condition
Pull char/misc driver fix from Greg KH:
"Here is a single driver fix for 5.17-final that has been submitted
many times but I somehow missed it in my patch queue:
- fix for counter sysfs code for reported problem
This has been in linux-next all week with no reported issues"
* tag 'char-misc-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
counter: Stop using dev_get_drvdata() to get the counter device
Pull USB fixes from Greg KH:
"Here are some small remaining USB fixes for 5.17-final.
They include:
- two USB gadget driver fixes for reported problems
- usbtmc driver fix for syzbot found issues
- musb patch partial revert to resolve a reported regression.
All of these have been in linux-next this week with no reported
problems"
* tag 'usb-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
usb: usbtmc: Fix bug in pipe direction for control transfers
partially Revert "usb: musb: Set the DT node on the child device"
usb: gadget: rndis: prevent integer overflow in rndis_set_response()
Before this patch, the symbol end address fixup to be called, needed two
conditions being met:
if (prev->end == prev->start && prev->end != curr->start)
Where
"prev->end == prev->start" means that prev is zero-long
(and thus needs a fixup)
and
"prev->end != curr->start" means that fixup hasn't been applied yet
However, this logic is incorrect in the following situation:
*curr = {rb_node = {__rb_parent_color = 278218928,
rb_right = 0x0, rb_left = 0x0},
start = 0xc000000000062354,
end = 0xc000000000062354, namelen = 40, type = 2 '\002',
binding = 0 '\000', idle = 0 '\000', ignore = 0 '\000',
inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
name = 0x1159739e "kprobe_optinsn_page\t[__builtin__kprobes]"}
*prev = {rb_node = {__rb_parent_color = 278219041,
rb_right = 0x109548b0, rb_left = 0x109547c0},
start = 0xc000000000062354,
end = 0xc000000000062354, namelen = 12, type = 2 '\002',
binding = 1 '\001', idle = 0 '\000', ignore = 0 '\000',
inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
name = 0x1095486e "optinsn_slot"}
In this case, prev->start == prev->end == curr->start == curr->end,
thus the condition above thinks that "we need a fixup due to zero
length of prev symbol, but it has been probably done, since the
prev->end == curr->start", which is wrong.
After the patch, the execution path proceeds to arch__symbols__fixup_end
function which fixes up the size of prev symbol by adding page_size to
its end offset.
Fixes: 3b01a413c1 ("perf symbols: Improve kallsyms symbol end addr calculation")
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20220317135536.805-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull arm64 fixes from Catalin Marinas:
"Fix two compiler warnings introduced by recent commits: pointer
arithmetic and double initialisation of struct field"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: errata: avoid duplicate field initializer
arm64: fix clang warning about TRAMP_VALIAS
Pull cifs fix from Steve French:
"Small fix for regression in multiuser mounts.
The additional improvements suggested by Ronnie to make the server and
session status handling code easier to read can wait for the 5.18
merge window."
* tag '5.17-rc8-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix incorrect session setup check for multiuser mounts
Pull block fixes from Jens Axboe:
- Revert of a nvme target feature (Hannes)
- Fix a memory leak with rq-qos (Ming)
* tag 'block-5.17-2022-03-18' of git://git.kernel.dk/linux-block:
nvmet: revert "nvmet: make discovery NQN configurable"
block: release rq qos structures for queue without disk
Pull drm fixes from Dave Airlie:
"A few minor changes to finish things off, one mgag200 regression, imx
fix and couple of panel changes.
imx:
- Don't test bus flags in atomic check
mgag200:
- Fix PLL setup on some models
panel:
- Fix bpp settings on Innolux G070Y2-L01
- Fix DRM_PANEL_EDP Kconfig dependencies"
* tag 'drm-fixes-2022-03-18' of git://anongit.freedesktop.org/drm/drm:
drm: Don't make DRM_PANEL_BRIDGE dependent on DRM_KMS_HELPERS
drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings
drm/imx: parallel-display: Remove bus flags check in imx_pd_bridge_atomic_check()
drm/mgag200: Fix PLL setup for g200wb and g200ew
The '.type' field is initialized both in place and in the macro
as reported by this W=1 warning:
arch/arm64/include/asm/cpufeature.h:281:9: error: initialized field overwritten [-Werror=override-init]
281 | (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU)
| ^
arch/arm64/kernel/cpu_errata.c:136:17: note: in expansion of macro 'ARM64_CPUCAP_LOCAL_CPU_ERRATUM'
136 | .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kernel/cpu_errata.c:145:9: note: in expansion of macro 'ERRATA_MIDR_RANGE'
145 | ERRATA_MIDR_RANGE(m, var, r_min, var, r_max)
| ^~~~~~~~~~~~~~~~~
arch/arm64/kernel/cpu_errata.c:613:17: note: in expansion of macro 'ERRATA_MIDR_REV_RANGE'
613 | ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2),
| ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/include/asm/cpufeature.h:281:9: note: (near initialization for 'arm64_errata[18].type')
281 | (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU)
| ^
Remove the extranous initializer.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 1dd498e5e2 ("KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata")
Link: https://lore.kernel.org/r/20220316183800.1546731-1-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The newly introduced TRAMP_VALIAS definition causes a build warning
with clang-14:
arch/arm64/include/asm/vectors.h:66:31: error: arithmetic on a null pointer treated as a cast from integer to pointer is a GNU extension [-Werror,-Wnull-pointer-arithmetic]
return (char *)TRAMP_VALIAS + SZ_2K * slot;
Change the addition to something clang does not complain about.
Fixes: bd09128d16 ("arm64: Add percpu vectors for EL1")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20220316183833.1563139-1-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>