struct file_ra_state ra.mmap_miss could be accessed concurrently during
page faults as noticed by KCSAN,
BUG: KCSAN: data-race in filemap_fault / filemap_map_pages
write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
filemap_fault+0x920/0xfc0
do_sync_mmap_readahead at mm/filemap.c:2384
(inlined by) filemap_fault at mm/filemap.c:2486
__xfs_filemap_fault+0x112/0x3e0 [xfs]
xfs_filemap_fault+0x74/0x90 [xfs]
__do_fault+0x9e/0x220
do_fault+0x4a0/0x920
__handle_mm_fault+0xc69/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
filemap_map_pages+0xc2e/0xd80
filemap_map_pages at mm/filemap.c:2625
do_fault+0x3da/0x920
__handle_mm_fault+0xc69/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
Reported by Kernel Concurrency Sanitizer on:
CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G W L 5.5.0-next-20200210+ #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
ra.mmap_miss contributes to the readahead decisions, so a data race
could be undesirable. Both the read and the write happen only under the
non-exclusive mmap_sem, so two concurrent writers could even underflow
the counter. Fix the underflow by updating a local variable before
committing a single final store to ra.mmap_miss, given that a small
inaccuracy of the counter should be acceptable.
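
A minimal user-space sketch of the pattern described above, not the actual
patch. READ_ONCE()/WRITE_ONCE() are stand-in macros, and struct ra_state,
MISS_LIMIT, miss_inc() and miss_dec() are hypothetical names. Each path loads
the counter once into a local, adjusts the local, and commits one store, so
the bounds check and the store operate on the same value.

```c
/* Stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(); assumptions for this sketch. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define MISS_LIMIT 1000u /* hypothetical saturation limit */

struct ra_state {
	unsigned int mmap_miss; /* shared heuristic counter, updated without a lock */
};

/* Miss path: bump the counter, saturating at MISS_LIMIT. */
static unsigned int miss_inc(struct ra_state *ra)
{
	unsigned int miss = READ_ONCE(ra->mmap_miss);

	if (miss < MISS_LIMIT)
		WRITE_ONCE(ra->mmap_miss, ++miss);
	return miss;
}

/* Hit path: decrement the local copy, never below zero, then store once. */
static unsigned int miss_dec(struct ra_state *ra)
{
	unsigned int miss = READ_ONCE(ra->mmap_miss);

	if (miss > 0)
		miss--;
	WRITE_ONCE(ra->mmap_miss, miss);
	return miss;
}
```

In this sketch two racing callers of miss_dec() that both observe 1 both
store 0, so the worst case is a lost update rather than a wrapped counter.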
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Qian Cai <cai@lca.pw>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap_cache_info.* could be accessed concurrently as noticed by
KCSAN,
BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache
write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
lookup_swap_cache+0x12e/0x460
lookup_swap_cache at mm/swap_state.c:322
do_swap_page+0x112/0xeb0
__handle_mm_fault+0xc7a/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
lookup_swap_cache+0x117/0x460
lookup_swap_cache at mm/swap_state.c:322
shmem_swapin_page+0xc7/0x9e0
shmem_getpage_gfp+0x2ca/0x16c0
shmem_fault+0xef/0x3c0
__do_fault+0x9e/0x220
do_fault+0x4a0/0x920
__handle_mm_fault+0xc69/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
Reported by Kernel Concurrency Sanitizer on:
CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G W O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
__delete_from_swap_cache+0x681/0x8b0
__delete_from_swap_cache at mm/swap_state.c:178
read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
__delete_from_swap_cache+0x66e/0x8b0
__delete_from_swap_cache at mm/swap_state.c:178
Both the read and the write are lockless. Since swap_cache_info.* are
only used to print out counter information, it is harmless even if any
of them misses a few increments due to data races, so just mark them as
intentional data races using the data_race() macro.
While at it, fix a checkpatch.pl warning,
WARNING: Single statement macros should not use a do {} while (0) loop
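
A rough sketch of what such annotated counter macros can look like, assuming
a user-space stand-in for data_race() that simply evaluates its argument.
cache_stats, INC_CACHE_STAT(), ADD_CACHE_STAT() and record_lookup() are
hypothetical names chosen for illustration.

```c
/* Stand-in for the kernel's data_race(); in this sketch it just evaluates expr. */
#define data_race(expr) (expr)

/* Hypothetical statistics block, loosely modelled on swap_cache_info. */
static struct {
	unsigned long add_total;
	unsigned long find_total;
} cache_stats;

/* Single-statement macros: no do {} while (0) wrapper is needed. */
#define INC_CACHE_STAT(x)     data_race(cache_stats.x++)
#define ADD_CACHE_STAT(x, nr) data_race(cache_stats.x += (nr))

/* Example caller: updates are unlocked, so concurrent callers may lose counts. */
static void record_lookup(unsigned long nr_added)
{
	INC_CACHE_STAT(find_total);
	ADD_CACHE_STAT(add_total, nr_added);
}
```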
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct swap_info_struct si.flags could be accessed concurrently as noticed
by KCSAN,
BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage
write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
scan_swap_map_slots+0x6fe/0xb50
scan_swap_map_slots at mm/swapfile.c:887
get_swap_pages+0x39d/0x5c0
get_swap_page+0x377/0x524
add_to_swap+0xe4/0x1c0
shrink_page_list+0x1740/0x2820
shrink_inactive_list+0x316/0x8b0
shrink_lruvec+0x8dc/0x1380
shrink_node+0x317/0xd80
do_try_to_free_pages+0x1f7/0xa10
try_to_free_pages+0x26c/0x5e0
__alloc_pages_slowpath+0x458/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x170/0x700
__handle_mm_fault+0xc9f/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
swap_readpage+0x204/0x6a0
swap_readpage at mm/page_io.c:380
read_swap_cache_async+0xa2/0xb0
swapin_readahead+0x6a0/0x890
do_swap_page+0x465/0xeb0
__handle_mm_fault+0xc7a/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 5422 Comm: gmain Tainted: G W O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
Other reads,
read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
__swap_writepage+0x140/0xc20
__swap_writepage at mm/page_io.c:289
read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
swap_set_page_dirty+0x44/0x1f4
swap_set_page_dirty at mm/page_io.c:442
The write is under &si->lock, but the reads are lockless. Since the
reads only check for a specific bit in the flags, they are harmless even
if load tearing happens. Thus, just mark them as intentional data races
using the data_race() macro.
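
A small sketch of such a lockless single-bit check, again with a user-space
stand-in for data_race(). The flag names, struct swap_dev and
swap_dev_uses_fs() are hypothetical and do not mirror the real definitions.

```c
#include <stdbool.h>

#define data_race(expr) (expr) /* stand-in; only evaluates expr */

#define SWP_EXAMPLE_FS       (1UL << 0) /* hypothetical: the bit the reader tests */
#define SWP_EXAMPLE_SCANNING (1UL << 8) /* hypothetical: changed by locked writers */

struct swap_dev {
	unsigned long flags; /* writers hold a lock; the reader below does not */
};

/* Lockless reader: only one bit is examined, so other bits changing
 * underneath it cannot alter the result. */
static bool swap_dev_uses_fs(const struct swap_dev *si)
{
	return data_race(si->flags & SWP_EXAMPLE_FS) != 0;
}
```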
[cai@lca.pw: add a missing annotation]
Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a few information counters that are intentionally not protected
against increment races, so just annotate them using the data_race()
macro.
BUG: KCSAN: data-race in __frontswap_store / __frontswap_store
write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
__frontswap_store+0x2d0/0x344
inc_frontswap_failed_stores at mm/frontswap.c:70
(inlined by) __frontswap_store at mm/frontswap.c:280
swap_writepage+0x83/0xf0
pageout+0x33e/0xae0
shrink_page_list+0x1f57/0x2870
shrink_inactive_list+0x316/0x880
shrink_lruvec+0x8dc/0x1380
shrink_node+0x317/0xd80
do_try_to_free_pages+0x1f7/0xa10
try_to_free_pages+0x26c/0x5e0
__alloc_pages_slowpath+0x458/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x170/0x700
__handle_mm_fault+0xc9f/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
__frontswap_store+0x2b9/0x344
inc_frontswap_failed_stores at mm/frontswap.c:70
(inlined by) __frontswap_store at mm/frontswap.c:280
swap_writepage+0x83/0xf0
pageout+0x33e/0xae0
shrink_page_list+0x1f57/0x2870
shrink_inactive_list+0x316/0x880
shrink_lruvec+0x8dc/0x1380
shrink_node+0x317/0xd80
do_try_to_free_pages+0x1f7/0xa10
try_to_free_pages+0x26c/0x5e0
__alloc_pages_slowpath+0x458/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x170/0x700
__handle_mm_fault+0xc9f/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Even if KCSAN is disabled for kmemleak, update_checksum() could still call
crc32() (which is outside of kmemleak.c) to dereference object->pointer.
Thus, the value of object->pointer could be accessed concurrently as
noticed by KCSAN,
BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock
write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
do_raw_spin_lock+0x114/0x200
debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
(inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
_raw_spin_lock+0x40/0x50
__handle_mm_fault+0xa9e/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
crc32_le_base+0x67/0x350
crc32_le_base+0x67/0x350:
crc32_body at lib/crc32.c:106
(inlined by) crc32_le_generic at lib/crc32.c:179
(inlined by) crc32_le at lib/crc32.c:197
kmemleak_scan+0x528/0xd90
update_checksum at mm/kmemleak.c:1172
(inlined by) kmemleak_scan at mm/kmemleak.c:1497
kmemleak_scan_thread+0xcc/0xfa
kthread+0x1e0/0x200
ret_from_fork+0x27/0x50
If a torn value is read due to a data race, it will be corrected in the
next scan. Thus, let KCSAN ignore all reads in the region to silence
KCSAN in case the write side is non-atomic.
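
A user-space sketch of the idea, under stated assumptions: the
kcsan_disable_current()/kcsan_enable_current() functions below are stubs that
only track nesting depth, crc32_le_sketch() is a simple bitwise CRC written
for this example, and struct tracked_object is a hypothetical stand-in for
the scanned object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stubs so the sketch runs in user space; they only count nesting depth. */
static int kcsan_disable_depth;
static void kcsan_disable_current(void) { kcsan_disable_depth++; }
static void kcsan_enable_current(void)  { kcsan_disable_depth--; }

/* Bitwise little-endian CRC32 (polynomial 0xEDB88320), seeded by the caller. */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

struct tracked_object { /* hypothetical */
	const void *pointer;
	size_t size;
	uint32_t checksum;
};

/* Returns true if the checksum changed since the previous scan. */
static bool update_checksum(struct tracked_object *obj)
{
	uint32_t old = obj->checksum;

	kcsan_disable_current(); /* reads in this region may race with writers */
	obj->checksum = crc32_le_sketch(0, obj->pointer, obj->size);
	kcsan_enable_current();

	return obj->checksum != old;
}
```

Disabling the checks around the whole region, rather than annotating a single
access, also covers the reads made inside the checksum helper.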
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.
[akpm@linux-foundation.org: fix mm/migrate.c]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.
LZ4 relies heavily on memcpy() with a constant size being inlined. In x86
and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
doesn't get defined as __builtin_memcpy().
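
As a sketch of the approach, the wrapper can be a one-line macro over the
compiler builtin. copy8() and read32() are hypothetical helpers showing the
constant-size copies that benefit from inlining; they are not taken from the
LZ4 sources.

```c
#include <stdint.h>

/* Route copies through the compiler builtin so constant sizes can be inlined. */
#define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size)

/* Hypothetical helper: fixed-size 8-byte copy. */
static inline void copy8(void *dst, const void *src)
{
	LZ4_memcpy(dst, src, 8);
}

/* Hypothetical helper: unaligned 32-bit load expressed as a constant-size copy. */
static inline uint32_t read32(const void *p)
{
	uint32_t v;

	LZ4_memcpy(&v, p, sizeof(v));
	return v;
}
```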
An equivalent patch has been applied upstream so that the next import
won't lose this change [1].
I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures. The speed-up is about
10x as shown below.
Code   Arch    Kernel Size    Time     Speed
v5.8   x86_64   11504832 B  148 ms   79 MB/s
patch  x86_64   11503872 B   13 ms  885 MB/s
v5.8   i386      9621216 B   91 ms  106 MB/s
patch  i386      9620224 B   10 ms  962 MB/s
I also measured the time to decompress the initramfs on x86_64, i386, and
arm. All three show the same decompression speed before and after, as
expected.
[1] https://github.com/lz4/lz4/pull/890
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Landisk setup code maps the CF IDE area using ioremap_prot(), and
passes the resulting virtual addresses to the pata_platform driver,
disguising them as I/O port addresses. Hence the pata_platform driver
translates them again using ioport_map().
As CONFIG_GENERIC_IOMAP=n, and CONFIG_HAS_IOPORT_MAP=y, the
SuperH-specific mapping code in arch/sh/kernel/ioport.c translates
I/O port addresses to virtual addresses by adding sh_io_port_base, which
defaults to -1, thus breaking the assumption of an identity mapping.
Fix this by setting sh_io_port_base to zero.
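
The effect of the base offset can be illustrated with a small user-space
model. io_port_base and port_to_virt() are hypothetical names, and the model
assumes only that translation adds the base to the port number, as described
above.

```c
#include <stdint.h>

/* Hypothetical model of the translation: virtual address = port + base. */
static uintptr_t io_port_base = (uintptr_t)-1; /* the default described above */

static uintptr_t port_to_virt(uintptr_t port)
{
	return port + io_port_base;
}
```

With the default base, an already remapped address passed in as a port comes
back off by one; with a base of zero, the translation is the identity.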
Fixes: 37b7a97884 ("sh: machvec IO death.")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
Other architectures expect that syscall_set_return_value gets an already
negative value as error. That's also what kernel/seccomp.c provides.
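
A minimal sketch of that convention, assuming a simplified register file.
struct fake_regs and set_return_value() are hypothetical and only illustrate
that the error is stored as passed in, without being negated again.

```c
#include <errno.h>

struct fake_regs {
	long r0; /* hypothetical: register carrying the syscall return value */
};

/* error is expected to be zero or an already negative errno, e.g. -ENOSYS. */
static void set_return_value(struct fake_regs *regs, int error, long val)
{
	regs->r0 = error ? (long)error : val;
}
```

An implementation that negated error itself would turn the -ENOSYS handed
over by the caller into a positive value that looks like success.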
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Rich Felker <dalias@libc.org>
This avoids out-of-range jumps that get auto-replaced by the assembler
and prepares for the changes needed to implement SECCOMP_FILTER cleanly.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Rich Felker <dalias@libc.org>
This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.
Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features that drivers have started relying
upon. It also makes sure we have one tested code base for all
architectures that require uncached pte bits for coherent DMA
allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
The code handling non-coherent DMA depends on being able to remap code
as non-cached. But that can't be done without an MMU, so using this
option on NOMMU builds is broken.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
Move the internal implementation details of ioremap out of line, no need
to expose any of this to drivers for a slow path API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
ioremap_fixed is an internal implementation detail and should not be
exposed to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
There is no point in having __KERNEL__ ifdefs in headers not exported to
userspace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
Ensure there is an order for the selects. Also remove a duplicate
one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
The sh build is full of warnings when building with gcc 9.2.1. While
fixing those would be great, at least avoid failing the build.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
Rationale:
Reduces the MITM attack surface for kernel devs opening the links,
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.
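
The per-line part of the steps above can be sketched roughly as follows. This
is a simplification and not the script that was used: substring checks
approximate the `\b`-anchored patterns, only the first link on a line is
handled, the .svg file filter is left to the caller, and the content
comparison is a hypothetical callback (same_content_fn, with always_same() as
a stub).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical probe: true if the HTTP and HTTPS versions both return
 * 200 OK with the same content. The network check is out of scope here. */
typedef bool (*same_content_fn)(const char *host_and_path);

static bool always_same(const char *host_and_path)
{
	(void)host_and_path;
	return true; /* stub used for illustration */
}

/* Copy line to out, upgrading its first http:// link when every filter
 * passes. Returns true if the link was rewritten. */
static bool upgrade_line(const char *line, char *out, size_t outlen,
			 same_content_fn same)
{
	const char *link = strstr(line, "http://");
	const char *rest;

	snprintf(out, outlen, "%s", line);
	if (!link || strstr(line, "xmlns"))
		return false;
	if (strstr(link, "gnu.org/license") || strstr(link, "mozilla.org/MPL"))
		return false;

	rest = link + strlen("http://");
	if (!same(rest))
		return false;

	snprintf(out, outlen, "%.*shttps://%s", (int)(link - line), line, rest);
	return true;
}
```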
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Rich Felker <dalias@libc.org>
SOC_CAMERA support for the sh architecture was removed a long time ago,
so drop all configs with the CONFIG_SOC_CAMERA prefix from the
defconfigs.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Rich Felker <dalias@libc.org>
The SH implementation never called stacktrace_ops.stack().
Presumably this was copied from the x86 implementation.
Hence remove the method, and all implementations (most of them are
dummies).
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
- Convert from printk() to pr_*(),
- Add missing continuations,
- Join broken messages.
Note that printk(KERN_DEBUG ...) is retained, to preserve behavior
(pr_debug() is a dummy if DEBUG is not defined).
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rich Felker <dalias@libc.org>
Rejoin the broken lines by using pr_cont().
Convert the remaining printk() calls to pr_*() while at it.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rich Felker <dalias@libc.org>
Rejoin the broken lines by dropping the log level parameters and using
pr_cont().
Use "%px" to print sensible addresses in call traces.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rich Felker <dalias@libc.org>
Rejoin the broken lines by using pr_cont().
Convert the remaining printk() calls to pr_*() while at it.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rich Felker <dalias@libc.org>
This reverts commit 8b92f34877.
"data" became the log level in commit 539e786cc3 ("sh: add loglvl
to show_trace()"), so we do need to keep the printk() before the
continuation in print_trace_address().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
This reverts commit 2deebe4d56.
printk_address() is always used as a continuation of the previous
logging, hence it should not include a log level.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
Somewhere along the patch handling path, both the old "printk(KERN_ALERT
....)" and the new "pr_alert(...)" were retained, leading to the
duplicate printing of "PC:".
Drop the old one.
Fixes: eaabf98b09 ("sh: fault: modernize printing of kernel messages")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
As of commit 37744feebc ("sh: remove sh5 support"), support for
the SH5-based Cayman platform can no longer be selected.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
Since the removal of core support for SH5, Cayman support can no longer
be selected.
Fixes: 37744feebc ("sh: remove sh5 support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>