Pull perf tooling updates from Ingo Molnar:
"Tooling changes only: fixes and a few stray improvements.
Most of the diffstat is dominated by a PowerPC related fix of system
call trace output beautification that allows us to (again) use the
UAPI header version and sync up with the kernel's version of PowerPC
system call names in the arch/powerpc/kernel/syscalls/syscall.tbl
header"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
tools headers powerpc: Remove unistd.h
perf powerpc: Rework syscall table generation
perf symbols: Add 'arch_cpu_idle' to the list of kernel idle symbols
tools include uapi: Sync linux/if_link.h copy with the kernel sources
tools include uapi: Sync linux/vhost.h with the kernel sources
tools include uapi: Sync linux/fs.h copy with the kernel sources
perf beauty: Switch from using uapi/linux/fs.h to uapi/linux/mount.h
tools include uapi: Grab a copy of linux/mount.h
perf top: Lift restriction on using callchains without "sym" in --sort
tools lib traceevent: Remove tep_data_event_from_type() API
tools lib traceevent: Rename tep_is_file_bigendian() to tep_file_bigendian()
tools lib traceevent: Changed return logic of tep_register_event_handler() API
tools lib traceevent: Changed return logic of trace_seq_printf() and trace_seq_vprintf() APIs
tools lib traceevent: Rename struct cmdline to struct tep_cmdline
tools lib traceevent: Initialize host_bigendian at tep_handle allocation
tools lib traceevent: Introduce new libtracevent API: tep_override_comm()
perf tests: Add a test for the ARM 32-bit [vectors] page
perf tools: Make find_vdso_map() more modular
perf trace: Fix alignment for [continued] lines
perf trace: Fix ')' placement in "interrupted" syscall lines
...
Pull x86 fixes from Ingo Molnar:
"A 32-bit build fix, CONFIG_RETPOLINE fixes and rename CONFIG_RESCTRL
to CONFIG_X86_RESCTRL"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
x86/cache: Rename config option to CONFIG_X86_RESCTRL
samples/seccomp: Fix 32-bit build
Pull ACPI fixes from Rafael Wysocki:
"Fix a build failure introduced recently, fix the xpower PMIC ACPI
driver, clean up the handling of duplicate entries in _PRx power
resource lists and fix addresses in NUMA-related messages on 32-bit
with PAE.
Specifics:
- Fix build failures with both CONFIG_NLS and CONFIG_PCI unset that
can occur since ACPI can be built without PCI now (Sinan Kaya).
- Clean up the handling of duplicate entries in power resource lists
returned by _PRx evaluation to avoid triggering WARN_ON() on
attempts to add duplicate symlinks in sysfs (Hans de Goede).
- Fix issues with the TS current-source switching on systems using
the xpower PMIC by avoiding to update unrelated bits in the TS
pin-ctrl register and avoiding to unconditionally enable TS
current-source on systems where it is not used (Hans de Goede).
- Fix addresses in NUMA-related messages on 32-bit with PAE which can
be truncated due to integer type conversions (Chao Fan)"
* tag 'acpi-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PMIC: xpower: Fix TS-pin current-source handling
ACPI: NUMA: Use correct type for printing addresses on i386-PAE
ACPI: power: Skip duplicate power resource references in _PRx
ACPI: Fix build failure when CONFIG_NLS is set to 'n'
Pull power management updates from Rafael Wysocki:
"These fix fallout after starting to use hrtimers in the runtime PM
framework, fix a few cpufreq issues, fix a recently broken reference
to cpuidle documentation, update MAINTAINERS entries for cpufreq and
cpuidle and make the recently added system suspend and resume support
in devfreq actually work.
Specifics:
- Prevent integer overflows from occurring on 32-bit when converting
milliseconds to nanoseconds in the runtime PM framework and update
comments that still refer to jiffies in it (Vincent Guittot,
Ladislav Michl).
- Fix the SCMI cpufreq driver to always use the same frequency units
for arch_set_freq_scale() and make the scale-invariant load
tracking acutally work with this driver (Quentin Perret).
- Fix freeing of dynamic OPPs in the SCPI and SCMI cpufreq drivers
broken during the 4.20 defelopment cycle (Viresh Kumar).
- Prevent the cpufreq core from attempting to return the current
frequency of offline CPUs (Sudeep Holla).
- Add devfreq suspend and resume hooks (missed previously) to the PM
core to make the recently added system suspend and resume support
in devfreq actually work (Lukasz Luba).
- Update MAINTAINERS entries for cpufreq and cpuidle, mostly to add
references to new/current documentation to them (Rafael Wysocki).
- Fix a recently broken reference to cpuidle documentation (Otto
Sabart)"
* tag 'pm-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM-runtime: Fix autosuspend_delay on 32bits arch
PM-runtime: Fix 'jiffies' in comments after switch to hrtimers
cpufreq: scmi: Fix frequency invariance in slow path
doc: trace: fix reference to cpuidle documentation file
cpufreq: check if policy is inactive early in __cpufreq_get()
cpufreq: scpi/scmi: Fix freeing of dynamic OPPs
cpuidle / Documentation: Update cpuidle MAINTAINERS entry
cpufreq / Documentation: Update cpufreq MAINTAINERS entry
PM: sleep: call devfreq suspend/resume
Pull drm fixes from Dave Airlie:
"Not a huge amount for rc2, assume the usual quiet period, and rc3 will
be most of it.
amdgpu:
- Powerplay fixes
- Virtual display pinning fixes
- Golden register updates for Vega
- Pitch and gem size validation fixes
- SR-IOV init error fix
- Pagetables in system RAM disable for some Raven system
- DP-MST resume fixes
tc358767 bridge:
- fix to work with displayport connector"
* tag 'drm-fixes-2019-01-11' of git://anongit.freedesktop.org/drm/drm: (26 commits)
drm/amdgpu: disable system memory page tables for now
drm/amdgpu: set WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang
drm/amdgpu: fix CPDMA hang in PRT mode for VEGA20
drm/bridge: tc358767: use DP connector if no panel set
drm/bridge: tc358767: fix output H/V syncs
drm/bridge: tc358767: reject modes which require too much BW
drm/bridge: tc358767: fix initial DP0/1_SRCCTRL value
drm/bridge: tc358767: fix single lane configuration
drm/bridge: tc358767: add defines for DP1_SRCCTRL & PHY_2LANE
drm/bridge: tc358767: add bus flags
drm/dp_mst: Add __must_check to drm_dp_mst_topology_mgr_resume()
drm/amdgpu: Don't fail resume process if resuming atomic state fails
drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
drm/amdgpu: validate user GEM object size
drm/amdgpu: validate user pitch alignment
drm/amd/powerplay: drop the unnecessary uclk hard min setting
drm/amd/powerplay: avoid possible buffer overflow
drm/amd/powerplay: create pp_od_clk_voltage device file under OD support
drm/amd/powerplay: update OD support flag for SKU with no OD capabilities
drm/amdgpu: make gfx9 enter into rlc safe mode when set MGCG
...
* acpi-pci:
ACPI: Fix build failure when CONFIG_NLS is set to 'n'
* acpi-power:
ACPI: power: Skip duplicate power resource references in _PRx
* acpi-misc:
ACPI: NUMA: Use correct type for printing addresses on i386-PAE
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
perf trace:
Ravi Bangoria:
- Rework PowerPC syscall table generation, now using a .tbl file just like
x86_64 and S/390, also silencing a tools build warning about headers out of
sync with the kernel sources.
tools include uapi:
Arnaldo Carvalho de Melo:
- Sync linux/if_link.h copy with the kernel sources, silencing a build warning.
perf top:
Arnaldo Carvalho de Melo:
- Add 'arch_cpu_idle' to the list of kernel idle symbols, noticed on a Orange
Pi Zero ARM board, just like with other symbols in other arches.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull RISC-V updates from Palmer Dabbelt:
"This tag contains a handful of updates that slipped through the cracks
during the merge window due to the holidays. The fixes are mostly
independent, with the exception of one larger audit-related branch.
Core RISC-V updates:
- The BSS has been moved, which shrinks flat images.
- A fix to test-bpf so it compiles on RV64I-based systems.
- A fix to respect the kernel commandline when there is no device
tree.
- A fix to prevent CPUs from trying to put themselves to sleep when
bringing down the system.
- Support for MODULE_SECTIONS on RV32I-based systems.
- [new in v2] The addition of an SBI earlycon driver. This is
definately a new feature, but I'd like to include it now because I
dropped this patch when submitting the merge window PR that removed
our EARLY_PRINTK support.
RISC-V audit updates:
- The addition of NR_syscalls into unistd.h, which is necessary for
CONFIG_FTRACE_SYSCALLS.
- The definition of CREATE_TRACE_POINTS so __tracepoint_sys_{enter,exit}
get defined.
- A fix for trace_sys_exit() so we can enable HAVE_SYSCALL_TRACEPOINTS
As usual, I've tested this by booting a Fedora-based image on a recent
QEMU (this time just whatever I had lying around).
* tag 'riscv-for-linus-4.21-rc2-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
tty/serial: Add RISC-V SBI earlycon support
riscv: add HAVE_SYSCALL_TRACEPOINTS to Kconfig
riscv: fix trace_sys_exit hook
riscv: define CREATE_TRACE_POINTS in ptrace.c
riscv: define NR_syscalls in unistd.h
riscv: audit: add audit hook in do_syscall_trace_enter/exit()
riscv: add audit support
RISC-V: Support MODULE_SECTIONS mechanism on RV32
MAINTAINERS: SiFive drivers: add myself as a SiFive driver maintainer
MAINTAINERS: SiFive drivers: change the git tree to a SiFive git tree
riscv: don't stop itself in smp_send_stop
arch: riscv: support kernel command line forcing when no DTB passed
tools uapi: fix RISC-V 64-bit support
RISC-V: Make BSS section as the last section in vmlinux.lds.S
Pull VFIO fixes from Alex Williamson:
- Fix trace header include path for in-tree builds (Masahiro Yamada)
- Fix overflow in unmap wrap-around test (Alex Williamson)
* tag 'vfio-v5.0-rc2' of git://github.com/awilliam/linux-vfio:
vfio/type1: Fix unmap overflow off-by-one
vfio/pci: set TRACE_INCLUDE_PATH to fix the build error
Pull sound fixes from Takashi Iwai:
"A collection of small fixes for USB-audio, HD-audio and cs46xx.
The USB-audio fixes are for out-of-bound accesses and a regression in
the recent cleanup, while HD-audio fixes are usual device-specific
quirks"
* tag 'sound-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
ALSA: usb-audio: fix CM6206 register definitions
ALSA: cs46xx: Potential NULL dereference in probe
ALSA: hda/realtek - Support Dell headset mode for New AIO platform
ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
ALSA: usb-audio: Always check descriptor sizes in parser code
ALSA: usb-audio: Check mixer unit descriptors more strictly
ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
Pull mtd fixes from Boris Brezillon:
"Core MTD Fixes:
- Fix a bug introduced when exposing MTD devs as NVMEM providers and
check for add_mtd_device() return code everywhere
raw NAND fixes:
- Fix a memory corruption in the QCOM driver"
* tag 'mtd/fixes-for-5.0-rc2' of git://git.infradead.org/linux-mtd:
mtd: rawnand: qcom: fix memory corruption that causes panic
mtd: Check add_mtd_device() ret code
mtd: Fix the check on nvmem_register() ret code
Cast autosuspend_delay to u64 to make sure that the full computation
of 'expires' or slack will be done in u64, even on 32bits arch.
Otherwise, any delay greater than 2^31 nsec can overflow if signed
32bits is used when converting delay from msec to nsec.
Fixes: 8234f6734c (PM-runtime: Switch autosuspend over to using hrtimers)
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PM-runtime now uses the hrtimers infrastructure for autosuspend, however
comments still reference 'jiffies'.
Fixes: 8234f6734c (PM-runtime: Switch autosuspend over to using hrtimers)
Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In RISC-V, the M-mode runtime firmware provide SBI calls for
debug prints. This patch adds earlycon support using RISC-V
SBI console calls. To enable it, just pass "earlycon=sbi" in
kernel parameters.
Signed-off-by: Anup Patel <anup@brainfault.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Pull arch/csky bug fixes from Guo Ren:
"Here are some fixup patches for 5.0-rc1:
- fix compile error with pte_alloc
- fix handle_irq_perbit break irq flow
- fix CACHEV1 store instruction fast retire
- fix module relocation error with 807 & 860
- add csky kernel features to documentation"
* tag 'csky-for-linus-5.0-rc1' of git://github.com/c-sky/csky-linux:
irqchip/csky: fixup handle_irq_perbit break irq
csky: fixup compile error with pte_alloc
csky: fixup CACHEV1 store instruction fast retire
csky: fixup relocation error with 807 & 860
Documentation/features: Add csky kernel features
The scmi-cpufreq driver calls the arch_set_freq_scale() callback on
frequency changes to provide scale-invariant load-tracking signals to
the scheduler. However, in the slow path, it does so while specifying
the current and max frequencies in different units, hence resulting in a
broken freq_scale factor.
Fix this by passing all frequencies in KHz, as stored in the CPUFreq
frequency table.
Fixes: 99d6bdf338 (cpufreq: add support for CPU DVFS based on SCMI message protocol)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: 4.17+ <stable@vger.kernel.org> # v4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Old cpuidle/sysfs.txt file was replaced in aa5eee355b. So, refer to
an updated file.
Fixes: aa5eee355b (Documentation: admin-guide: PM: Add cpuidle document)
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Initially DP0_SRCCTRL is set to a static value which includes
DP0_SRCCTRL_LANES_2 and DP0_SRCCTRL_BW27, even when only 1 lane of
1.62Gbps speed is used. DP1_SRCCTRL is configured to a magic number.
This patch changes the configuration as follows:
Configure DP0_SRCCTRL by using tc_srcctrl() which provides the correct
value.
DP1_SRCCTRL needs two bits to be set to the same value as DP0_SRCCTRL:
SSCG and BW27. All other bits can be zero.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190103115954.12785-5-tomi.valkeinen@ti.com
Disable Headset Mic VREF for headset mode of ALC225.
This will be controlled by coef bits of headset mode functions.
[ Fixed a compile warning and code simplification -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
perf top:
Arnaldo Carvalho de Melo:
- Lift restriction on using callchains without "sym" in --sort
perf trace:
Arnaldo Carvalho de Melo:
- Fix ')' placement in "interrupted" syscall lines.
- Fix alignment for [continued] lines.
perf tests:
Florian Fainelli:
- Add a test for the ARM 32-bit [vectors] page.
tools lib traceevent:
Tzvetomir Stoyanov:
- Introduce new libtracevent API: tep_override_comm().
- Initialize host_bigendian at tep_handle allocation.
- More namespacing changes.
- Remove superfluous APIs.
tools headers uapi:
Arnaldo Carvalho de Melo:
. Update linux/{fs,vhost}.h, grab a copy o linux/mount.h, where the
MS_ mount flags were moved.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_alloc: do not wake kswapd with zone lock held
hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization"
hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race"
mm: page_mapped: don't assume compound page is huge or THP
mm/memory.c: initialise mmu_notifier_range correctly
tools/vm/page_owner: use page_owner_sort in the use example
kasan: fix krealloc handling for tag-based mode
kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
kasan, arm64: use ARCH_SLAB_MINALIGN instead of manual aligning
mm, memcg: fix reclaim deadlock with writeback
mm/usercopy.c: no check page span for stack objects
slab: alien caches must not be initialized if the allocation of the alien cache failed
fork, memcg: fix cached_stacks case
zram: idle writeback fixes and cleanup
The commit 594cc251fd ("make 'user_access_begin()' do 'access_ok()'")
exposed incorrect implementations of access_ok() macro in several
architectures. This change fixes 2 issues found in OpenRISC.
OpenRISC was not properly using parenthesis for arguments and also using
arguments twice. This patch fixes those 2 issues.
I test booted this patch with v5.0-rc1 on qemu and it's working fine.
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot reported the following regression in the latest merge window and
it was confirmed by Qian Cai that a similar bug was visible from a
different context.
======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
syz-executor0/8529 is trying to acquire lock:
000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
__wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120
but task is already holding lock:
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
include/linux/spinlock.h:329 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
mm/page_alloc.c:2548 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
mm/page_alloc.c:3021 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
mm/page_alloc.c:3050 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
mm/page_alloc.c:3072 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491
It appears to be a false positive in that the only way the lock ordering
should be inverted is if kswapd is waking itself and the wakeup
allocates debugging objects which should already be allocated if it's
kswapd doing the waking. Nevertheless, the possibility exists and so
it's best to avoid the problem.
This patch flags a zone as needing a kswapd using the, surprisingly,
unused zone flag field. The flag is read without the lock held to do
the wakeup. It's possible that the flag setting context is not the same
as the flag clearing context or for small races to occur. However, each
race possibility is harmless and there is no visible degredation in
fragmentation treatment.
While zone->flag could have continued to be unused, there is potential
for moving some existing fields into the flags field instead.
Particularly read-mostly ones like zone->initialized and
zone->contiguous.
Link: http://lkml.kernel.org/r/20190103225712.GJ31517@techsingularity.net
Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Reported-by: syzbot+93d94a001cfbce9e60e1@syzkaller.appspotmail.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts b43a999005
The reverted commit caused issues with migration and poisoning of anon
huge pages. The LTP move_pages12 test will cause an "unable to handle
kernel NULL pointer" BUG would occur with stack similar to:
RIP: 0010:down_write+0x1b/0x40
Call Trace:
migrate_pages+0x81f/0xb90
__ia32_compat_sys_migrate_pages+0x190/0x190
do_move_pages_to_node.isra.53.part.54+0x2a/0x50
kernel_move_pages+0x566/0x7b0
__x64_sys_move_pages+0x24/0x30
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The purpose of the reverted patch was to fix some long existing races
with huge pmd sharing. It used i_mmap_rwsem for this purpose with the
idea that this could also be used to address truncate/page fault races
with another patch. Further analysis has determined that i_mmap_rwsem
can not be used to address all these hugetlbfs synchronization issues.
Therefore, revert this patch while working an another approach to the
underlying issues.
Link: http://lkml.kernel.org/r/20190103235452.29335-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
page_mapped+0x78/0xb4
stable_page_flags+0x27c/0x338
kpageflags_read+0xfc/0x164
proc_reg_read+0x7c/0xb8
__vfs_read+0x58/0x178
vfs_read+0x90/0x14c
SyS_read+0x60/0xc0
The issue is that page_mapped() assumes that if compound page is not
huge, then it must be THP. But if this is 'normal' compound page
(COMPOUND_PAGE_DTOR), then following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:
for (i = 0; i < hpage_nr_pages(page); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
}
I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
- allocates compound page (PAGEC) of order 1
- allocates 2 normal pages (COPY), which are initialized to 0xff (to
satisfy _mapcount >= 0)
- 2 PAGEC page structs are copied to address of first COPY page
- second page of COPY is marked as not present
- call to page_mapped(COPY) now triggers fault on access to 2nd COPY
page at offset 0x30 (_mapcount)
[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c
Fix the loop to iterate for "1 << compound_order" pages.
Kirrill said "IIRC, sound subsystem can producuce custom mapped compound
pages".
Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
Fixes: e1534ae950 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>