Merge various fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
mm, oom_reaper: do not use siglock in try_oom_reaper()
mm, page_alloc: prevent infinite loop in buffered_rmqueue()
checkpatch: reduce git commit description style false positives
mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
mm: check the return value of lookup_page_ext for all call sites
kdump: fix dmesg gdbmacro to work with record based printk
mm: fix overflow in vm_map_ram()
Pull irq fixes from Thomas Gleixner:
- a few simple fixes for fallout from the recent gic-v3 changes
- a workaround for a Cavium thunderX erratum
- a bugfix for the pic32 irqchip to make external interrupts work proper
- a missing return value in the generic IPI management code
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-pic32-evic: Fix bug with external interrupts.
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
irqchip/gic-v3: Fix quiescence check in gic_enable_redist
irqchip/gic-v3: Fix copy+paste mistakes in defines
irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
genirq: Fix missing return value in irq_destroy_ipi()
The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and as
a starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic
or being able to ignore watermarks then the starting point in the
zonelist iterator is no longer correct. This patch resets the zonelist
iterator in the allocator slowpath if the context can ignore memory
policies. This will alter the zone used for statistics but only after
it is known that it makes sense for that context. Resetting it before
entering the slowpath would potentially allow an ALLOC_CPUSET allocation
to be accounted for against the wrong zone. Note that while nodemask is
not explicitly set to the original nodemask, it would only have been
overwritten if cpuset_enabled() and it was reset before the slowpath was
entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven reported the following problem that bisected to
commit c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone
in a zonelist twice") on m68k/ARAnyM
BUG: scheduling while atomic: cron/668/0x10c9a0c0
Modules linked in:
CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54
__schedule+0x312/0x388
__schedule+0x0/0x388
prepare_to_wait+0x0/0x52
schedule+0x64/0x82
schedule_timeout+0xda/0x104
set_next_entity+0x18/0x40
pick_next_task_fair+0x78/0xda
io_schedule_timeout+0x36/0x4a
bit_wait_io+0x0/0x40
bit_wait_io+0x12/0x40
__wait_on_bit+0x46/0x76
wait_on_page_bit_killable+0x64/0x6c
bit_wait_io+0x0/0x40
wake_bit_function+0x0/0x4e
__lock_page_or_retry+0xde/0x124
do_scan_async+0x114/0x17c
lookup_swap_cache+0x24/0x4e
handle_mm_fault+0x626/0x7de
find_vma+0x0/0x66
down_read+0x0/0xe
wait_on_page_bit_killable_timeout+0x77/0x7c
find_vma+0x16/0x66
do_page_fault+0xe6/0x23a
res_func+0xa3c/0x141a
buserr_c+0x190/0x6d4
res_func+0xa3c/0x141a
buserr+0x20/0x28
res_func+0xa3c/0x141a
buserr+0x20/0x28
The relationship is not obvious but it's due to a failure to rescan the
full zonelist after the fair zone allocation policy exhausts the batch
count. While this is a functional problem, it's also a performance
issue. A page allocator microbenchmark showed the following
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
Min alloc-odr0-1 327.00 ( 0.00%) 326.00 ( 0.31%)
Min alloc-odr0-2 235.00 ( 0.00%) 235.00 ( 0.00%)
Min alloc-odr0-4 198.00 ( 0.00%) 198.00 ( 0.00%)
Min alloc-odr0-8 170.00 ( 0.00%) 170.00 ( 0.00%)
Min alloc-odr0-16 156.00 ( 0.00%) 156.00 ( 0.00%)
Min alloc-odr0-32 150.00 ( 0.00%) 150.00 ( 0.00%)
Min alloc-odr0-64 146.00 ( 0.00%) 146.00 ( 0.00%)
Min alloc-odr0-128 145.00 ( 0.00%) 145.00 ( 0.00%)
Min alloc-odr0-256 155.00 ( 0.00%) 155.00 ( 0.00%)
Min alloc-odr0-512 168.00 ( 0.00%) 165.00 ( 1.79%)
Min alloc-odr0-1024 175.00 ( 0.00%) 174.00 ( 0.57%)
Min alloc-odr0-2048 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-4096 187.00 ( 0.00%) 186.00 ( 0.53%)
Min alloc-odr0-8192 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-16384 191.00 ( 0.00%) 191.00 ( 0.00%)
Min alloc-odr1-1 736.00 ( 0.00%) 445.00 ( 39.54%)
Min alloc-odr1-2 343.00 ( 0.00%) 335.00 ( 2.33%)
Min alloc-odr1-4 277.00 ( 0.00%) 270.00 ( 2.53%)
Min alloc-odr1-8 238.00 ( 0.00%) 233.00 ( 2.10%)
Min alloc-odr1-16 224.00 ( 0.00%) 218.00 ( 2.68%)
Min alloc-odr1-32 210.00 ( 0.00%) 208.00 ( 0.95%)
Min alloc-odr1-64 207.00 ( 0.00%) 203.00 ( 1.93%)
Min alloc-odr1-128 276.00 ( 0.00%) 202.00 ( 26.81%)
Min alloc-odr1-256 206.00 ( 0.00%) 202.00 ( 1.94%)
Min alloc-odr1-512 207.00 ( 0.00%) 202.00 ( 2.42%)
Min alloc-odr1-1024 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr1-2048 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr1-4096 218.00 ( 0.00%) 216.00 ( 0.92%)
Min alloc-odr1-8192 341.00 ( 0.00%) 219.00 ( 35.78%)
Note that order-0 allocations are unaffected but higher orders get a
small boost from this patch and a large reduction in system CPU usage
overall as can be seen here:
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
User 85.32 86.31
System 2221.39 2053.36
Elapsed 2368.89 2202.47
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
buffered_rmqueue() when check_new_pcp() returns 1, because the bad page
is never removed from the pcp list. Fix this by removing the page
before retrying. Also we don't need to check if page is non-NULL,
because we simply grab it from the list which was just tested for being
non-empty.
Fixes: 479f854a20 ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Link: http://lkml.kernel.org/r/20160530090154.GM2527@techsingularity.net
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcg_offline_kmem() may be called from memcg_free_kmem() after a css
init failure. memcg_free_kmem() is a ->css_free callback which is
called without cgroup_mutex and memcg_offline_kmem() ends up using
css_for_each_descendant_pre() without any locking. Fix it by adding rcu
read locking around it.
mkdir: cannot create directory `65530': No space left on device
===============================
[ INFO: suspicious RCU usage. ]
4.6.0-work+ #321 Not tainted
-------------------------------
kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
[ 527.243970] other info that might help us debug this:
[ 527.244715]
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by kworker/0:5/1664:
#0: ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
#1: ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
[ 527.248098] stack backtrace:
CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Workqueue: cgroup_destroy css_free_work_fn
Call Trace:
dump_stack+0x68/0xa1
lockdep_rcu_suspicious+0xd7/0x110
css_next_descendant_pre+0x7d/0xb0
memcg_offline_kmem.part.44+0x4a/0xc0
mem_cgroup_css_free+0x1ec/0x200
css_free_work_fn+0x49/0x5e0
process_one_work+0x1c5/0x4a0
worker_thread+0x49/0x490
kthread+0xea/0x100
ret_from_fork+0x1f/0x40
Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer bugfix from Thomas Gleixner:
"A single bugfix for the error check wreckage we introduced in the
merge window"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Make settimeofday error checking work again
Pull ARM fix from Russell King:
"Just one fix to the ptrace code, spotted by Simon Marchi, where if a
thread migrates to a different CPU and the VFP registers are changed
through ptrace, the application doesn't see the updated VFP registers"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix PTRACE_SETVFPREGS on SMP systems
Pull arm64 fixes from Will Deacon:
"The main thing here is reviving hugetlb support using contiguous ptes,
which we ended up reverting at the last minute in 4.5 pending a fix
which went into the core mm/ code during the recent merge window.
- Revert a previous revert and get hugetlb going with contiguous hints
- Wire up missing compat syscalls
- Enable CONFIG_SET_MODULE_RONX by default
- Add missing line to our compat /proc/cpuinfo output
- Clarify levels in our page table dumps
- Fix booting with RANDOMIZE_TEXT_OFFSET enabled
- Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
- Remove some dead code and update a comment"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
arm64: move {PAGE,CONT}_SHIFT into Kconfig
arm64: mm: dump: log span level
arm64: update stale PAGE_OFFSET comment
drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
arm64: report CPU number in bad_mode
arm64: unistd32.h: wire up missing syscalls for compat tasks
arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
arm64: enable CONFIG_SET_MODULE_RONX by default
arm64: Remove orphaned __addr_ok() definition
Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
Pull powerpc fixes from Michael Ellerman:
- Handle RTAS delay requests in configure_bridge from Russell Currey
- Refactor the configure_bridge RTAS tokens from Russell Currey
- Fix definition of SIAR and SDAR registers from Thomas Huth
- Use privileged SPR number for MMCR2 from Thomas Huth
- Update LPCR only if it is powernv from Aneesh Kumar K.V
- Fix the reference bit update when handling hash fault from Aneesh
Kumar K.V
- Add missing tlb flush from Aneesh Kumar K.V
- Add POWER8NVL support to ibm,client-architecture-support call from
Thomas Huth
* tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
powerpc/mm/radix: Add missing tlb flush
powerpc/mm/hash: Fix the reference bit update when handling hash fault
powerpc/mm/radix: Update LPCR only if it is powernv
powerpc: Use privileged SPR number for MMCR2
powerpc: Fix definition of SIAR and SDAR registers
powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
Merge irqchip updates from Marc Zyngier:
- A number of embarassing buglets (GICv3, PIC32)
- A more substential errata workaround for Cavium's GICv3 ITS
(kept for post-rc1 due to its dependency on NUMA)
With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the
following issue on the boot:
kernel BUG at arch/arm64/mm/mmu.c:480!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310
Hardware name: ARM Juno development board (r2) (DT)
task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000
PC is at map_kernel_segment+0x44/0xb0
LR is at paging_init+0x84/0x5b0
pc : [<ffff000008c450b4>] lr : [<ffff000008c451a4>] pstate: 600002c5
Call trace:
[<ffff000008c450b4>] map_kernel_segment+0x44/0xb0
[<ffff000008c451a4>] paging_init+0x84/0x5b0
[<ffff000008c42728>] setup_arch+0x198/0x534
[<ffff000008c40848>] start_kernel+0x70/0x388
[<ffff000008c401bc>] __primary_switched+0x30/0x74
Commit 7eb90f2ff7 ("arm64: cover the .head.text section in the .text
segment mapping") removed the alignment between the .head.text and .text
sections, and used the _text rather than the _stext interval for mapping
the .text segment.
Prior to this commit _stext was always section aligned and didn't cause
any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that
alignment has been removed and _text is used to map the .text segment,
we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET
is enabled.
This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset
is always aligned to the kernel page size. To ensure this, we rely on
the PAGE_SHIFT being available via Kconfig.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 7eb90f2ff7 ("arm64: cover the .head.text section in the .text segment mapping")
Signed-off-by: Will Deacon <will.deacon@arm.com>
In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would
like to make use of PAGE_SHIFT outside of code that can include the
usual header files.
Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with
ARM64_CONT_SHIFT for consistency.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The page table dump code logs spans of entries at the same level
(pgd/pud/pmd/pte) which have the same attributes. While we log the
(decoded) attributes, we don't log the level, which leaves the output
ambiguous and/or confusing in some cases.
For example:
0xffff800800000000-0xffff800980000000 6G RW NX SHD AF BLK UXN MEM/NORMAL
If using 4K pages, this may describe a span of 6 1G block entries at the
PGD/PUD level, or 3072 2M block entries at the PMD level.
This patch adds the page table level to each output line, removing this
ambiguity. For the example above, this will produce:
0xffffffc800000000-0xffffffc980000000 6G PUD RW NX SHD AF BLK UXN MEM/NORMAL
When 3 level tables are in use, and we use the asm-generic/nopud.h
definitions, the dump code treats each entry in the PGD as a 1 element
table at the PUD level, and logs spans as being PUDs, which can be
confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
CONFIG_PGTABLE_LEVELS <= 2.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit ab893fb9f1 ("arm64: introduce KIMAGE_VADDR as the virtual
base of the kernel region") logically split KIMAGE_VADDR from
PAGE_OFFSET, and since commit f9040773b7 ("arm64: move kernel
image to base of vmalloc area") the two have been distinct values.
Unfortunately, neither commit updated the comment above these
definitions, which now erroneously states that PAGE_OFFSET is the start
of the kernel image rather than the start of the linear mapping.
This patch fixes said comment, and introduces an explanation of
KIMAGE_VADDR.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
pmu->irq_affinity will not be freed if an error occurred within
arm_pmu_device_probe after of_pmu_irq_cfg has been called.
Note that in the case of_pmu_irq_cfg is returning an error,
pmu->irq_affinity will not be set, but it should be NULL as pmu was
kzalloc'd. Therefore the result kfree(NULL) is benign.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The global variable __oprofile_cpu_pmu is set before the PMU is fully
initialized. If an error occurs before the end of the initialization,
the PMU will be freed and the variable will contain an invalid pointer.
This will result in a kernel crash when perf will be used.
Fix it by moving the setting of __oprofile_cpu_pmu when the PMU is fully
initialized (i.e when it is no longer possible to fail).
Cc: <stable@vger.kernel.org>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The only function called by of_pmu_irq_cfg that will increment the
reference count on dn is of_parse_phandle.
Each time we successfully parse a possible CPU from an
interrupt-affinity property, we increment the refcount of that CPU node
once via of_parse_handle. After validating the CPU is possible, we
decrement the refcount once. Subsequently, we decrement the refcount
again, either as part of an early break if we don't have a matching SPI,
or as part of the end of the loop body.
This will lead to decrementing twice the refcounnt.
Remove the second pairs of call to of_node_put as nobody is using dn
between the first and second call to of_node_put.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If we take an exception we don't expect (e.g. SError), we report this in
the bad_mode handler with pr_crit. Depending on the configured log
level, we may or may not log additional information in functions called
subsequently. Notably, the messages in dump_stack (including the CPU
number) are printed with KERN_DEFAULT and may not appear.
Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and
knowing the CPU number is crucial to correctly decode them. To ensure
that this is always possible, we should log the CPU number along with
the ESR_ELx value, so we are not reliant on subsequent logs or
additional printk configuration options.
This patch logs the CPU number in bad_mode such that it is possible for
a developer to decode these exceptions, provided access to sufficient
documentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Al Grant <Al.Grant@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull KVM fixes from Radim Krčmář:
"ARM:
- two fixes for 4.6 vgic [Christoffer] (cc stable)
- six fixes for 4.7 vgic [Marc]
x86:
- six fixes from syzkaller reports [Paolo] (two of them cc stable)
- allow OS X to boot [Dmitry]
- don't trust compilers [Nadav]"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
KVM: Handle MSR_IA32_PERF_CTL
KVM: x86: avoid write-tearing of TDP
KVM: arm/arm64: vgic-new: Removel harmful BUG_ON
arm64: KVM: vgic-v3: Relax synchronization when SRE==1
arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
KVM: arm/arm64: vgic-v3: Always resample level interrupts
KVM: arm/arm64: vgic-v2: Always resample level interrupts
KVM: arm/arm64: vgic-v3: Clear all dirty LRs
KVM: arm/arm64: vgic-v2: Clear all dirty LRs
The erratum fixes the hang of ITS SYNC command by avoiding inter node
io and collections/cpu mapping on thunderx dual-socket platform.
This fix is only applicable for Cavium's ThunderX dual-socket platform.
Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make sure the two sides of the bitwise operation are bool.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
ICC_SGI1R_AFFINITY_{2,3}_MASK are unused, which is good
because they were defined with the wrong shifts.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The INTID mask is wrong, and is made a signed value, which has
nteresting effects in the KVM emulation. Let's sanitize it.
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Intel CPUs having Turbo Boost feature implement an MSR to provide a
control interface via rdmsr/wrmsr instructions. One could detect the
presence of this feature by issuing one of these instructions and
handling the #GP exception which is generated in case the referenced MSR
is not implemented by the CPU.
KVM's vCPU model behaves exactly as a real CPU in this case by injecting
a fault when MSR_IA32_PERF_CTL is called (which KVM does not support).
However, some operating systems use this register during an early boot
stage in which their kernel is not capable of handling #GP correctly,
causing #DP and finally a triple fault effectively resetting the vCPU.
This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
crashes.
Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In theory, nothing prevents the compiler from write-tearing PTEs, or
split PTE writes. These partially-modified PTEs can be fetched by other
cores and cause mayhem. I have not really encountered such case in
real-life, but it does seem possible.
For example, the compiler may try to do something creative for
kvm_set_pte_rmapp() and perform multiple writes to the PTE.
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().
Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.
Fix this by reverting the previous change.
Cc: <stable@vger.kernel.org>
Fixes: 8130b9d7b9 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Simon Marchi <simon.marchi@ericsson.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When changing the active bit from an MMIO trap, we decide to
explode if the intid is that of a private interrupt.
This flawed logic comes from the fact that we were assuming that
kvm_vcpu_kick() as called by kvm_arm_halt_vcpu() would not return before
the called vcpu responded, but this is not the case, so we need to
perform this wait even for private interrupts.
Dropping the BUG_ON seems like the right thing to do.
[ Commit message tweaked by Christoffer ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for v4.7. Not much, and just driver
fixes:
- add device tree matches to MAINTAINERS
- inversion bug in the Nomadik driver
- dual edge handling bug in the mediatek driver"
* tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: fix dual-edge code defect
MAINTAINERS: Add file patterns for pinctrl device tree bindings
pinctrl: nomadik: fix inversion of gpio direction
Pull dma-buf updates from Sumit Semwal:
- use of vma_pages instead of explicit computation
- DocBook and headerdoc updates for dma-buf
* tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
dma-buf: use vma_pages()
fence: add missing descriptions for fence
doc: update/fixup dma-buf related DocBook
reservation: add headerdoc comments
dma-buf: headerdoc fixes
We're missing entries for mlock2, copy_file_range, preadv2 and pwritev2
in our compat syscall table, so hook them up. Only the last two need
compat wrappers.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull networking fixes from David Miller:
1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
properly. From Ezequiel Garcia.
3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.
5) Fix deadlock on team->lock when propagating features, from Ivan
Vecera.
6) Work around buffer offset hw bug in alx chips, from Feng Tang.
7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
Long.
8) Various statistics bug fixes in mlx4 from Eric Dumazet.
9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
10) All of l2tp was namespace aware, but the ipv6 support code was not
doing so. From Shmulik Ladkani.
11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
12) Propagate MAC changes properly through VLAN devices, from Mike
Manning.
13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
sfc: Track RPS flow IDs per channel instead of per function
usbnet: smsc95xx: fix link detection for disabled autonegotiation
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
bnx2x: avoid leaking memory on bnx2x_init_one() failures
fou: fix IPv6 Kconfig options
openvswitch: update checksum in {push,pop}_mpls
sctp: sctp_diag should dump sctp socket type
net: fec: update dirty_tx even if no skb
vlan: Propagate MAC address to VLANs
atm: iphase: off by one in rx_pkt()
atm: firestream: add more reserved strings
vxlan: Accept user specified MTU value when create new vxlan link
net: pktgen: Call destroy_hrtimer_on_stack()
timer: Export destroy_hrtimer_on_stack()
net: l2tp: Make l2tp_ip6 namespace aware
Documentation: ip-sysctl.txt: clarify secure_redirects
sfc: use flow dissector helpers for aRFS
ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
net: nps_enet: Disable interrupts before napi reschedule
net/lapb: tuse %*ph to dump buffers
...
Pull sparc fixes from David Miller:
"sparc64 mmu context allocation and trap return bug fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix return from trap window fill crashes.
sparc: Harden signal return frame checks.
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
If we do not provide the PVR for POWER8NVL, a guest on this system
currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
does not provide a generic PowerISA 2.07 mode yet. So some new
instructions from POWER8 (like "mtvsrd") get disabled for the guest,
resulting in crashes when using code compiled explicitly for
POWER8 (e.g. with the "-mcpu=power8" option of GCC).
Fixes: ddee09c099 ("powerpc: Add PVR for POWER8NVL processor")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This should not have any impact on hash, because hash does tlb
invalidate with every pte update and we don't implement
flush_tlb_* functions for hash. With radix we should make an explicit
call to flush tlb outside pte update.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>