Commit Graph

162094 Commits

Author SHA1 Message Date
Linus Torvalds
1609d7604b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "The main change here is a revert of reverts. We recently simplified
  some code that was thought unnecessary; however, since then KVM has
  grown quite a few cond_resched()s and for that reason the simplified
  code is prone to livelocks---one CPUs tries to empty a list of guest
  page tables while the others keep adding to them. This adds back the
  generation-based zapping of guest page tables, which was not
  unnecessary after all.

  On top of this, there is a fix for a kernel memory leak and a couple
  of s390 fixlets as well"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
  KVM: x86: work around leak of uninitialized stack contents
  KVM: nVMX: handle page fault in vmread
  KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
  KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
2019-09-14 16:07:40 -07:00
Linus Torvalds
b03c036e6f Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
 "Last week, Palmer and I learned that there was an error in the RISC-V
  kernel image header format that could make it less compatible with the
  ARM64 kernel image header format. I had missed this error during my
  original reviews of the patch.

  The kernel image header format is an interface that impacts
  bootloaders, QEMU, and other user tools. Those packages must be
  updated to align with whatever is merged in the kernel. We would like
  to avoid proliferating these image formats by keeping the RISC-V
  header as close as possible to the existing ARM64 header. Since the
  arch/riscv patch that adds support for the image header was merged
  with our v5.3-rc1 pull request as commit 0f327f2aaa ("RISC-V: Add
  an Image header that boot loader can parse."), we think it wise to try
  to fix this error before v5.3 is released.

  The fix itself should be backwards-compatible with any project that
  has already merged support for premature versions of this interface.
  It primarily involves ensuring that the RISC-V image header has
  something useful in the same field as the ARM64 image header"

* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: modify the Image header to improve compatibility with the ARM64 header
2019-09-14 15:58:02 -07:00
Paolo Bonzini
a9c20bb020 Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fixes for 5.3

- prevent a user triggerable oops in the migration code
- do not leak kernel stack content
2019-09-14 09:25:30 +02:00
Sean Christopherson
002c5f73c5 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
James Harvey reported a livelock that was introduced by commit
d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").

The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.

There are three ways to fix the livelock:

- Reverting the revert (commit d012a06ab1) is not a viable option as
  the revert is needed to fix a regression that occurs when the guest has
  one or more assigned devices.  It's unlikely we'll root cause the device
  assignment regression soon enough to fix the regression timely.

- Remove the conditional reschedule from kvm_mmu_zap_all().  However, although
  removing the reschedule would be a smaller code change, it's less safe
  in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
  the wild for flushing memslots since the fast invalidate mechanism was
  introduced by commit 6ca18b6950 ("KVM: x86: use the fast way to
  invalidate all pages"), back in 2013.

- Reintroduce the fast invalidate mechanism and use it when zapping shadow
  pages in response to a memslot being deleted/moved, which is what this
  patch does.

For all intents and purposes, this is a revert of commit ea145aacf4
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950 ("KVM: x86:
use the fast way to invalidate all pages") respectively.

Fixes: d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14 09:25:11 +02:00
Fuqian Huang
541ab2aeb2 KVM: x86: work around leak of uninitialized stack contents
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.

The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14 09:25:11 +02:00
Paolo Bonzini
f7eea636c3 KVM: nVMX: handle page fault in vmread
The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14 09:25:02 +02:00
Paul Walmsley
474efecb65 riscv: modify the Image header to improve compatibility with the ARM64 header
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header.  One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field.  If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0.  This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format.  Another problem was that the original
"res3" field was not being initialized correctly to zero.

Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field.  RISC-V binaries will
store "RSC\x05" in this field.  The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time.  Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly.  Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.

Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
2019-09-13 19:03:52 -07:00
Linus Torvalds
95217783b7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A KVM guest fix, and a kdump kernel relocation errors fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
  x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
2019-09-12 14:47:35 +01:00
Thomas Huth
53936b5bf3 KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert them from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.

Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.

Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.

Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-09-12 14:12:21 +02:00
Igor Mammedov
13a17cc052 KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
kvm_s390_vm_start_migration(), kernel will oops with:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
  Oops: 0004 ilc:2 [#1] SMP
  ...
  Call Trace:
  ([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
   [<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
   [<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
   [<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
   [<00000000008bb284>] ksys_ioctl+0x8c/0xb8
   [<00000000008bb2e2>] sys_ioctl+0x32/0x40
   [<000000000175552c>] system_call+0x2b8/0x2d8
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<0000000000dbaf60>] __memset+0xc/0xa0

due to ms->dirty_bitmap being NULL, which might crash the host.

Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.

Cc: <stable@vger.kernel.org>
Fixes: afdad61615 ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-09-12 13:09:17 +02:00
Linus Torvalds
3120b9a6a3 Merge tag 'ipc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull ipc regression fixes from Arnd Bergmann:
 "Fix ipc regressions from y2038 patches

  These are two regression fixes for bugs that got introduced during the
  system call rework that went into linux-5.1 but only bisected and
  fixed now:

   - One patch affects semtimedop() on many of the less common 32-bit
     architectures, this just needs a single-line bugfix.

   - The other affects only sparc64 and has a slightly more invasive
     workaround to apply the same change to sparc64 that was done to the
     generic code used everywhere else"

* tag 'ipc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  ipc: fix sparc64 ipc() wrapper
  ipc: fix semtimedop for generic 32-bit architectures
2019-09-10 12:34:13 +01:00
Jan Stancek
afa8b475c1 x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
KVM guests with commit c8c4076723 ("x86/timer: Skip PIT initialization on
modern chipsets") applied to guest kernel have been observed to have
unusually higher CPU usage with symptoms of increase in vm exits for HLT
and MSW_WRITE (MSR_IA32_TSCDEADLINE).

This is caused by older QEMUs lacking support for X86_FEATURE_ARAT.  lapic
clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive.  There's no
usable broadcast device either.

Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT.  On real
hardware it shouldn't matter as ARAT and DEADLINE come together.

Fixes: c8c4076723 ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-09-08 09:01:15 +02:00
Linus Torvalds
950b07c14e Revert "x86/apic: Include the LDR when clearing out APIC registers"
This reverts commit 558682b529.

Chris Wilson reports that it breaks his CPU hotplug test scripts.  In
particular, it breaks offlining and then re-onlining the boot CPU, which
we treat specially (and the BIOS does too).

The symptoms are that we can offline the CPU, but it then does not come
back online again:

    smpboot: CPU 0 is now offline
    smpboot: Booting Node 0 Processor 0 APIC 0x0
    smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

Thomas says he knows why it's broken (my personal suspicion: our magic
handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
is to just revert it, since we've never touched the LDR bits before, and
it's not worth the risk to do anything else at this stage.

[ Hotpluging of the boot CPU is special anyway, and should be off by
  default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
  cpu0_hotplug kernel parameter.

  In general you should not do it, and it has various known limitations
  (hibernate and suspend require the boot CPU, for example).

  But it should work, even if the boot CPU is special and needs careful
  treatment       - Linus ]

Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-07 14:25:54 -07:00
Arnd Bergmann
fb377eb80c ipc: fix sparc64 ipc() wrapper
Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
to a commit from my y2038 series in linux-5.1, as I missed the custom
sys_ipc() wrapper that sparc64 uses in place of the generic version that
I patched.

The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
now do not allow being called with the IPC_64 flag any more, resulting
in a -EINVAL error when they don't recognize the command.

Instead, the correct way to do this now is to call the internal
ksys_old_{sem,shm,msg}ctl() functions to select the API version.

As we generally move towards these functions anyway, change all of
sparc_ipc() to consistently use those in place of the sys_*() versions,
and move the required ksys_*() declarations into linux/syscalls.h

The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
errors when ipc is disabled.

Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 275f22148e ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
Cc: stable@vger.kernel.org
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-07 21:42:25 +02:00
Linus Torvalds
36daa831b5 Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "There are three more fixes for this week:

   - The Windows-on-ARM laptops require a workaround to prevent crashing
     at boot from ACPI

   - The Renesas 'draak' board needs one bugfix for the backlight
     regulator

   - Also for Renesas, the 'hihope' board accidentally had its eMMC
     turned off in the 5.3 merge window"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  soc: qcom: geni: Provide parameter error checking
  arm64: dts: renesas: hihope-common: Fix eMMC status
  arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name
2019-09-06 12:53:31 -07:00
Linus Torvalds
13da6ac106 Merge tag 'powerpc-5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "One fix for a boot hang on some Freescale machines when PREEMPT is
  enabled.

  Two CVE fixes for bugs in our handling of FP registers and
  transactional memory, both of which can result in corrupted FP state,
  or FP state leaking between processes.

  Thanks to: Chris Packham, Christophe Leroy, Gustavo Romero, Michael
  Neuling"

* tag 'powerpc-5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
  powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
  powerpc/64e: Drop stale call to smp_processor_id() which hangs SMP startup
2019-09-06 08:54:45 -07:00
Steve Wahl
e16c2983fb x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
The last change to this Makefile caused relocation errors when loading
a kdump kernel.  Restore -mcmodel=large (not -mcmodel=kernel),
-ffreestanding, and -fno-zero-initialized-bsss, without reverting to
the former practice of resetting KBUILD_CFLAGS.

Purgatory.ro is a standalone binary that is not linked against the
rest of the kernel.  Its image is copied into an array that is linked
to the kernel, and from there kexec relocates it wherever it desires.

With the previous change to compiler flags, the error "kexec: Overflow
in relocation type 11 value 0x11fffd000" was encountered when trying
to load the crash kernel.  This is from kexec code trying to relocate
the purgatory.ro object.

From the error message, relocation type 11 is R_X86_64_32S.  The
x86_64 ABI says:

  "The R_X86_64_32 and R_X86_64_32S relocations truncate the
   computed value to 32-bits.  The linker must verify that the
   generated value for the R_X86_64_32 (R_X86_64_32S) relocation
   zero-extends (sign-extends) to the original 64-bit value."

This type of relocation doesn't work when kexec chooses to place the
purgatory binary in memory that is not reachable with 32 bit
addresses.

The compiler flag -mcmodel=kernel allows those type of relocations to
be emitted, so revert to using -mcmodel=large as was done before.

Also restore the -ffreestanding and -fno-zero-initialized-bss flags
because they are appropriate for a stand alone piece of object code
which doesn't explicitly zero the bss, and one other report has said
undefined symbols are encountered without -ffreestanding.

These identical compiler flag changes need to happen for every object
that becomes part of the purgatory.ro object, so gather them together
first into PURGATORY_CFLAGS_REMOVE and PURGATORY_CFLAGS, and then
apply them to each of the objects that have C source.  Do not apply
any of these flags to kexec-purgatory.o, which is not part of the
standalone object but part of the kernel proper.

Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Tested-by: Andreas Smas <andreas@lonelycoder.com>
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: None
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: clang-built-linux@googlegroups.com
Cc: dimitri.sivanich@hpe.com
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Fixes: b059f801a9 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
Link: https://lkml.kernel.org/r/20190905202346.GA26595@swahl-linux
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06 09:50:56 +02:00
Linus Torvalds
19e4147a04 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - EFI boot fix for signed kernels

   - an AC flags fix related to UBSAN

   - Hyper-V infinite loop fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyper-v: Fix overflow bug in fill_gva_list()
  x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
  x86/boot: Preserve boot_params.secure_boot from sanitizing
2019-09-05 09:47:32 -07:00
Arnd Bergmann
13212a648f Merge tag 'renesas-fixes2-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into arm/fixes
Second Round of Renesas ARM Based SoC Fixes for v5.3

* RZ/G2M based HiHope main board
  - Re-enabled accidently disabled SDHI3 (eMMC) support

* tag 'renesas-fixes2-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  arm64: dts: renesas: hihope-common: Fix eMMC status

Link: https://lore.kernel.org/r/cover.1567675986.git.horms+renesas@verge.net.au
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-05 17:56:30 +02:00
Arnd Bergmann
d9a2b63ca9 Merge tag 'renesas-fixes-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into arm/fixes
Renesas ARM Based SoC Fixes for v5.3

* R-Car D3 (r8a77995) based Draak Board
  - Correct backlight regulator name in device tree

* tag 'renesas-fixes-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-04 14:38:47 +02:00
Gustavo Romero
a8318c13e7 powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
When in userspace and MSR FP=0 the hardware FP state is unrelated to
the current process. This is extended for transactions where if tbegin
is run with FP=0, the hardware checkpoint FP state will also be
unrelated to the current process. Due to this, we need to ensure this
hardware checkpoint is updated with the correct state before we enable
FP for this process.

Unfortunately we get this wrong when returning to a process from a
hardware interrupt. A process that starts a transaction with FP=0 can
take an interrupt. When the kernel returns back to that process, we
change to FP=1 but with hardware checkpoint FP state not updated. If
this transaction is then rolled back, the FP registers now contain the
wrong state.

The process looks like this:
   Userspace:                      Kernel

               Start userspace
                with MSR FP=0 TM=1
                  < -----
   ...
   tbegin
   bne
               Hardware interrupt
                   ---- >
                                    <do_IRQ...>
                                    ....
                                    ret_from_except
                                      restore_math()
				        /* sees FP=0 */
                                        restore_fp()
                                          tm_active_with_fp()
					    /* sees FP=1 (Incorrect) */
                                          load_fp_state()
                                        FP = 0 -> 1
                  < -----
               Return to userspace
                 with MSR TM=1 FP=1
                 with junk in the FP TM checkpoint
   TM rollback
   reads FP junk

When returning from the hardware exception, tm_active_with_fp() is
incorrectly making restore_fp() call load_fp_state() which is setting
FP=1.

The fix is to remove tm_active_with_fp().

tm_active_with_fp() is attempting to handle the case where FP state
has been changed inside a transaction. In this case the checkpointed
and transactional FP state is different and hence we must restore the
FP state (ie. we can't do lazy FP restore inside a transaction that's
used FP). It's safe to remove tm_active_with_fp() as this case is
handled by restore_tm_state(). restore_tm_state() detects if FP has
been using inside a transaction and will set load_fp and call
restore_math() to ensure the FP state (checkpoint and transaction) is
restored.

This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.

Similarly for VMX.

A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c

This fixes CVE-2019-15031.

Fixes: a7771176b4 ("powerpc: Don't enable FP/Altivec if not checkpointed")
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
2019-09-04 22:31:13 +10:00
Gustavo Romero
8205d5d98e powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
When we take an FP unavailable exception in a transaction we have to
account for the hardware FP TM checkpointed registers being
incorrect. In this case for this process we know the current and
checkpointed FP registers must be the same (since FP wasn't used
inside the transaction) hence in the thread_struct we copy the current
FP registers to the checkpointed ones.

This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
to determine if FP was on when in userspace. thread->ckpt_regs.msr
represents the state of the MSR when exiting userspace. This is setup
by check_if_tm_restore_required().

Unfortunatley there is an optimisation in giveup_all() which returns
early if tsk->thread.regs->msr (via local variable `usermsr`) has
FP=VEC=VSX=SPE=0. This optimisation means that
check_if_tm_restore_required() is not called and hence
thread->ckpt_regs.msr is not updated and will contain an old value.

This can happen if due to load_fp=255 we start a userspace process
with MSR FP=1 and then we are context switched out. In this case
thread->ckpt_regs.msr will contain FP=1. If that same process is then
context switched in and load_fp overflows, MSR will have FP=0. If that
process now enters a transaction and does an FP instruction, the FP
unavailable will not update thread->ckpt_regs.msr (the bug) and MSR
FP=1 will be retained in thread->ckpt_regs.msr.  tm_reclaim_thread()
will then not perform the required memcpy and the checkpointed FP regs
in the thread struct will contain the wrong values.

The code path for this happening is:

       Userspace:                      Kernel
                   Start userspace
                    with MSR FP/VEC/VSX/SPE=0 TM=1
                      < -----
       ...
       tbegin
       bne
       fp instruction
                   FP unavailable
                       ---- >
                                        fp_unavailable_tm()
					  tm_reclaim_current()
					    tm_reclaim_thread()
					      giveup_all()
					        return early since FP/VMX/VSX=0
						/* ckpt MSR not updated (Incorrect) */
					      tm_reclaim()
					        /* thread_struct ckpt FP regs contain junk (OK) */
                                              /* Sees ckpt MSR FP=1 (Incorrect) */
					      no memcpy() performed
					        /* thread_struct ckpt FP regs not fixed (Incorrect) */
					  tm_recheckpoint()
					     /* Put junk in hardware checkpoint FP regs */
                                         ....
                      < -----
                   Return to userspace
                     with MSR TM=1 FP=1
                     with junk in the FP TM checkpoint
       TM rollback
       reads FP junk

This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.

This patch moves up check_if_tm_restore_required() in giveup_all() to
ensure thread->ckpt_regs.msr is updated correctly.

A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c

Similarly for VMX.

This fixes CVE-2019-15030.

Fixes: f48e91e87e ("powerpc/tm: Fix FP and VMX register corruption")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
2019-09-04 22:31:13 +10:00
Tianyu Lan
4030b4c585 x86/hyper-v: Fix overflow bug in fill_gva_list()
When the 'start' parameter is >=  0xFF000000 on 32-bit
systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
fill_gva_list() gets into an infinite loop.

With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
and always compares as less than end.  Memory is filled with
guest virtual addresses until the system crashes.

Fix this by never incrementing 'cur' to be larger than 'end'.

Reported-by: Jong Hyun Park <park.jonghyun@yonsei.ac.kr>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2ffd9e33ce ("x86/hyper-v: Use hypercall for remote TLB flush")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-02 19:57:19 +02:00
Peter Zijlstra
9b8bd476e7 x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
Identical to __put_user(); the __get_user() argument evalution will too
leak UBSAN crud into the __uaccess_begin() / __uaccess_end() region.
While uncommon this was observed to happen for:

  drivers/xen/gntdev.c: if (__get_user(old_status, batch->status[i]))

where UBSAN added array bound checking.

This complements commit:

  6ae865615f ("x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation")

Tested-by Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: broonie@kernel.org
Cc: sfr@canb.auug.org.au
Cc: akpm@linux-foundation.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: mhocko@suse.cz
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190829082445.GM2369@hirez.programming.kicks-ass.net
2019-09-02 14:22:38 +02:00
John S. Gruber
29d9a0b507 x86/boot: Preserve boot_params.secure_boot from sanitizing
Commit

  a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")

now zeroes the secure boot setting information (enabled/disabled/...)
passed by the boot loader or by the kernel's EFI handover mechanism.

The problem manifests itself with signed kernels using the EFI handoff
protocol with grub and the kernel loses the information whether secure
boot is enabled in the firmware, i.e., the log message "Secure boot
enabled" becomes "Secure boot could not be determined".

efi_main() arch/x86/boot/compressed/eboot.c sets this field early but it
is subsequently zeroed by the above referenced commit.

Include boot_params.secure_boot in the preserve field list.

 [ bp: restructure commit message and massage. ]

Fixes: a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")
Signed-off-by: John S. Gruber <JohnSGruber@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/CAPotdmSPExAuQcy9iAHqX3js_fc4mMLQOTr5RBGvizyCOPcTQQ@mail.gmail.com
2019-09-02 09:17:45 +02:00
Linus Torvalds
9f159ae07f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Fix the bogus detection of 32bit user mode for uretprobes which
     caused corruption of the user return address resulting in
     application crashes. In the uprobes handler in_ia32_syscall() is
     obviously always returning false on a 64bit kernel. Use
     user_64bit_mode() instead which works correctly.

   - Prevent large page splitting when ftrace flips RW/RO on the kernel
     text which caused iTLB performance issues. Ftrace wants to be
     converted to text_poke() which avoids the problem, but for now
     allow large page preservation in the static protections check when
     the change request spawns a full large page.

   - Prevent arch_dynirq_lower_bound() from returning 0 when the IOAPIC
     is configured via device tree. In the device tree case the GSI 1:1
     mapping is meaningless therefore the lower bound which protects the
     GSI range on ACPI machines is irrelevant. Return the lower bound
     which the core hands to the function instead of blindly returning 0
     which causes the core to allocate the invalid virtual interupt
     number 0 which in turn prevents all drivers from allocating and
     requesting an interrupt.

   - Remove the bogus initialization of LDR and DFR in the 32bit bigsmp
     APIC driver. That uses physical destination mode where LDR/DFR are
     ignored, but the initialization and the missing clear of LDR caused
     the APIC to be left in a inconsistent state on kexec/reboot.

   - Clear LDR when clearing the APIC registers so the APIC is in a well
     defined state.

   - Initialize variables proper in the find_trampoline_placement()
     code.

   - Silence GCC( build warning for the real mode part of the build"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
  x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
  x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
  x86/apic: Include the LDR when clearing out APIC registers
  x86/apic: Do not initialize LDR and DFR for bigsmp
  uprobes/x86: Fix detection of 32-bit user mode
  x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
2019-09-01 11:21:57 -07:00
Linus Torvalds
5fb181cba0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two fixes for perf x86 hardware implementations:

   - Restrict the period on Nehalem machines to prevent perf from
     hogging the CPU

   - Prevent the AMD IBS driver from overwriting the hardwre controlled
     and pre-seeded reserved bits (0-6) in the count register which
     caused a sample bias for dispatched micro-ops"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
  perf/x86/intel: Restrict period on Nehalem
2019-09-01 11:09:42 -07:00
Linus Torvalds
95381debd9 Merge tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Small fixes and minor cleanups for tracing:

   - Make exported ftrace function not static

   - Fix NULL pointer dereference in reading probes as they are created

   - Fix NULL pointer dereference in k/uprobe clean up path

   - Various documentation fixes"

* tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Correct kdoc formats
  ftrace/x86: Remove mcount() declaration
  tracing/probe: Fix null pointer dereference
  tracing: Make exported ftrace_set_clr_event non-static
  ftrace: Check for successful allocation of hash
  ftrace: Check for empty hash and comment the race with registering probes
  ftrace: Fix NULL pointer dereference in t_probe_next()
2019-08-31 09:15:25 -07:00
Linus Torvalds
7fb86707cc Merge tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
 "One significant fix for 32-bit RISC-V systems:

  Fix the RV32 memory map to prevent userspace from corrupting the
  FIXMAP area. Without this patch, the system can crash very early
  during the boot"

* tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix FIXMAP area corruption on RV32 systems
2019-08-31 09:02:36 -07:00
Linus Torvalds
834354f642 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
 "PPC:
   - Fix bug which could leave locks held in the host on return to a
     guest.

  x86:
   - Prevent infinitely looping emulation of a failing syscall while
     single stepping.

   - Do not crash the host when nesting is disabled"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Don't update RIP or do single-step on faulting emulation
  KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
  KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
2019-08-31 08:51:48 -07:00
Jisheng Zhang
2e81562731 ftrace/x86: Remove mcount() declaration
Commit 562e14f722 ("ftrace/x86: Remove mcount support") removed the
support for using mcount, so we could remove the mcount() declaration
to clean up.

Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31 06:51:55 -04:00
Fabrizio Castro
ae688e1720 arm64: dts: renesas: hihope-common: Fix eMMC status
SDHI3 got accidentally disabled while adding USB 2.0 support,
this patch fixes it.

Fixes: 734d277f41 ("arm64: dts: renesas: hihope-common: Add USB 2.0 support")
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2019-08-31 11:23:18 +02:00
Linus Torvalds
0a51b08fb3 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Three fixes for ARM this time around:

   - A fix for update_sections_early() to cope with NULL ->mm pointers.

   - A correction to the backtrace code to allow proper backtraces.

   - Reinforcement of pfn_valid() with PFNs >= 4GiB"

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8901/1: add a criteria for pfn_valid of arm
  ARM: 8897/1: check stmfd instruction using right shift
  ARM: 8874/1: mm: only adjust sections of valid mm structures
2019-08-30 11:58:02 -07:00
Linus Torvalds
e8d6766f3c Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "The majority of the fixes this time are for OMAP hardware, here is a
  breakdown of the significant changes:

  Various device tree bug fixes:
   - TI am57xx boards need a voltage level fix to avoid damaging SD
     cards
   - vf610-bk4 fails to detect its flash due to an incorrect description
   - meson-g12a USB phy configuration fails
   - meson-g12b reboot should not power off the SD card
   - Some corrections for apparently harmless differences from the
     documentation.

  Regression fixes:
   - ams-delta FIQ interrupts broke in 5.3
   - TI am3/am4 mmc controllers broke in 5.2

  The logic_pio driver (used on some Huawei ARM servers) got a few bug
  fixes for reliability.

  And a couple of compile-time warning fixes"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
  soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
  soc: ti: pm33xx: Make two symbols static
  soc: ti: pm33xx: Fix static checker warnings
  ARM: OMAP: dma: Mark expected switch fall-throughs
  ARM: dts: Fix incomplete dts data for am3 and am4 mmc
  bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
  ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
  ARM: dts: dra74x: Fix iodelay configuration for mmc3
  ARM: dts: am335x: Fix UARTs length
  ARM: OMAP2+: Fix omap4 errata warning on other SoCs
  bus: hisi_lpc: Add .remove method to avoid driver unbind crash
  bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
  lib: logic_pio: Add logic_pio_unregister_range()
  lib: logic_pio: Avoid possible overlap for unregistering regions
  lib: logic_pio: Fix RCU usage
  arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
  arm64: dts: meson-g12a-sei510: enable IR controller
  arm64: dts: meson-g12a: add missing dwc2 phy-names
  ARM: dts: vf610-bk4: Fix qspi node description
  ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
  ...
2019-08-30 10:53:12 -07:00
Kim Phillips
0f4cd769c4 perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
sample bias, IBS hardware preloads the least significant 7 bits of
current count (IbsOpCurCnt) with random values, such that, after the
interrupt is handled and counting resumes, the next sample taken
will be slightly perturbed.

The current count bitfield is in the IBS execution control h/w register,
alongside the maximum count field.

Currently, the IBS driver writes that register with the maximum count,
leaving zeroes to fill the current count field, thereby overwriting
the random bits the hardware preloaded for itself.

Fix the driver to actually retain and carry those random bits from the
read of the IBS control register, through to its write, instead of
overwriting the lower current count bits with zeroes.

Tested with:

perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>

'perf annotate' output before:

 15.70  65:   addsd     %xmm0,%xmm1
 17.30        add       $0x1,%rax
 15.88        cmp       %rdx,%rax
              je        82
 17.32  72:   test      $0x1,%al
              jne       7c
  7.52        movapd    %xmm1,%xmm0
  5.90        jmp       65
  8.23  7c:   sqrtsd    %xmm1,%xmm0
 12.15        jmp       65

'perf annotate' output after:

 16.63  65:   addsd     %xmm0,%xmm1
 16.82        add       $0x1,%rax
 16.81        cmp       %rdx,%rax
              je        82
 16.69  72:   test      $0x1,%al
              jne       7c
  8.30        movapd    %xmm1,%xmm0
  8.13        jmp       65
  8.24  7c:   sqrtsd    %xmm1,%xmm0
  8.39        jmp       65

Tested on Family 15h and 17h machines.

Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
affect their operation.

It is unknown why commit db98c5faf8 ("perf/x86: Implement 64-bit
counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
field; the number of preloaded random bits has always been 7, AFAICT.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: "Namhyung Kim" <namhyung@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
2019-08-30 14:27:47 +02:00
Josh Hunt
44d3bbb6f5 perf/x86/intel: Restrict period on Nehalem
We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
some cases when using perf:

perfevents: irq loop stuck!
WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
...
RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
...
Call Trace:
<NMI>
? perf_event_nmi_handler+0x2e/0x50
? intel_pmu_save_and_restart+0x50/0x50
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x6e/0x120
default_do_nmi+0x3e/0x100
do_nmi+0x102/0x160
end_repeat_nmi+0x16/0x50
...
? native_write_msr+0x6/0x20
? native_write_msr+0x6/0x20
</NMI>
intel_pmu_enable_event+0x1ce/0x1f0
x86_pmu_start+0x78/0xa0
x86_pmu_enable+0x252/0x310
__perf_event_task_sched_in+0x181/0x190
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
finish_task_switch+0x158/0x260
__schedule+0x2f6/0x840
? hrtimer_start_range_ns+0x153/0x210
schedule+0x32/0x80
schedule_hrtimeout_range_clock+0x8a/0x100
? hrtimer_init+0x120/0x120
ep_poll+0x2f7/0x3a0
? wake_up_q+0x60/0x60
do_epoll_wait+0xa9/0xc0
__x64_sys_epoll_wait+0x1a/0x20
do_syscall_64+0x4e/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdeb1e96c03
...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: acme@kernel.org
Cc: Josh Hunt <johunt@akamai.com>
Cc: bpuranda@akamai.com
Cc: mingo@redhat.com
Cc: jolsa@redhat.com
Cc: tglx@linutronix.de
Cc: namhyung@kernel.org
Cc: alexander.shishkin@linux.intel.com
Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
2019-08-30 14:27:47 +02:00
Thomas Gleixner
7af0145067 x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
ftrace does not use text_poke() for enabling trace functionality. It uses
its own mechanism and flips the whole kernel text to RW and back to RO.

The CPA rework removed a loop based check of 4k pages which tried to
preserve a large page by checking each 4k page whether the change would
actually cover all pages in the large page.

This resulted in endless loops for nothing as in testing it turned out that
it actually never preserved anything. Of course testing missed to include
ftrace, which is the one and only case which benefitted from the 4k loop.

As a consequence enabling function tracing or ftrace based kprobes results
in a full 4k split of the kernel text, which affects iTLB performance.

The kernel RO protection is the only valid case where this can actually
preserve large pages.

All other static protections (RO data, data NX, PCI, BIOS) are truly
static.  So a conflict with those protections which results in a split
should only ever happen when a change of memory next to a protected region
is attempted. But these conflicts are rightfully splitting the large page
to preserve the protected regions. In fact a change to the protected
regions itself is a bug and is warned about.

Add an exception for the static protection check for kernel text RO when
the to be changed region spawns a full large page which allows to preserve
the large mappings. This also prevents the syslog to be spammed about CPA
violations when ftrace is used.

The exception needs to be removed once ftrace switched over to text_poke()
which avoids the whole issue.

Fixes: 585948f4f6 ("x86/mm/cpa: Avoid the 4k pages check completely")
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
2019-08-29 20:48:44 +02:00
Linus Torvalds
4a64489cf8 Merge tag 'Wimplicit-fallthrough-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull fallthrough fixes from Gustavo A. R. Silva:
 "Fix fall-through warnings on arc and nds32 for multiple
  configurations"

* tag 'Wimplicit-fallthrough-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  nds32: Mark expected switch fall-throughs
  ARC: unwind: Mark expected switch fall-through
2019-08-29 09:28:25 -07:00
Gustavo A. R. Silva
7c9eb2dbd7 nds32: Mark expected switch fall-throughs
Mark switch cases where we are expecting to fall through.

This patch fixes the following warnings (Building: allmodconfig nds32):

include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-08-29 11:06:56 -05:00
Gustavo A. R. Silva
00a0c8451a ARC: unwind: Mark expected switch fall-through
Mark switch cases where we are expecting to fall through.

This patch fixes the following warnings (Building: haps_hs_defconfig arc):

arch/arc/kernel/unwind.c: In function ‘read_pointer’:
./include/linux/compiler.h:328:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
  do {        \
     ^
./include/linux/compiler.h:338:2: note: in expansion of macro ‘__compiletime_assert’
  __compiletime_assert(condition, msg, prefix, suffix)
  ^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~~~~~~~~~~~~~
arch/arc/kernel/unwind.c:573:3: note: in expansion of macro ‘BUILD_BUG_ON’
   BUILD_BUG_ON(sizeof(u32) != sizeof(value));
   ^~~~~~~~~~~~
arch/arc/kernel/unwind.c:575:2: note: here
  case DW_EH_PE_native:
  ^~~~

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-08-29 11:05:17 -05:00
zhaoyang
5b3efa4f14 ARM: 8901/1: add a criteria for pfn_valid of arm
pfn_valid can be wrong when parsing a invalid pfn whose phys address
exceeds BITS_PER_LONG as the MSB will be trimed when shifted.

The issue originally arise from bellowing call stack, which corresponding to
an access of the /proc/kpageflags from userspace with a invalid pfn parameter
and leads to kernel panic.

[46886.723249] c7 [<c031ff98>] (stable_page_flags) from [<c03203f8>]
[46886.723264] c7 [<c0320368>] (kpageflags_read) from [<c0312030>]
[46886.723280] c7 [<c0311fb0>] (proc_reg_read) from [<c02a6e6c>]
[46886.723290] c7 [<c02a6e24>] (__vfs_read) from [<c02a7018>]
[46886.723301] c7 [<c02a6f74>] (vfs_read) from [<c02a778c>]
[46886.723315] c7 [<c02a770c>] (SyS_pread64) from [<c0108620>]
(ret_fast_syscall+0x0/0x28)

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-28 23:30:21 +01:00
Anup Patel
a256f2e329 RISC-V: Fix FIXMAP area corruption on RV32 systems
Currently, various virtual memory areas of Linux RISC-V are organized
in increasing order of their virtual addresses is as follows:
1. User space area (This is lowest area and starts at 0x0)
2. FIXMAP area
3. VMALLOC area
4. Kernel area (This is highest area and starts at PAGE_OFFSET)

The maximum size of user space aread is represented by TASK_SIZE.

On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
user space area to overlap the FIXMAP area. This allows user space apps
to potentially corrupt the FIXMAP area and kernel OF APIs will crash
whenever they access corrupted FDT in the FIXMAP area.

On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
happen to overlap so we don't see any FIXMAP area corruptions.

This patch fixes FIXMAP area corruption on RV32 systems by setting
TASK_SIZE to FIXADDR_START. We also move FIXADDR_TOP, FIXADDR_SIZE,
and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
header includes.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-28 15:30:12 -07:00
Linus Torvalds
42e0e95474 x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
One of the very few warnings I have in the current build comes from
arch/x86/boot/edd.c, where I get the following with a gcc9 build:

   arch/x86/boot/edd.c: In function ‘query_edd’:
   arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
     148 |  mbrptr = boot_params.edd_mbr_sig_buffer;
         |           ^~~~~~~~~~~

This warning triggers because we throw away all the CFLAGS and then make
a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
added in the following commit is not present:

  6f303d6053 ("gcc-9: silence 'address-of-packed-member' warning")

The simplest solution for now is to adjust the warning for this version
of CFLAGS as well, but it would definitely make sense to examine whether
REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
in the compiler flags environment automatically.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-28 17:31:31 +02:00
Sean Christopherson
75ee23b30d KVM: x86: Don't update RIP or do single-step on faulting emulation
Don't advance RIP or inject a single-step #DB if emulation signals a
fault.  This logic applies to all state updates that are conditional on
clean retirement of the emulation instruction, e.g. updating RFLAGS was
previously handled by commit 38827dbd3f ("KVM: x86: Do not update
EFLAGS on faulting emulation").

Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation
overwriting the #UD with #DB and thus restarting the bad SYSCALL over
and over.

Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Fixes: 663f4c61b8 ("KVM: x86: handle singlestep during emulation")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-08-27 20:59:04 +02:00
Vitaly Kuznetsov
ea1529873a KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
If kvm_intel is loaded with nested=0 parameter an attempt to perform
KVM_GET_SUPPORTED_HV_CPUID results in OOPS as nested_get_evmcs_version hook
in kvm_x86_ops is NULL (we assign it in nested_vmx_hardware_setup() and
this only happens in case nested is enabled).

Check that kvm_x86_ops->nested_get_evmcs_version is not NULL before
calling it. With this, we can remove the stub from svm as it is no
longer needed.

Cc: <stable@vger.kernel.org>
Fixes: e2e871ab2f ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-08-27 20:59:04 +02:00
Linus Torvalds
6525771f58 Merge tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:

 - support for Edge Triggered IRQs in ARC IDU intc

 - other fixes here and there

* tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  arc: prefer __section from compiler_attributes.h
  dt-bindings: IDU-intc: Add support for edge-triggered interrupts
  dt-bindings: IDU-intc: Clean up documentation
  ARCv2: IDU-intc: Add support for edge-triggered interrupts
  ARC: unwind: Mark expected switch fall-throughs
  ARC: [plat-hsdk]: allow to switch between AXI DMAC port configurations
  ARC: fix typo in setup_dma_ops log message
  ARCv2: entry: early return from exception need not clear U & DE bits
2019-08-27 10:50:27 -07:00
Linus Torvalds
452a04441b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya
    Leoshkevich.

 2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
    SMC from Jason Baron.

 3) ipv6_mc_may_pull() should return 0 for malformed packets, not
    -EINVAL. From Stefano Brivio.

 4) Don't forget to unpin umem xdp pages in error path of
    xdp_umem_reg(). From Ivan Khoronzhuk.

 5) Fix sta object leak in mac80211, from Johannes Berg.

 6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
    switches. From Florian Fainelli.

 7) Revert DMA sync removal from r8169 which was causing regressions on
    some MIPS Loongson platforms. From Heiner Kallweit.

 8) Use after free in flow dissector, from Jakub Sitnicki.

 9) Fix NULL derefs of net devices during ICMP processing across
    collect_md tunnels, from Hangbin Liu.

10) proto_register() memory leaks, from Zhang Lin.

11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
    from John Fastabend.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  r8152: Set memory to all 0xFFs on failed reg reads
  openvswitch: Fix conntrack cache with timeout
  ipv4: mpls: fix mpls_xmit for iptunnel
  nexthop: Fix nexthop_num_path for blackhole nexthops
  net: rds: add service level support in rds-info
  net: route dump netlink NLM_F_MULTI flag missing
  s390/qeth: reject oversized SNMP requests
  sock: fix potential memory leak in proto_register()
  MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
  xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
  ipv4/icmp: fix rt dst dev null pointer dereference
  openvswitch: Fix log message in ovs conntrack
  bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
  bpf: fix use after free in prog symbol exposure
  bpf: fix precision tracking in presence of bpf2bpf calls
  flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
  Revert "r8169: remove not needed call to dma_sync_single_for_device"
  ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
  net/ncsi: Fix the payload copying for the request coming from Netlink
  qed: Add cleanup in qed_slowpath_start()
  ...
2019-08-27 10:12:48 -07:00
Radim Krčmář
c91ff72142 Merge tag 'kvm-ppc-fixes-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
KVM/PPC fix for 5.3

- Fix bug which could leave locks locked in the host on return
  to a guest.
2019-08-27 16:02:48 +02:00
Kirill A. Shutemov
c96e8483cb x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
Gustavo noticed that 'new' can be left uninitialized if 'bios_start'
happens to be less or equal to 'entry->addr + entry->size'.

Initialize the variable at the begin of the iteration to the current value
of 'bios_start'.

Fixes: 0a46fff2f9 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table")
Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box
2019-08-27 10:46:27 +02:00
Alexey Kardashevskiy
ddfd151f3d KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from
a guest. Although we verify correctness of TCEs before we do anything
with the existing tables, there is a small window when a check in
kvmppc_tce_validate might pass and right after that the guest alters
the page of TCEs, causing an early exit from the handler and leaving
srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap)
(real mode) locked.

This fixes the bug by jumping to the common exit code with an appropriate
unlock.

Cc: stable@vger.kernel.org # v4.11+
Fixes: 121f80ba68 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-08-27 10:59:30 +10:00