Commit Graph

1013780 Commits

Author SHA1 Message Date
Nicholas Piggin
aaae8c7900 KVM: PPC: Book3S HV: Remove support for dependent threads mode on P9
Dependent-threads mode is the normal KVM mode for pre-POWER9 SMT
processors, where all threads in a core (or subcore) would run the same
partition at the same time, or they would run the host.

This design was mandated by MMU state that is shared between threads in
a processor, so the synchronisation point is in hypervisor real-mode
that has essentially no shared state, so it's safe for multiple threads
to gather and switch to the correct mode.

It is implemented by having the host unplug all secondary threads and
always run in SMT1 mode, and host QEMU threads essentially represent
virtual cores that wake these secondary threads out of unplug when the
ioctl is called to run the guest. This happens via a side-path that is
mostly invisible to the rest of the Linux host and the secondary threads
still appear to be unplugged.

POWER9 / ISA v3.0 has a more flexible MMU design that is independent
per-thread and allows a much simpler KVM implementation. Before the new
"P9 fast path" was added that began to take advantage of this, POWER9
support was implemented in the existing path which has support to run
in the dependent threads mode. So it was not much work to add support to
run POWER9 in this dependent threads mode.

The mode is not required by the POWER9 MMU (although "mixed-mode" hash /
radix MMU limitations of early processors were worked around using this
mode). But it is one way to run SMT guests without running different
guests or guest and host on different threads of the same core, so it
could avoid or reduce some SMT attack surfaces without turning off SMT
entirely.

This security feature has some real, if indeterminate, value. However
the old path is lagging in features (nested HV), and with this series
the new P9 path adds remaining missing features (radix prefetch bug
and hash support, in later patches), so POWER9 dependent threads mode
support would be the only remaining reason to keep that code in and keep
supporting POWER9/POWER10 in the old path. So here we make the call to
drop this feature.

Remove dependent threads mode support for POWER9 and above processors.
Systems can still achieve this security by disabling SMT entirely, but
that would generally come at a larger performance cost for guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-23-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
2e1ae9cd56 KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU
Rather than partition the guest PID space + flush a rogue guest PID to
work around this problem, instead fix it by always disabling the MMU when
switching in or out of guest MMU context in HV mode.

This may be a bit less efficient, but it is a lot less complicated and
allows the P9 path to trivally implement the workaround too. Newer CPUs
are not subject to this issue.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
41f7799176 KVM: PPC: Book3S HV P9: Switch to guest MMU context as late as possible
Move MMU context switch as late as reasonably possible to minimise code
running with guest context switched in. This becomes more important when
this code may run in real-mode, with later changes.

Move WARN_ON as early as possible so program check interrupts are less
likely to tangle everything up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-21-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
edba6aff4f KVM: PPC: Book3S HV P9: Add helpers for OS SPR handling
This is a first step to wrapping supervisor and user SPR saving and
loading up into helpers, which will then be called independently in
bare metal and nested HV cases in order to optimise SPR access.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-20-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
68e3baaca8 KVM: PPC: Book3S HV P9: Move SPR loading after expiry time check
This is wasted work if the time limit is exceeded.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-19-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
a32ed1bb70 KVM: PPC: Book3S HV P9: Improve exit timing accounting coverage
The C conversion caused exit timing to become a bit cramped. Expand it
to cover more of the entry and exit code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-18-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
6d770e3fe9 KVM: PPC: Book3S HV P9: Read machine check registers while MSR[RI] is 0
SRR0/1, DAR, DSISR must all be protected from machine check which can
clobber them. Ensure MSR[RI] is clear while they are live.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-17-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
c00366e237 KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into __kvmhv_vcpu_entry_p9
Now the initial C implementation is done, inline more HV code to make
rearranging things easier.

And rename __kvmhv_vcpu_entry_p9 to drop the leading underscores as it's
now C, and is now a more complete vcpu entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-16-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
89d35b2391 KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
Almost all logic is moved to C, by introducing a new in_guest mode for
the P9 path that branches very early in the KVM interrupt handler to P9
exit code.

The main P9 entry and exit assembly is now only about 160 lines of low
level stack setup and register save/restore, plus a bad-interrupt
handler.

There are two motivations for this, the first is just make the code more
maintainable being in C. The second is to reduce the amount of code
running in a special KVM mode, "realmode". In quotes because with radix
it is no longer necessarily real-mode in the MMU, but it still has to be
treated specially because it may be in real-mode, and has various
important registers like PID, DEC, TB, etc set to guest. This is hostile
to the rest of Linux and can't use arbitrary kernel functionality or be
instrumented well.

This initial patch is a reasonably faithful conversion of the asm code,
but it does lack any loop to return quickly back into the guest without
switching out of realmode in the case of unimportant or easily handled
interrupts. As explained in previous changes, handling HV interrupts
very quickly in this low level realmode is not so important for P9
performance, and are important to avoid for security, observability,
debugability reasons.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
9dc2babc18 KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path
In the interest of minimising the amount of code that is run in
"real-mode", don't handle hcalls in real mode in the P9 path. This
requires some new handlers for H_CEDE and xics-on-xive to be added
before xive is pulled or cede logic is checked.

This introduces a change in radix guest behaviour where radix guests
that execute 'sc 1' in userspace now get a privilege fault whereas
previously the 'sc 1' would be reflected as a syscall interrupt to the
guest kernel. That reflection is only required for hash guests that run
PR KVM.

Background:

In POWER8 and earlier processors, it is very expensive to exit from the
HV real mode context of a guest hypervisor interrupt, and switch to host
virtual mode. On those processors, guest->HV interrupts reach the
hypervisor with the MMU off because the MMU is loaded with guest context
(LPCR, SDR1, SLB), and the other threads in the sub-core need to be
pulled out of the guest too. Then the primary must save off guest state,
invalidate SLB and ERAT, and load up host state before the MMU can be
enabled to run in host virtual mode (~= regular Linux mode).

Hash guests also require a lot of hcalls to run due to the nature of the
MMU architecture and paravirtualisation design. The XICS interrupt
controller requires hcalls to run.

So KVM traditionally tries hard to avoid the full exit, by handling
hcalls and other interrupts in real mode as much as possible.

By contrast, POWER9 has independent MMU context per-thread, and in radix
mode the hypervisor is in host virtual memory mode when the HV interrupt
is taken. Radix guests do not require significant hcalls to manage their
translations, and xive guests don't need hcalls to handle interrupts. So
it's much less important for performance to handle hcalls in real mode on
POWER9.

One caveat is that the TCE hcalls are performance critical, real-mode
variants introduced for POWER8 in order to achieve 10GbE performance.
Real mode TCE hcalls were found to be less important on POWER9, which
was able to drive 40GBe networking without them (using the virt mode
hcalls) but performance is still important. These hcalls will benefit
from subsequent guest entry/exit optimisation including possibly a
faster "partial exit" that does not entirely switch to host context to
handle the hcall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
48013cbc50 KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together
Switching the MMU from radix<->radix mode is tricky particularly as the
MMU can remain enabled and requires a certain sequence of SPR updates.
Move these together into their own functions.

This also includes the radix TLB check / flush because it's tied in to
MMU switching due to tlbiel getting LPID from LPIDR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-13-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
09512c2916 KVM: PPC: Book3S HV P9: Move xive vcpu context management into kvmhv_p9_guest_entry
Move the xive management up so the low level register switching can be
pushed further down in a later patch. XIVE MMIO CI operations can run in
higher level code with machine checks, tracing, etc., available.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-12-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
6ffe2c6e6d KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races
irq_work's use of the DEC SPR is racy with guest<->host switch and guest
entry which flips the DEC interrupt to guest, which could lose a host
work interrupt.

This patch closes one race, and attempts to comment another class of
races.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin
413679e73b KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest LPCR
LPCR[HDICE]=0 suppresses hypervisor decrementer exceptions on some
processors, so it must be enabled before HDEC is set.

Rather than set it in the host LPCR then setting HDEC, move the HDEC
update to after the guest MMU context (including LPCR) is loaded.
There shouldn't be much concern with delaying HDEC by some 10s or 100s
of nanoseconds by setting it a bit later.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-10-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
023c3c96ca KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in C
This is more symmetric with kvmppc_xive_push_vcpu, and has the advantage
that it runs with the MMU on.

The extra test added to the asm will go away with a future change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-9-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
e2762743c6 KVM: PPC: Book3S 64: Minimise hcall handler calling convention differences
This sets up the same calling convention from interrupt entry to
KVM interrupt handler for system calls as exists for other interrupt
types.

This is a better API, it uses a save area rather than SPR, and it has
more registers free to use. Using a single common API helps maintain
it, and it becomes easier to use in C in a later patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-8-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
1b5821c630 KVM: PPC: Book3S 64: move bad_host_intr check to HV handler
The bad_host_intr check will never be true with PR KVM, move
it to HV code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-7-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
69fdd67499 KVM: PPC: Book3S 64: Move interrupt early register setup to KVM
Like the earlier patch for hcalls, KVM interrupt entry requires a
different calling convention than the Linux interrupt handlers
set up. Move the code that converts from one to the other into KVM.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-6-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
04ece7b60b KVM: PPC: Book3S 64: Move hcall early register setup to KVM
System calls / hcalls have a different calling convention than
other interrupts, so there is code in the KVMTEST to massage these
into the same form as other interrupt handlers.

Move this work into the KVM hcall handler. This means teaching KVM
a little more about the low level interrupt handler setup, PACA save
areas, etc., although that's not obviously worse than the current
approach of coming up with an entirely different interrupt register
/ save convention.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
31c67cfe2a KVM: PPC: Book3S 64: add hcall interrupt handler
Add a separate hcall entry point. This can be used to deal with the
different calling convention.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-4-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin
f33e0702d9 KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM
internal detail that has no real need to be in common handlers.

Add a comment explaining the what and why of KVM "skip" interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-3-npiggin@gmail.com
2021-06-10 22:12:11 +10:00
Nicholas Piggin
f36011569b KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
Rather than bifurcate the call depending on whether or not HV is
possible, and have the HV entry test for PR, just make a single
common point which does the demultiplexing. This makes it simpler
to add another type of exit handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-2-npiggin@gmail.com
2021-06-10 22:12:01 +10:00
Nicholas Piggin
6ba53317d4 KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d7 ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch actually applies there to the
old path well: a context switch can be made before kvmppc_vcpu_run_hv
restores the host FSCR and returns.

Now both the p9 and the p7/8 paths now save and restore their FSCR, it
no longer needs to be restored at the end of kvmppc_vcpu_run_hv

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
2021-06-04 22:02:25 +10:00
Linus Torvalds
d07f6ca923 Linux 5.13-rc2 v5.13-rc2 2021-05-16 15:27:44 -07:00
Linus Torvalds
28183dbf54 Merge tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are two driver fixes for driver core changes that happened in
  5.13-rc1.

  The clk driver fix resolves a many-reported issue with booting some
  devices, and the USB typec fix resolves the reported problem of USB
  systems on some embedded boards.

  Both of these have been in linux-next this week with no reported
  issues"

* tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  clk: Skip clk provider registration when np is NULL
  usb: typec: tcpm: Don't block probing of consumers of "connector" nodes
2021-05-16 10:13:14 -07:00
Linus Torvalds
6942d81a8f Merge tag 'staging-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
 "Here are some small IIO driver fixes and one Staging driver fix for
  5.13-rc2.

  Nothing major, just some resolutions for reported problems:

   - gcc-11 bogus warning fix for rtl8723bs

   - iio driver tiny fixes

  All of these have been in linux-next for many days with no reported
  issues"

* tag 'staging-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: tsl2583: Fix division by a zero lux_val
  iio: core: return ENODEV if ioctl is unknown
  iio: core: fix ioctl handlers removal
  iio: gyro: mpu3050: Fix reported temperature value
  iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
  iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
  iio: light: gp2ap002: Fix rumtime PM imbalance on error
  staging: rtl8723bs: avoid bogus gcc warning
2021-05-16 10:06:19 -07:00
Linus Torvalds
4a668429e0 Merge tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 5.13-rc2. They consist of a number
  of resolutions for reported issues:

   - typec fixes for found problems

   - xhci fixes and quirk additions

   - dwc3 driver fixes

   - minor fixes found by Coverity

   - cdc-wdm fixes for reported problems

  All of these have been in linux-next for a few days with no reported
  issues"

* tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
  usb: core: hub: fix race condition about TRSMRCY of resume
  usb: typec: tcpm: Fix SINK_DISCOVERY current limit for Rp-default
  xhci: Add reset resume quirk for AMD xhci controller.
  usb: xhci: Increase timeout for HC halt
  xhci: Do not use GFP_KERNEL in (potentially) atomic context
  xhci: Fix giving back cancelled URBs even if halted endpoint can't reset
  xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
  usb: musb: Fix an error message
  usb: typec: tcpm: Fix wrong handling for Not_Supported in VDM AMS
  usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work
  usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
  usb: fotg210-hcd: Fix an error message
  docs: usb: function: Modify path name
  usb: dwc3: omap: improve extcon initialization
  usb: typec: ucsi: Put fwnode in any case during ->probe()
  usb: typec: tcpm: Fix wrong handling in GET_SINK_CAP
  usb: dwc2: Remove obsolete MODULE_ constants from platform.c
  usb: dwc3: imx8mp: fix error return code in dwc3_imx8mp_probe()
  usb: dwc3: imx8mp: detect dwc3 core node via compatible string
  usb: dwc3: gadget: Return success always for kick transfer in ep queue
  ...
2021-05-16 09:55:05 -07:00
Linus Torvalds
8ce3648158 Merge tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two fixes for timers:

   - Use the ALARM feature check in the alarmtimer core code insted of
     the old method of checking for the set_alarm() callback.

     Drivers can have that callback set but the feature bit cleared. If
     such a RTC device is selected then alarms wont work.

   - Use a proper define to let the preprocessor check whether Hyper-V
     VDSO clocksource should be active.

     The code used a constant in an enum with #ifdef, which evaluates to
     always false and disabled the clocksource for VDSO"

* tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86
  alarmtimer: Check RTC features instead of ops
2021-05-16 09:42:13 -07:00
Linus Torvalds
f44e58bb19 Merge tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - two patches for error path fixes

 - a small series for fixing a regression with swiotlb with Xen on Arm

* tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/swiotlb: check if the swiotlb has already been initialized
  arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required
  xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h
  xen/unpopulated-alloc: fix error return code in fill_list()
  xen/gntdev: fix gntdev_mmap() error exit path
2021-05-16 09:39:04 -07:00
Linus Torvalds
ccb013c29d Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 "The three SEV commits are not really urgent material. But we figured
  since getting them in now will avoid a huge amount of conflicts
  between future SEV changes touching tip, the kvm and probably other
  trees, sending them to you now would be best.

  The idea is that the tip, kvm etc branches for 5.14 will all base
  ontop of -rc2 and thus everything will be peachy. What is more, those
  changes are purely mechanical and defines movement so they should be
  fine to go now (famous last words).

  Summary:

   - Enable -Wundef for the compressed kernel build stage

   - Reorganize SEV code to streamline and simplify future development"

* tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed: Enable -Wundef
  x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
  x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
  x86/sev-es: Rename sev-es.{ch} to sev.{ch}
2021-05-16 09:31:06 -07:00
Linus Torvalds
63d1cb53e2 Merge tag 'powerpc-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix a regression in the conversion of the 64-bit BookE interrupt
   entry to C.

 - Fix KVM hosts running with the hash MMU since the recent KVM gfn
   changes.

 - Fix a deadlock in our paravirt spinlocks when hcall tracing is
   enabled.

 - Several fixes for oopses in our runtime code patching for security
   mitigations.

 - A couple of minor fixes for the recent conversion of 32-bit interrupt
   entry/exit to C.

 - Fix __get_user() causing spurious crashes in sigreturn due to a bad
   inline asm constraint, spotted with GCC 11.

 - A fix for the way we track IRQ masking state vs NMI interrupts when
   using the new scv system call entry path.

 - A couple more minor fixes.

Thanks to Cédric Le Goater, Christian Zigotzky, Christophe Leroy,
Naveen N. Rao, Nicholas Piggin Paul Menzel, and Sean Christopherson.

* tag 'powerpc-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64e/interrupt: Fix nvgprs being clobbered
  powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled
  powerpc/64s: Fix stf mitigation patching w/strict RWX & hash
  powerpc/64s: Fix entry flush patching w/strict RWX & hash
  powerpc/64s: Fix crashes when toggling entry flush barrier
  powerpc/64s: Fix crashes when toggling stf barrier
  KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU
  powerpc/legacy_serial: Fix UBSAN: array-index-out-of-bounds
  powerpc/signal: Fix possible build failure with unsafe_copy_fpr_{to/from}_user
  powerpc/uaccess: Fix __get_user() with CONFIG_CC_HAS_ASM_GOTO_OUTPUT
  powerpc/pseries: warn if recursing into the hcall tracing code
  powerpc/pseries: use notrace hcall variant for H_CEDE idle
  powerpc/pseries: Don't trace hcall tracing wrapper
  powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
  powerpc/syscall: Calling kuap_save_and_lock() is wrong
  powerpc/interrupts: Fix kuep_unlock() call
2021-05-15 16:39:45 -07:00
Linus Torvalds
c12a29ed90 Merge tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency
  enumeration bug"

* tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations
  sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
2021-05-15 10:24:48 -07:00
Linus Torvalds
e7c425b744 Merge tag 'objtool-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
 "Fix a couple of endianness bugs that crept in"

* tag 'objtool-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool/x86: Fix elf_add_alternative() endianness
  objtool: Fix elf_create_undef_symbol() endianness
2021-05-15 10:18:23 -07:00
Linus Torvalds
077fc64407 Merge tag 'irq-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
 "Fix build warning on SH"

* tag 'irq-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sh: Remove unused variable
2021-05-15 10:13:42 -07:00
Linus Torvalds
91b7a0f063 Merge tag 'core-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stack randomization fix from Ingo Molnar:
 "Fix an assembly constraint that affected LLVM up to version 12"

* tag 'core-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stack: Replace "o" output with "r" input constraint
2021-05-15 10:00:35 -07:00
Linus Torvalds
a4147415bd Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 patches.

  Subsystems affected by this patch series: resource, squashfs, hfsplus,
  modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan,
  pagemap, and ioremap)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/ioremap: fix iomap_max_page_shift
  docs: admin-guide: update description for kernel.modprobe sysctl
  hfsplus: prevent corruption in shrinking truncate
  mm/filemap: fix readahead return types
  kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
  mm: fix struct page layout on 32-bit systems
  ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
  userfaultfd: release page in error path to avoid BUG_ON
  squashfs: fix divide error in calculate_skip()
  kernel/resource: fix return code check in __request_free_mem_region
  mm, slub: move slub_debug static key enabling outside slab_mutex
  mm/hugetlb: fix cow where page writtable in child
  mm/hugetlb: fix F_SEAL_FUTURE_WRITE
2021-05-15 09:42:27 -07:00
Linus Torvalds
f36edc5533 Merge tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:

 - PAE fixes

 - syscall num check off-by-one bug

 - misc fixes

* tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: Use max_high_pfn as a HIGHMEM zone border
  ARC: mm: PAE: use 40-bit physical page mask
  ARC: entry: fix off-by-one error in syscall number validation
  ARC: kgdb: add 'fallthrough' to prevent a warning
  arc: Fix typos/spellos
2021-05-15 09:01:45 -07:00
Linus Torvalds
8f4ae0f68c Merge tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Fix for shared tag set exit (Bart)

 - Correct ioctl range for zoned ioctls (Damien)

 - Removed dead/unused function (Lin)

 - Fix perf regression for shared tags (Ming)

 - Fix out-of-bounds issue with kyber and preemption (Omar)

 - BFQ merge fix (Paolo)

 - Two error handling fixes for nbd (Sun)

 - Fix weight update in blk-iocost (Tejun)

 - NVMe pull request (Christoph):
      - correct the check for using the inline bio in nvmet (Chaitanya
        Kulkarni)
      - demote unsupported command warnings (Chaitanya Kulkarni)
      - fix corruption due to double initializing ANA state (me, Hou Pu)
      - reset ns->file when open fails (Daniel Wagner)
      - fix a NULL deref when SEND is completed with error in nvmet-rdma
        (Michal Kalderon)

 - Fix kernel-doc warning (Bart)

* tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
  block/partitions/efi.c: Fix the efi_partition() kernel-doc header
  blk-mq: Swap two calls in blk_mq_exit_queue()
  blk-mq: plug request for shared sbitmap
  nvmet: use new ana_log_size instead the old one
  nvmet: seset ns->file when open fails
  nbd: share nbd_put and return by goto put_nbd
  nbd: Fix NULL pointer in flush_workqueue
  blkdev.h: remove unused codes blk_account_rq
  block, bfq: avoid circular stable merges
  blk-iocost: fix weight updates of inner active iocgs
  nvmet: demote fabrics cmd parse err msg to debug
  nvmet: use helper to remove the duplicate code
  nvmet: demote discovery cmd parse err msg to debug
  nvmet-rdma: Fix NULL deref when SEND is completed with error
  nvmet: fix inline bio check for passthru
  nvmet: fix inline bio check for bdev-ns
  nvme-multipath: fix double initialization of ANA state
  kyber: fix out of bounds access when preempted
  block: uapi: fix comment about block device ioctl
2021-05-15 08:52:30 -07:00
Linus Torvalds
5601591035 Merge tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Just a few minor fixes/changes:

   - Fix issue with double free race for linked timeout completions

   - Fix reference issue with timeouts

   - Remove last few places that make SQPOLL special, since it's just an
     io thread now.

   - Bump maximum allowed registered buffers, as we don't allocate as
     much anymore"

* tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
  io_uring: increase max number of reg buffers
  io_uring: further remove sqpoll limits on opcodes
  io_uring: fix ltout double free on completion race
  io_uring: fix link timeout refs
2021-05-15 08:43:44 -07:00
Linus Torvalds
41f035c062 Merge tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
 "This mainly fixes 1 lcluster-sized pclusters for the big pcluster
  feature, which can be forcely generated by mkfs as a specific on-disk
  case for per-(sub)file compression strategies but missed to handle in
  runtime properly.

  Also, documentation updates are included to fix the broken
  illustration due to the ReST conversion by accident and complete the
  big pcluster introduction.

  Summary:

   - update documentation to fix the broken illustration due to ReST
     conversion by accident at that time and complete the big pcluster
     introduction

   - fix 1 lcluster-sized pclusters for the big pcluster feature"

* tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix 1 lcluster-sized pcluster for big pcluster
  erofs: update documentation about data compression
  erofs: fix broken illustration in documentation
2021-05-15 08:37:21 -07:00
Linus Torvalds
a5ce4296b0 Merge tag 'libnvdimm-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "A regression fix for a bootup crash condition introduced in this merge
  window and some other minor fixups:

   - Fix regression in ACPI NFIT table handling leading to crashes and
     driver load failures.

   - Move the nvdimm mailing list

   - Miscellaneous minor fixups"

* tag 'libnvdimm-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  ACPI: NFIT: Fix support for variable 'SPA' structure size
  MAINTAINERS: Move nvdimm mailing list
  tools/testing/nvdimm: Make symbol '__nfit_test_ioremap' static
  libnvdimm: Remove duplicate struct declaration
2021-05-15 08:32:51 -07:00
Linus Torvalds
393f42f113 Merge tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
 "A fix for a hang condition due to missed wakeups in the filesystem-dax
  core when exercised by virtiofs.

  This bug has been there from the beginning, but the condition has
  not triggered on other filesystems since they hold a lock over
  invalidation events"

* tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Wake up all waiters after invalidating dax entry
  dax: Add a wakeup mode parameter to put_unlocked_entry()
  dax: Add an enum for specifying dax wakup mode
2021-05-15 08:28:08 -07:00
Linus Torvalds
33f85ca44e Merge tag 'drm-fixes-2021-05-15' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
 "Looks like I wasn't the only one not fully switched on this week. The
  msm pull has a missing tag so I missed it, and i915 team were a bit
  late. In my defence I did have a day with the roof of my home office
  removed, so was sitting at my kids desk.

  msm:
   - dsi regression fix
   - dma-buf pinning fix
   - displayport fixes
   - llc fix

  i915:
   - Fix active callback alignment annotations and subsequent crashes
   - Retract link training strategy to slow and wide, again
   - Avoid division by zero on gen2
   - Use correct width reads for C0DRB3/C1DRB3 registers
   - Fix double free in pdp allocation failure path
   - Fix HDMI 2.1 PCON downstream caps check"

* tag 'drm-fixes-2021-05-15' of git://anongit.freedesktop.org/drm/drm:
  drm/i915: Use correct downstream caps for check Src-Ctl mode for PCON
  drm/i915/overlay: Fix active retire callback alignment
  drm/i915: Fix crash in auto_retire
  drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
  drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
  drm/i915: Avoid div-by-zero on gen2
  drm/i915/dp: Use slow and wide link training for everything
  drm/msm/dp: initialize audio_comp when audio starts
  drm/msm/dp: check sink_count before update is_connected status
  drm/msm: fix minor version to indicate MSM_PARAM_SUSPENDS support
  drm/msm/dsi: fix msm_dsi_phy_get_clk_provider return code
  drm/msm/dsi: dsi_phy_28nm_8960: fix uninitialized variable access
  drm/msm: fix LLC not being enabled for mmu500 targets
  drm/msm: Do not unpin/evict exported dma-buf's
2021-05-15 08:18:29 -07:00
Tetsuo Handa
ffb324e6f8 tty: vt: always invoke vc->vc_sw->con_resize callback
syzbot is reporting OOB write at vga16fb_imageblit() [1], for
resize_screen() from ioctl(VT_RESIZE) returns 0 without checking whether
requested rows/columns fit the amount of memory reserved for the graphical
screen if current mode is KD_GRAPHICS.

----------
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kd.h>
  #include <linux/vt.h>

  int main(int argc, char *argv[])
  {
        const int fd = open("/dev/char/4:1", O_RDWR);
        struct vt_sizes vt = { 0x4100, 2 };

        ioctl(fd, KDSETMODE, KD_GRAPHICS);
        ioctl(fd, VT_RESIZE, &vt);
        ioctl(fd, KDSETMODE, KD_TEXT);
        return 0;
  }
----------

Allow framebuffer drivers to return -EINVAL, by moving vc->vc_mode !=
KD_GRAPHICS check from resize_screen() to fbcon_resize().

Link: https://syzkaller.appspot.com/bug?extid=1f29e126cf461c4de3b3 [1]
Reported-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-15 08:12:12 -07:00
Christophe Leroy
86d0c16427 mm/ioremap: fix iomap_max_page_shift
iomap_max_page_shift is expected to contain a page shift, so it can't be a
'bool', has to be an 'unsigned int'

And fix the default values: P4D_SHIFT is when huge iomap is allowed.

However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is not a
constant so it can't be used to initialise a static variable.  So,
initialise iomap_max_page_shift with a maximum shift supported by the
architecture, it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway.

Link: https://lkml.kernel.org/r/ad2d366015794a9f21320dcbdd0a8eb98979e9df.1620898113.git.christophe.leroy@csgroup.eu
Fixes: bbc180a5ad ("mm: HUGE_VMAP arch support cleanup")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Rasmus Villemoes
f4d3f25ace docs: admin-guide: update description for kernel.modprobe sysctl
When I added CONFIG_MODPROBE_PATH, I neglected to update Documentation/.
It's still true that this defaults to /sbin/modprobe, but now via a level
of indirection.  So document that the kernel might have been built with
something other than /sbin/modprobe as the initial value.

Link: https://lkml.kernel.org/r/20210420125324.1246826-1-linux@rasmusvillemoes.dk
Fixes: 17652f4240 ("modules: add CONFIG_MODPROBE_PATH")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Jouni Roivas
c3187cf322 hfsplus: prevent corruption in shrinking truncate
I believe there are some issues introduced by commit 31651c6071
("hfsplus: avoid deadlock on file truncation")

HFS+ has extent records which always contains 8 extents.  In case the
first extent record in catalog file gets full, new ones are allocated from
extents overflow file.

In case shrinking truncate happens to middle of an extent record which
locates in extents overflow file, the logic in hfsplus_file_truncate() was
changed so that call to hfs_brec_remove() is not guarded any more.

Right action would be just freeing the extents that exceed the new size
inside extent record by calling hfsplus_free_extents(), and then check if
the whole extent record should be removed.  However since the guard
(blk_cnt > start) is now after the call to hfs_brec_remove(), this has
unfortunate effect that the last matching extent record is removed
unconditionally.

To reproduce this issue, create a file which has at least 10 extents, and
then perform shrinking truncate into middle of the last extent record, so
that the number of remaining extents is not under or divisible by 8.  This
causes the last extent record (8 extents) to be removed totally instead of
truncating into middle of it.  Thus this causes corruption, and lost data.

Fix for this is simply checking if the new truncated end is below the
start of this extent record, making it safe to remove the full extent
record.  However call to hfs_brec_remove() can't be moved to it's previous
place since we're dropping ->tree_lock and it can cause a race condition
and the cached info being invalidated possibly corrupting the node data.

Another issue is related to this one.  When entering into the block
(blk_cnt > start) we are not holding the ->tree_lock.  We break out from
the loop not holding the lock, but hfs_find_exit() does unlock it.  Not
sure if it's possible for someone else to take the lock under our feet,
but it can cause hard to debug errors and premature unlocking.  Even if
there's no real risk of it, the locking should still always be kept in
balance.  Thus taking the lock now just before the check.

Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com
Fixes: 31651c6071 ("hfsplus: avoid deadlock on file truncation")
Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Matthew Wilcox (Oracle)
076171a677 mm/filemap: fix readahead return types
A readahead request will not allocate more memory than can be represented
by a size_t, even on systems that have HIGHMEM available.  Change the
length functions from returning an loff_t to a size_t.

Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org
Fixes: 32c0a6bcaa ("btrfs: add and use readahead_batch_length")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Peter Collingbourne
f649dc0e0d kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
These tests deliberately access these arrays out of bounds, which will
cause the dynamic local bounds checks inserted by
CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel.  To avoid this
problem, access the arrays via volatile pointers, which will prevent the
compiler from being able to determine the array bounds.

These accesses use volatile pointers to char (char *volatile) rather than
the more conventional pointers to volatile char (volatile char *) because
we want to prevent the compiler from making inferences about the pointer
itself (i.e.  its array bounds), not the data that it refers to.

Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b98a7c3b9
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: George Popescu <georgepope@android.com>
Cc: Elena Petrova <lenaptr@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Matthew Wilcox (Oracle)
9ddb3c14af mm: fix struct page layout on 32-bit systems
32-bit architectures which expect 8-byte alignment for 8-byte integers and
need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019.  When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.

Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long.  We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform.  If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.

Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
Fixes: c25fff7171 ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00