Commit Graph

1014934 Commits

Author SHA1 Message Date
Julian Wiedmann
17c0b86e5f s390/ccwgroup: use BUS_NOTIFY_UNBOUND_DRIVER to trigger ungrouping
ccwgroup_notifier() currently listens for BUS_NOTIFY_UNBIND_DRIVER
events, and triggers an ungrouping for the affected device.

Looking at __device_release_driver(), we can wait for a little longer
until the driver has been fully unbound and eg. bus->remove() has been
called. So listen for BUS_NOTIFY_UNBOUND_DRIVER instead. Due to locking
the current code should work just fine, but this clarifies our intent.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Julian Wiedmann
428b7f5983 s390/ccwgroup: simplify ungrouping when driver deregisters
driver_unregister() ends up calling __device_release_driver() for
each device that is bound to the driver. Thus ccwgroup_notifier() will
see a BUS_NOTIFY_UNBIND_DRIVER event for these ccwgroup devices, and
trigger the ungrouping.

So there's no need to also trigger the ungrouping from within
ccwgroup_driver_unregister(), remove it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Julian Wiedmann
31aae32ca1 s390/vfio-ap: clean up vfio_ap_drv's definition
Define & initialize the driver struct in one go, so that everything
is in one place.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Harald Freudenberger
0677519ab9 s390/ap: extend AP change bindings-complete uevent with counter
Userspace udev rules need an indication about the very first AP change
BINDINGS=complete uevent.

So now this uevent is extend with an additional key-value entry
COMPLETECOUNT=<counter>. The very first uevent will show counter=1 and
the following BINDINGS=complete uevents increase this counter by 1.

Here is an example how the very first BINDINGS=complete uevent
looks like:

  KERNEL[106.079510] change   /devices/ap (ap)
  ACTION=change
  DEVPATH=/devices/ap
  SUBSYSTEM=ap
  BINDINGS=complete
  COMPLETECOUNT=1
  SEQNUM=10686

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
d2e834c62d s390/smp: remove redundant pcpu::lowcore member
Per-CPU pointer to lowcore is stored in global lowcore_ptr[]
array and duplicated in struct pcpu::lowcore member. This
update removes the redundancy.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
587704efb3 s390/smp: do not preserve boot CPU lowcore on hotplug
Once the kernel is running the boot CPU lowcore becomes
freeable and does not differ from the secondary CPU ones
in any way. Make use of it and do not preserve the boot
CPU lowcore on unplugging. That allows returning unused
memory when the boot CPU is offline and makes the code
more clear.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
5789284710 s390/smp: reallocate IPL CPU lowcore
The lowcore for IPL CPU is special. It is allocated early
in the boot process using memblock and never freed since.
The reason is pcpu_alloc_lowcore() and pcpu_free_lowcore()
routines use page allocator which is not available when
the IPL CPU is getting initialized.

Similar problem is already addressed for stacks - once the
virtual memory is available the early boot stacks get re-
allocated. Doing the same for lowcore will allow freeing
the IPL CPU lowcore and make no difference between the
boot and secondary CPUs.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
bdb8c9353e s390/mm: ensure switch_mm() is executed with interrupts disabled
Architecture callback switch_mm() is allowed to be called with
enabled interrupts. However, our implementation of switch_mm()
does not expect that. Let's follow other architectures and make
sure switch_mm() is always executed with interrupts disabled,
regardless of what happens with the generic kernel code in the
future.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Heiko Carstens
27c1dac0b6 s390/boot: access kernel command line via parmarea
Access the kernel command line via parmarea instead of using the
COMMAND_LINE define.
With this the following gcc11 warning is removed:

arch/s390/boot/ipl_parm.c: In function ‘setup_boot_command_line’:
arch/s390/boot/ipl_parm.c:168:50: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Heiko Carstens
f73c632d38 s390/ipl: make parameter area accessible via struct parmarea
Since commit 9a965ea95135 ("s390/kexec_file: Simplify parmarea
access") we have struct parmarea which describes the layout of the
kernel parameter area.

Make the kernel parameter area available as global variable parmarea
of type struct parmarea, which allows to easily access its members.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Valentin Vidic
b7d91d230a s390/sclp_vt220: fix console name to match device
Console name reported in /proc/consoles:

  ttyS1                -W- (EC p  )    4:65

does not match the char device name:

  crw--w----    1 root     root        4,  65 May 17 12:18 /dev/ttysclp0

so debian-installer inside a QEMU s390x instance gets confused and fails
to start with the following error:

  steal-ctty: No such file or directory

Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20210427194010.9330-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Julian Wiedmann
197cec2853 s390/ccwgroup: release the cdevs from within dev->release()
Wiring up the cdevs is an essential part of the gdev creation. So
release them during the gdev destruction, ie. on the last put_device().
This could cause us to hold on to the cdevs a tiny bit longer, but
that's not a real concern.

As we have much less certainty of what context this put_device() is
executed from, switch to irqsave locking.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Julian Wiedmann
95c09f0344 s390/ap: wire up bus->probe and bus->remove
Hijacking the device_driver's probe/remove callbacks for purely
bus-internal logic is a very unconvential construct. Instead just set
up our callbacks in the AP bus struct, and really_probe() will call them
in the same way as before.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Julian Wiedmann
3b4dd96854 s390/zcrypt: remove zcrypt_device_count
It's evidently unused.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Sven Schnelle
a237283fc4 s390/crypto: fix function/prototype mismatches
gcc-11 warns:

drivers/s390/crypto/zcrypt_ccamisc.c:298:38: warning: argument 4 of type u8[64] {aka unsigned char[64]} with mismatched bound [-Warray-parameter=]
  298 |                   u32 keybitsize, u8 seckey[SECKEYBLOBSIZE])
      |                                   ~~~^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/s390/crypto/zcrypt_ccamisc.c:24:
drivers/s390/crypto/zcrypt_ccamisc.h:162:63: note: previously declared as u8 * {aka unsigned char *}
  162 | int cca_genseckey(u16 cardnr, u16 domain, u32 keybitsize, u8 *seckey);
      |                                                           ~~~~^~~~~~
drivers/s390/crypto/zcrypt_ccamisc.c:441:41: warning: argument 5 of type u8[64] {aka unsigned char[64]} with mismatched bound [-Warray-parameter=]
  441 |                    const u8 *clrkey, u8 seckey[SECKEYBLOBSIZE])
      |                                      ~~~^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/s390/crypto/zcrypt_ccamisc.c:24:
drivers/s390/crypto/zcrypt_ccamisc.h:168:42: note: previously declared as u8 * {aka unsigned char *}
  168 |                    const u8 *clrkey, u8 *seckey);
      |                                      ~~~~^~~~~~

Fix this by making the prototypes match the functions.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
755112b35c s390/traps: add struct to access transactional diagnostic block
gcc-11 warns:

arch/s390/kernel/traps.c: In function __do_pgm_check:
arch/s390/kernel/traps.c:319:17: warning: memcpy reading 256 bytes from a region of size 0 [-Wstringop-overread]
  319 |                 memcpy(&current->thread.trap_tdb, &S390_lowcore.pgm_tdb, 256);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by adding a struct pgm_tdb to struct lowcore and copy that.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
6c6a07fc7c s390/irq: add union/struct to access external interrupt parameters
gcc-11 warns:

arch/s390/kernel/irq.c: In function do_ext_irq:
arch/s390/kernel/irq.c:175:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread]
  175 |         memcpy(&regs->int_code, &S390_lowcore.ext_cpu_addr, 4);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by adding a struct for int_code to struct lowcore.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
17e89e1340 s390/facilities: move stfl information from lowcore to global data
With gcc-11, there are a lot of warnings because the facility functions
are accessing lowcore through a null pointer. Fix this by moving the
facility arrays away from lowcore.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
af9ad82290 s390/entry: use assignment to read intcode / asm to copy gprs
arch/s390/kernel/syscall.c: In function __do_syscall:
arch/s390/kernel/syscall.c:147:9: warning: memcpy reading 64 bytes from a region of size 0 [-Wstringop-overread]
  147 |         memcpy(&regs->gprs[8], S390_lowcore.save_area_sync, 8 * sizeof(unsigned long));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kernel/syscall.c:148:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread]
  148 |         memcpy(&regs->int_code, &S390_lowcore.svc_ilc, sizeof(regs->int_code));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by moving the gprs restore from C to assembly, and use a assignment
for int_code instead of memcpy.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Niklas Schnelle
d460bb6c64 s390: enable HAVE_IOREMAP_PROT
In commit b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with
MIO") we implemented both ioremap_wc() and ioremap_prot() however until
now we had not set HAVE_IOREMAP_PROT in Kconfig, do so now.

This also requires implementing pte_pgprot() as this is used in the
generic_access_phys() code enabled by CONFIG_HAVE_IOREMAP_PROT. As with
ioremap_wc() we need to take the MMIO Write Back bit index into account.

Moreover since the pgprot value returned from pte_pgprot() is to be used
for mappings into kernel address space we must make sure that it uses
appropriate kernel page table protection bits. In particular a pgprot
value originally coming from userspace could have the _PAGE_PROTECT
bit set to enable fault based dirty bit accounting which would then make
the mapping inaccessible when used in kernel address space.

Fixes: b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with MIO")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
15e5b53ff4 s390/cpumf: remove WARN_ON_ONCE in counter start handler
Remove some WARN_ON_ONCE() warnings when a counter is started. Each
counter is installed function calls
event_sched_in() --> cpumf_pmu_add(..., PERF_EF_START).

This is done after the event has been created using
perf_pmu_event_init() which verifies the counter is valid.
Member hwc->config must be valid at this point.

Function cpumf_pmu_start(..., PERF_EF_RELOAD) is called from
function cpumf_pmu_add() for counter events. All other invocations of
cpumf_pmu_start(..., PERF_EF_RELOAD) are from the performance subsystem
for sampling events.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
d552a58d70 s390/cpumf: remove counter transaction call backs
The command 'perf stat -e cycles ...' triggers the following function
sequence in the CPU Measurement Facility counter device driver:

perf_pmu_event_init()
  __hw_perf_event_init()
    validate_ctr_auth()
    validate_ctr_version()

During event creation, the counter number is checked in functions
validate_ctr_auth() and validate_ctr_version() to verify it is a valid
counter and supported by the hardware. If this is not the case, both
functions return an error and the event is not created. System call
perf_event_open() returns an error in this case.

Later on the event is installed in the kernel event subsystem and the
driver functions cpumf_pmu_add() and cpumf_pmu_commit_txn() are called
to install the counter event by the hardware.

Since both events have been verified at event creation, there is no need
to re-evaluate the authorization state. This can not change since on
 * LPARs the authorization change requires a restart of the LPAR (and
   thus a reboot of the kernel)
 * DPMs can not take resources away, just add them.

Also the sequence of CPU Measurement facility counter device driver
calls is
  cpumf_pmu_start_txn
  cpumf_pmu_add
  cpumf_pmu_start
  cpumf_pmu_commit_txn
for every single event. Which means the condition in cpumf_pmu_add()
is never met and validate_ctr_auth() is never called.

This leaves the counter device driver transaction functions with
just one task:
start_txn: Verify a transaction is not in flight and call
	perf_pmu_disable()
cancel_txn, commit_txn: Verify a transaction is in flight and call
	perf_pmu_enable()

The same functionality is provided by the default transaction handling
functions in kernel/events/core.c. Use those by removing the
counter device driver private call back functions.

Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Linus Torvalds
614124bea7 Linux 5.13-rc5 v5.13-rc5 2021-06-06 15:47:27 -07:00
Linus Torvalds
90d56a3d6e Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Five small and fairly minor fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
  scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
  scsi: qedf: Do not put host in qedf_vport_create() unconditionally
  scsi: lpfc: Fix failure to transmit ABTS on FC link
  scsi: target: core: Fix warning on realtime kernels
2021-06-06 15:39:56 -07:00
Linus Torvalds
20e41d9bc8 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
  ext4: fix no-key deletion for encrypt+casefold
  ext4: fix memory leak in ext4_fill_super
  ext4: fix fast commit alignment issues
  ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  ext4: fix accessing uninit percpu counter variable with fast_commit
  ext4: fix memory leak in ext4_mb_init_backend on error path.
2021-06-06 14:24:13 -07:00
Linus Torvalds
decad3e1d1 Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
 "A set of fixes that have been coming in over the last few weeks, the
  usual mix of fixes:

   - DT fixups for TI K3

   - SATA drive detection fix for TI DRA7

   - Power management fixes and a few build warning removals for OMAP

   - OP-TEE fix to use standard API for UUID exporting

   - DT fixes for a handful of i.MX boards

  And a few other smaller items"

* tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
  arm64: meson: select COMMON_CLK
  soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
  ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
  bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
  ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
  ARM: dts: imx7d-pico: Fix the 'tuning-step' property
  ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
  arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
  arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
  ARM: imx: pm-imx27: Include "common.h"
  arm64: dts: zii-ultra: fix 12V_MAIN voltage
  arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
  arm64: dts: ls1028a: fix memory node
  bus: ti-sysc: Fix am335x resume hang for usb otg module
  ARM: OMAP2+: Fix build warning when mmc_omap is not built
  ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
  ARM: OMAP1: Fix use of possibly uninitialized irq variable
  optee: use export_uuid() to copy client UUID
  arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
  arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
  ...
2021-06-06 13:00:36 -07:00
Linus Torvalds
bd7b12aa60 Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Fix our KVM reverse map real-mode handling since we enabled huge
  vmalloc (in some configurations).

  Revert a recent change to our IOMMU code which broke some devices.

  Fix KVM handling of FSCR on P7/P8, which could have possibly let a
  guest crash it's Qemu.

  Fix kprobes validation of prefixed instructions across page boundary.

  Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
  Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"

* tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
  KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
  powerpc: Fix reverse map real-mode address lookup with huge vmalloc
  powerpc/kprobes: Fix validation of prefixed instructions across page boundary
2021-06-06 12:39:36 -07:00
Linus Torvalds
773ac53bbf Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 "A bunch of x86/urgent stuff accumulated for the last two weeks so
  lemme unload it to you.

  It should be all totally risk-free, of course. :-)

   - Fix out-of-spec hardware (1st gen Hygon) which does not implement
     MSR_AMD64_SEV even though the spec clearly states so, and check
     CPUID bits first.

   - Send only one signal to a task when it is a SEGV_PKUERR si_code
     type.

   - Do away with all the wankery of reserving X amount of memory in the
     first megabyte to prevent BIOS corrupting it and simply and
     unconditionally reserve the whole first megabyte.

   - Make alternatives NOP optimization work at an arbitrary position
     within the patched sequence because the compiler can put
     single-byte NOPs for alignment anywhere in the sequence (32-bit
     retpoline), vs our previous assumption that the NOPs are only
     appended.

   - Force-disable ENQCMD[S] instructions support and remove
     update_pasid() because of insufficient protection against FPU state
     modification in an interrupt context, among other xstate horrors
     which are being addressed at the moment. This one limits the
     fallout until proper enablement.

   - Use cpu_feature_enabled() in the idxd driver so that it can be
     build-time disabled through the defines in disabled-features.h.

   - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
     LVT value is read before APIC initialization so that softlockups
     during boot do not happen at least on one machine.

   - Mark all legacy interrupts as legacy vectors when the IO-APIC is
     disabled and when all legacy interrupts are routed through the PIC"

* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check SME/SEV support in CPUID first
  x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
  x86/setup: Always reserve the first 1M of RAM
  x86/alternative: Optimize single-byte NOPs at an arbitrary position
  x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
  dmaengine: idxd: Use cpu_feature_enabled()
  x86/thermal: Fix LVT thermal setup for SMI delivery mode
  x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
2021-06-06 12:25:43 -07:00
Daniel Rosenberg
e71f99f2df ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
Encrypted casefolding is only supported when both encryption and
casefolding are both enabled in the config.

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Daniel Rosenberg
63e7f12893 ext4: fix no-key deletion for encrypt+casefold
commit 471fbbea7f ("ext4: handle casefolding with encryption") is
missing a few checks for the encryption key which are needed to
support deleting enrypted casefolded files when the key is not
present.

This bug made it impossible to delete encrypted+casefolded directories
without the encryption key, due to errors like:

    W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key

Repro steps in kvm-xfstests test appliance:
      mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
      mount /vdc
      mkdir /vdc/dir
      chattr +F /vdc/dir
      keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
      xfs_io -c "set_encpolicy $keyid" /vdc/dir
      for i in `seq 1 100`; do
          mkdir /vdc/dir/$i
      done
      xfs_io -c "rm_enckey $keyid" /vdc
      rm -rf /vdc/dir # fails with the bug

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Alexey Makhalov
afd09b617d ext4: fix memory leak in ext4_fill_super
Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733ce9 ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec11 ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2021-06-06 10:10:23 -04:00
Harshad Shirwadkar
a7ba36bc94 ext4: fix fast commit alignment issues
Fast commit recovery data on disk may not be aligned. So, when the
recovery code reads it, this patch makes sure that fast commit info
found on-disk is first memcpy-ed into an aligned variable before
accessing it. As a consequence of it, we also remove some macros that
could resulted in unaligned accesses.

Cc: stable@kernel.org
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Ye Bin
082cd4ec24 ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
We got follow bug_on when run fsstress with injecting IO fault:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

This patch's modification is according to Jan Kara's suggestion in:
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
"I see. Now I understand your patch. Honestly, seeing how fragile is trying
to fix extent tree after split has failed in the middle, I would probably
go even further and make sure we fix the tree properly in case of ENOSPC
and EDQUOT (those are easily user triggerable).  Anything else indicates a
HW problem or fs corruption so I'd rather leave the extent tree as is and
don't try to fix it (which also means we will not create overlapping
extents)."

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:09:55 -04:00
Linus Torvalds
f5b6eb1e01 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Some more bugfixes from I2C for v5.13. Usual stuff"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
  i2c: qcom-geni: Add shutdown callback for i2c
  i2c: tegra-bpmp: Demote kernel-doc abuses
  i2c: altera: Fix formatting issue in struct and demote unworthy kernel-doc headers
2021-06-05 15:45:11 -07:00
Olof Johansson
b9c112f2c2 Merge tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux into arm/fixes
Devicetree fixes for TI K3 platforms for v5.13 merge window:

These minor fixes include:
* Fixups for device tree discovered during yaml conversion
* Fixups for missing dma-coherent property in j7200
* Removal of camera sensor node from am65 evm dts to overlay
  as camera sensor boards are variable.

* tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux:
  arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
  arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
  arm64: dts: ti: k3-*: Rename the TI-SCI node
  arm64: dts: ti: k3-am65-wakeup: Drop un-necessary properties from dmsc node
  arm64: dts: ti: k3-am65-wakeup: Add debug region to TI-SCI node
  arm64: dts: ti: k3-*: Rename the TI-SCI clocks node name
  arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
  arm64: dts: ti: k3-am654-base-board: remove ov5640

Link: https://lore.kernel.org/r/20210518115634.467vgpbzplal5kou@obituary
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:43:48 -07:00
Olof Johansson
7468bed8f8 Merge tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes
OP-TEE use export_uuid() to copy UUID

* tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee:
  optee: use export_uuid() to copy client UUID

Link: https://lore.kernel.org/r/20210518100712.GA449561@jade
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:43:11 -07:00
Olof Johansson
2f3e4eb179 Merge tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
PM and build warning fixes for omaps

While chasing system suspend related regressions, I noticed few other
issues related to PM would be good to have fixed:

- UART idling does not always work for hardware autoidle features
- am335x resume works only the first time unless musb module is loaded

Then there are three patches for omap1 related warnings caused by the gpio
changes, and one build warning fix for legacy mmc platform code when mmc
is built as a loadable module.

These can all be merged whenever suitable naturally. I've sent the more
urgent SATA regression fix separately although it appears in this pull
request too because of the branches merged.

* tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
  bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
  bus: ti-sysc: Fix am335x resume hang for usb otg module
  ARM: OMAP2+: Fix build warning when mmc_omap is not built
  ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
  ARM: OMAP1: Fix use of possibly uninitialized irq variable

Link: https://lore.kernel.org/r/pull-1622614772-543196@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:41:44 -07:00
Olof Johansson
94277cb5b4 Merge tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Regression fix for TI dra7 SATA not detecting drives

The SATA quirk flags are no missing With recent removal of legacy
platform data and we need to add the quirk flags to detect drives.

* tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  bus: ti-sysc: Fix missing quirk flags for sata

Link: https://lore.kernel.org/r/pull-1622613578-121536@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:39:58 -07:00
Olof Johansson
3091a9e742 Merge tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into arm/fixes
Amlogic fixes for v5.13-rc1
- arm64: meson: select COMMON_CLK to select a proper implementation of the clock API
- soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()

* tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
  arm64: meson: select COMMON_CLK
  soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()

Link: https://lore.kernel.org/r/73e76706-f3f4-bebf-10dd-d2ec9106a234@baylibre.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:39:23 -07:00
Olof Johansson
3a2d3ae067 Merge tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.13:

- Fix missing-prototypes warning of 'imx27_pm_init' in i.MX27 platform
  pm code.
- A couple of patches from Fabio Estevam to fix 'tuning-step' property
  in imx7d-meerkat96 and imx7d-pico DT.
- Fix '#gpio-cells' of nxp,pca8574 device in imx6qdl-emcon-avari DT.
- A couple of patches from Lucas Stach to fix regulator and voltage for
  imx8mq-zii-ultra board.
- Add missing regulators for imx6q-dhcom to avoid possible instability
  issues.
- Fix memory-controller settings for fsl-ls1028a DT.
- Fix RGMII clock and voltage for a couple of fsl-ls1028a-kontron-sl28
  boards.
- Fix RGMII connection to QCA8334 switch for imx6dl-yapp4 board.

* tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
  ARM: dts: imx7d-pico: Fix the 'tuning-step' property
  ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
  arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
  arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
  ARM: imx: pm-imx27: Include "common.h"
  arm64: dts: zii-ultra: fix 12V_MAIN voltage
  arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
  arm64: dts: ls1028a: fix memory node
  ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
  ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch

Link: https://lore.kernel.org/r/20210527011758.GD8194@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:24:12 -07:00
Linus Torvalds
e5220dd167 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 patches.

  Subsystems affected by this patch series: mips, mm (kfence, debug,
  pagealloc, memory-hotplug, hugetlb, kasan, and hugetlb), init, proc,
  lib, ocfs2, and mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: use private address for Michel Lespinasse
  ocfs2: fix data corruption by fallocate
  lib: crc64: fix kernel-doc warning
  mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
  mm/kasan/init.c: fix doc warning
  proc: add .gitignore for proc-subset-pid selftest
  hugetlb: pass head page to remove_hugetlb_page()
  drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
  mm/page_alloc: fix counting of free pages after take off from buddy
  mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
  pid: take a reference when initializing `cad_pid`
  kfence: use TASK_IDLE when awaiting allocation
  Revert "MIPS: make userspace mapping young by default"
2021-06-05 10:55:41 -07:00
Linus Torvalds
af8d9eb840 Merge tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - Build with '-mno-relax' when using LLVM's linker, which doesn't
   support linker relaxation.

 - A fix to build without SiFive's errata.

 - A fix to use PAs during init_resources()

 - A fix to avoid W+X mappings during boot.

* tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix memblock_free() usages in init_resources()
  riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabled
  riscv: mm: Fix W+X mappings at boot
  riscv: Use -mno-relax when using lld linker
2021-06-05 10:45:13 -07:00
Michel Lespinasse
2eff0573e0 mailmap: use private address for Michel Lespinasse
Link: https://lkml.kernel.org/r/20210602221225.49446-1-michel@lespinasse.org
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Junxiao Bi
6bba4471f0 ocfs2: fix data corruption by fallocate
When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not yet
updated to match the new size, if writeback is kicked in, it will invoke
ocfs2_writepage()->block_write_full_page() where the pages out of inode
size will be dropped.  That will cause file corruption.  Fix this by
zero out eof blocks when extending the inode size.

Running the following command with qemu-image 4.2.1 can get a corrupted
coverted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out
of inode size, then extend the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
YueHaibing
415f0c835b lib: crc64: fix kernel-doc warning
Fix W=1 kernel build warning:

  lib/crc64.c:40: warning:
   bad line:         or the previous crc64 value if computing incrementally.

Link: https://lkml.kernel.org/r/20210601135851.15444-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Coly Li <colyli@suse.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Mina Almasry
d84cf06e3d mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache.  When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.

There is still a rare condition where we fail to copy the page contents
AND race with a call for hugetlb_no_page() for this index and again we
will underflow resv_huge_pages.  That is fixed in a more complicated
patch not targeted for -stable.

Test:

  Hacked the code locally such that resv_huge_pages underflows produce a
  warning, then:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
  ./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings.  After the test runs number
of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]

Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc5f ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Yu Kuai
7b6889f54a mm/kasan/init.c: fix doc warning
Fix gcc W=1 warning:

  mm/kasan/init.c:228: warning: Function parameter or member 'shadow_start' not described in 'kasan_populate_early_shadow'
  mm/kasan/init.c:228: warning: Function parameter or member 'shadow_end' not described in 'kasan_populate_early_shadow'

Link: https://lkml.kernel.org/r/20210603140700.3045298-1-yukuai3@huawei.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
David Matlack
263e88d678 proc: add .gitignore for proc-subset-pid selftest
This new selftest needs an entry in the .gitignore file otherwise git
will try to track the binary.

Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
Fixes: 268af17ada ("selftests: proc: test subset=pid")
Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Naoya Horiguchi
0c5da35723 hugetlb: pass head page to remove_hugetlb_page()
When memory_failure() or soft_offline_page() is called on a tail page of
some hugetlb page, "BUG: unable to handle page fault" error can be
triggered.

remove_hugetlb_page() dereferences page->lru, so it's assumed that the
page points to a head page, but one of the caller,
dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
which could be a tail page.  So pass 'head' to it, instead.

Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
Fixes: 6eb4e88a6d ("hugetlb: create remove_hugetlb_page() to separate functionality")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
David Hildenbrand
928130532e drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
offline_pages() properly checks for memory holes and bails out.
However, we do a page_zone(pfn_to_page(start_pfn)) before calling
offline_pages() when offlining a memory block.

We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
memory hole:

   kernel BUG at include/linux/mm.h:1383!
   Internal error: Oops - BUG: 0 [#1] SMP
   Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
   CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
   Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
   pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
   pc : memory_subsys_offline+0x1f8/0x250
   lr : memory_subsys_offline+0x1f8/0x250
   Call trace:
     memory_subsys_offline+0x1f8/0x250
     device_offline+0x154/0x1d8
     online_store+0xa4/0x118
     dev_attr_store+0x44/0x78
     sysfs_kf_write+0xe8/0x138
     kernfs_fop_write_iter+0x26c/0x3d0
     new_sync_write+0x2bc/0x4f8
     vfs_write+0x718/0xc88
     ksys_write+0xf8/0x1e0
     __arm64_sys_write+0x74/0xa8
     invoke_syscall.constprop.0+0x78/0x1e8
     do_el0_svc+0xe4/0x298
     el0_svc+0x20/0x30
     el0_sync_handler+0xb0/0xb8
     el0_sync+0x178/0x180
   Kernel panic - not syncing: Oops - BUG: Fatal exception
   SMP: stopping secondary CPUs
   Kernel Offset: disabled
   CPU features: 0x00000251,20000846
   Memory Limit: none

If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
memory that doesn't have any holes.  So call
page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
nr_vmemmap_pages is set and we actually adjust the present pages.

Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
Fixes: a08a2ae346 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00