If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.
The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.
This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.
Fixes: e0e0d6b739 ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.
This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.
It also frees up register x18, which is not available as a scratch
register on all platforms, which and so avoiding it improves
shareability of this code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.
This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.
On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON based implementation is 4x faster than the table based one, and
is time invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572)
On a 32-bit guest executing under KVM on a Cortex-A57, the new code is
not only 4x faster than the generic table based GHASH driver, it is also
time invariant. (Note that the existing vmull.p64 code is 16x faster on
this core).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.
On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.
So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Of the various chaining modes implemented by the bit sliced AES driver,
only CTR is exposed as a synchronous cipher, and requires a fallback in
order to remain usable once we update the kernel mode NEON handling logic
to disallow nested use. So wire up the existing CTR fallback C code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To accommodate systems that disallow the use of kernel mode NEON in
some circumstances, take the return value of may_use_simd into
account when deciding whether to invoke the C fallback routine.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To accommodate systems that may disallow use of the NEON in kernel mode
in some circumstances, introduce a C fallback for synchronous AES in CTR
mode, and use it if may_use_simd() returns false.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON.
So honour this in the ARMv8 Crypto Extensions implementation of
CCM-AES, and fall back to a scalar implementation using the generic
crypto helpers for AES, XOR and incrementing the CTR counter.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In order to be able to reuse the generic AES code as a fallback for
situations where the NEON may not be used, update the key handling
to match the byte order of the generic code: it stores round keys
as sequences of 32-bit quantities rather than streams of bytes, and
so our code needs to be updated to reflect that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are quite a number of occurrences in the kernel of the pattern
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
or
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initializedmap/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions, that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
We're about to amend ACPI bus scan with DMI checks whether we're running
on a Mac to support Apple device properties in AML. The DMI checks are
performed for every single device, adding overhead for everything x86
that isn't Apple, which is the majority. Rafael and Andy therefore
request to perform the DMI match only once and cache the result.
Outside of ACPI various other Apple DMI checks exist and it seems
reasonable to use the cached value there as well. Rafael, Andy and
Darren suggest performing the DMI check in arch code and making it
available with a header in include/linux/platform_data/x86/.
To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c
to perform the DMI check and invoke it from setup_arch(). Switch over
all existing Apple DMI checks, thereby fixing two deficiencies:
* They are now #defined to false on non-x86 arches and can thus be
optimized away if they're located in cross-arch code.
* Some of them only match "Apple Inc." but not "Apple Computer, Inc.",
which is used by BIOSes released between January 2006 (when the first
x86 Macs started shipping) and January 2007 (when the company name
changed upon introduction of the iPhone).
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suggested-by: Darren Hart <dvhart@infradead.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Currently many IRQ mapping functions and data structures use the __init and
__initdata optimisations. These result in the relevant functions being
innaccessible after boot time.
However for deferred IRQ assignment it is important to have access to these
functions at PCI device enable time.
Therefore, remove the optimisation from the relevant data structures and
functions to prepare for deferred IRQ assignment.
Signed-off-by: Matthew Minter <matt@masarand.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one introduced recently and one related
to recent changes, fix cpufreq documentation, fix up recently added
code in the Thunderbolt driver and update runtime PM framework
documentation.
Specifics:
- Fix the handling of the scaling_cur_freq cpufreq policy attribute
on x86 systems with the MPERF/APERF registers present to make it
behave more as expected after recent changes (Rafael Wysocki).
- Drop a leftover callback from the intel_pstate driver which also
prevents the cpuinfo_cur_freq cpufreq policy attribute from being
incorrectly exposed when intel_pstate works in the active mode
(Rafael Wysocki).
- Add a missing piece describing the cpuinfo_cur_freq policy
attribute to cpufreq documentation (Rafael Wysocki).
- Fix up a recently added part of the Thunderbolt driver to avoid
aborting system suspends if its mailbox commands time out (Rafael
Wysocki).
- Update device runtime PM framework documentation to reflect the
current behavior of the code (Johan Hovold)"
* tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thunderbolt: icm: Ignore mailbox errors in icm_suspend()
cpufreq: x86: Make scaling_cur_freq behave more as expected
PM / runtime: Document new pm_runtime_set_suspended() constraint
cpufreq: docs: Add missing cpuinfo_cur_freq description
cpufreq: intel_pstate: Drop ->get from intel_pstate structure
------------[ cut here ]------------
WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
Call Trace:
vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
? kvm_arch_vcpu_load+0x62/0x230 [kvm]
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x8f/0x750
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL64_slow_path+0x25/0x25
This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
means that tells the kernel to not make use of any IOAPICs that may be present
in the system.
Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
variable passed from vcpu_enter_guest() which means that the L0's userspace
requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
L0's userspace reqeusts an irq window) is true, so there is no interrupt which
L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
interrupt on exit" for the irq window requirement in this scenario.
This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
if there is no L1 requirement to inject an interrupt to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Added code comment to make it obvious that the behavior is not correct.
We should do a userspace exit with open interrupt window instead of the
nested VM exit. This patch still improves the behavior, so it was
accepted as a (temporary) workaround.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Armada 38x SoCs along with legacy timer (time-armada-370-xp.c),
comprise generic Cortex-A9 global timer (arm_global_timer.c).
Enable its compilation. The system clocksource subsystem
will pick one of above two available ones in case the global
timer node is present in the device tree.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Since generic Cortex-A9 global timer is available after adding
it to compilation, enable its node in armada-38x.dtsi.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tests showed, that under certain conditions, the summary number of jiffies
spent on softirq/idle, which are counted by system statistics can be even
below 10% of expected value, resulting in false load presentation.
The issue was observed on the quad-core Marvell Armada 8k SoC, whose two
10G ports were bound into L2 bridge. Load was controlled by bidirectional
UDP traffic, produced by a packet generator. Under such condition,
the dominant load is softirq. With 100% single CPU occupation or without
any activity (all CPUs 100% idle), total number of jiffies is 10000 (2500
per each core) in 10s interval. Also with other kind of load this was
true.
However below a saturation threshold it was observed, that with CPU which
was occupied almost by softirqs only, the statistic were awkward. See
the mpstat output:
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
all 0.00 0.00 0.13 0.00 0.00 0.55 0.00 0.00 0.00 99.32
0 0.00 0.00 0.00 0.00 0.00 23.08 0.00 0.00 0.00 76.92
1 0.00 0.00 0.40 0.00 0.00 0.00 0.00 0.00 0.00 99.60
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Above would mean basically no total load, debug CPU0 occupied in 25%.
Raw statistics, printed every 10s from /proc/stat unveiled a root
cause - summary idle/softirq jiffies on loaded CPU were below 200,
i.e. over 90% samples lost. All problems were gone after enabling
fine granulity IRQ time accounting.
This patch fixes possible wrong statistics processing by enabling
CONFIG_IRQ_TIME_ACCOUNTING for arm64 platfroms, which is by
default done on other architectures, e.g. x86 and arm. Tests
showed no noticeable performance penalty, nor stability impact.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
The ESPRESSObin board exposes one of the SDHCI interfaces
via J1 uSD slot. This patch enables it.
Tested-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Zbigniew Bodek <zbodek@gmail.com>
[gregory.clement@free-electrons.com: removed "no-1-8-v"]
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Since the introduction of the CP110 dt files, the sata node was
misplaced. Move it at the right place. Thanks to this, the files are
completely ordered.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
And another header file for which we can use the generic variant,
even though it doesn't look obvious at first glance.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang doesn't like s390 specific inline assembler constraints. These
are present in our arch specific uapi/asm/swab.h which again is
required by some ebpf test cases.
For current compiler versions the generic swab.h already makes use of
gcc's builtin functions. Therefore we can simply remove our own header
file and use the generic one.
This will generate worse code if used with compilers before gcc 4.8,
which has no __builtin_bswap16(); or before gcc v4.4, which has no
__builtin_bswap[32|64](). For these cases a C implementation fallback
would be used which generates more code, but is still correct (170KB
extra code for gcc 4.3 with performance_defconfig).
However given that we need (and want) to get rid of the inline
assemblies anyway in order to be able to use clang, the above is just
a minor drawback if old gcc compilers are used.
With current compilers there is close to zero difference, except for
three btrfs bit functions which generate more out-of-line code. The
generated code looks still correct and also uses the s390 specific
byteswap instructions.
Reported-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.
Take out all that code.
There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.
While at it, make store_update_sp a boolean.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If the first iteration returns VM_FAULT_MAJOR but the second
one doesn't, we fail to account the fault as a major fault.
This fixes it and brings the code in line with x86.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move out the code that sets FAULT_FLAG_WRITE so the block that check
access permissions can be extracted. While at it also set
FAULT_FLAG_INSTRUCTION which will be used for protection keys.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Do the check before we re-enable interrupts and clean the code
up a bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This has a page of comment explaining what's going on right in
the middle of do_page_fault() which makes things a bit hard to
follow. Move it to a helper instead. Also do the test earlier
as there's no point waiting until after we found the VMA.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No need to break those lines, they aren't that long
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It makes do_page_fault() more readable. No functional change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>