Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused
the exception. So we have a sw logic to detect that in hw_breakpoint.c.
But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-5-ravi.bangoria@linux.ibm.com
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can
still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited
functionalities. But, such watchpoints are never firing because of
the missing privilege settings. Fix that.
It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak
any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will
also help to find scenarios when kernel accesses user memory.
Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com>
Suggested-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-4-ravi.bangoria@linux.ibm.com
On p10 predecessors, watchpoint with quadword access is compared at
quadword length. If the watch range is doubleword or less than that
in a first half of quadword aligned 16 bytes, and if there is any
unaligned quadword access which will access only the 2nd half, the
handler should consider it as extraneous and emulate/single-step it
before continuing.
Fixes: 74c6881019 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-2-ravi.bangoria@linux.ibm.com
POWER secure guests (i.e., guests which use the Protected Execution
Facility) need to use SWIOTLB to be able to do I/O with the
hypervisor, but they don't need the SWIOTLB memory to be in low
addresses since the hypervisor doesn't have any addressing limitation.
This solves a SWIOTLB initialization problem we are seeing in secure
guests with 128 GB of RAM: they are configured with 4 GB of
crashkernel reserved memory, which leaves no space for SWIOTLB in low
addresses.
To do this, we use mostly the same code as swiotlb_init(), but
allocate the buffer using memblock_alloc() instead of
memblock_alloc_low().
Fixes: 2efbc58f15 ("powerpc/pseries/svm: Force SWIOTLB for secure guests")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200818221126.391073-1-bauerman@linux.ibm.com
When we added the VDSO32 kconfig symbol, which controls building of
the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit).
That was because back then COMPAT was always enabled for 64-bit, so
depending on it would have left the 32-bit VDSO always enabled, which
we didn't want.
But since then we have made COMPAT selectable, and off by default for
ppc64le, so VDSO32 should really depend on that.
For most people this makes no difference, none of the defconfigs
change, it's only if someone is building ppc64le with COMPAT=y, they
will now also get VDSO32. If they've enabled COMPAT in order to run
32-bit binaries they presumably also want the 32-bit VDSO.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20200908125850.407939-1-mpe@ellerman.id.au
The newly introduced 'perf_stats' attribute uses the default access
mode of 0444, allowing non-root users to access performance stats of
an nvdimm and potentially force the kernel into issuing a large number
of expensive hypercalls. Since the information exposed by this
attribute cannot be cached it is better to ward off access to this
attribute from users who don't need to access to these performance
statistics.
Hence update the access mode of 'perf_stats' attribute to be only
readable by root users.
Fixes: 2d02bf835e ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200907110540.21349-1-vaibhav@linux.ibm.com
The ISA v3.1 the copy-paste facility has a new memory move functionality
which allows the copy buffer to be pasted to domestic memory (RAM) as
opposed to foreign memory (accelerator).
This means the POWER9 trick of avoiding the cp_abort on context switch if
the process had not mapped foreign memory does not work on POWER10. Do the
cp_abort unconditionally there.
KVM must also cp_abort on guest exit to prevent copy buffer state leaking
between contexts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200825075535.224536-1-npiggin@gmail.com
It's not done anything for a long time. Save the percpu variable, and
emit a warning to remind users to not expect it to do anything.
This uses pr_warn_once instead of pr_warn_ratelimit as testing
'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning
to be noisy, as the online/offline loop is slow.
Fixes: 3fa8cad82b ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
Cc: stable@vger.kernel.org # v3.14
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902000012.3440389-1-joel@jms.id.au
Often the firmware will guard out cores after a crash. This often
undesirable, and is not immediately noticeable.
This adds an informative message when a CPU device tree nodes are
marked bad in the device tree.
Signed-off-by: Joel Stanley <joel@jms.id.au>
[mpe: Use an eye-catcher that's less likely to get us in trouble]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190801051630.5804-1-joel@jms.id.au
On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
default DMA window for the device, before attempting to configure a DDW,
in order to make the maximum resources available for the next DDW to be
created.
This is a requirement for using DDW on devices in which hypervisor
allows only one DMA window.
If setting up a new DDW fails anywhere after the removal of this
default DMA window, it's needed to restore the default DMA window.
For this, an implementation of ibm,reset-pe-dma-windows rtas call is
needed:
Platforms supporting the DDW option starting with LoPAR level 2.7 implement
ibm,ddw-extensions. The first extension available (index 2) carries the
token for ibm,reset-pe-dma-windows rtas call, which is used to restore
the default DMA window for a device, if it has been deleted.
It does so by resetting the TCE table allocation for the PE to it's
boot time value, available in "ibm,dma-window" device tree node.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200805030455.123024-5-leobras.c@gmail.com
>From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
This change of output size is meant to expand the address size of
largest_available_block PE TCE from 32-bit to 64-bit, which ends up
shifting page_size and migration_capable.
This ends up requiring the update of
ddw_query_response->largest_available_block from u32 to u64, and manually
assigning the values from the buffer into this struct, according to
output size.
Also, a routine was created for helping reading the ddw extensions as
suggested by LoPAR: First reading the size of the extension array from
index 0, checking if the property exists, and then returning it's value.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200805030455.123024-3-leobras.c@gmail.com
As of commit bdc48fa11e, scripts/checkpatch.pl now has a default line
length warning of 100 characters. The powerpc wrapper script was using
a length of 90 instead of 80 in order to make checkpatch less
restrictive, but now it's making it more restrictive instead.
I think it makes sense to just use the default value now.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200828020542.393022-1-ruscur@russell.cc
The signal handler in the alignment handler self test has the ability
to jump over the instruction that triggered the signal. It does this
by incrementing the PT_NIP in the user context by 4. If it were a
prefixed instruction this will mean that the suffix is then executed
which is incorrect. Instead check if the major opcode indicates a
prefixed instruction (e.g. it is 1) and if so increment PT_NIP by 8.
If ISA v3.1 is not available treat it as a word instruction even if
the major opcode is 1.
Fixes: 620a6473df ("selftests/powerpc: Add prefixed loads/stores to alignment_handler test")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Fix 32-bit build, rename haveprefixes to prefixes_enabled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200824131231.14008-1-jniethe5@gmail.com
The last caller was removed in 2014 in commit fb5a515704 ("powerpc:
Remove platforms/wsp and associated pieces").
As Jordan noticed even though there are no callers, the code above in
fsl_secondary_thread_init() falls through into
generic_secondary_thread_init(). So we can remove the _GLOBAL but not
the body of the function.
However because fsl_secondary_thread_init() is inside #ifdef
CONFIG_PPC_BOOK3E, we can never reach the body of
generic_secondary_thread_init() unless CONFIG_PPC_BOOK3E is enabled,
so we can wrap the whole thing in a single #ifdef.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819015704.1976364-1-mpe@ellerman.id.au
If we're running on a CPU without VMX/VSX then don't touch them. This
is fragile, the compiler could spill a VMX/VSX register and break the
test anyway. But in practice it seems to work, ie. the test runs to
completion on a system without VSX with this change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819015727.1977134-8-mpe@ellerman.id.au
This test creates some threads, which write to TM SPRs, and then makes
sure the registers maintain the correct values across context switches
and contention with other threads.
But currently the test finishes almost instantaneously, which reduces
the chance of it hitting an interesting condition.
So increase the number of loops, so it runs a bit longer, though still
less than 2s on a Power8.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200813013445.686464-3-mpe@ellerman.id.au
This test tries to set affinity to CPUs that don't exist, especially
if the set of online CPUs doesn't start at 0.
But there's no real reason for it to use setaffinity in the first
place, it's just trying to create lots of threads to cause contention.
So drop the setaffinity entirely.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200813013445.686464-2-mpe@ellerman.id.au
Commit d947fb4c96 ("cpuidle: pseries: Fixup exit latency for
CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
of the Extended CEDE states advertised by the platform. The values
advertised by the platform are in timebase ticks. However the cpuidle
framework requires the latency values in microseconds.
If the tb-ticks value advertised by the platform correspond to a value
smaller than 1us, during the conversion from tb-ticks to microseconds,
in the current code, the result becomes zero. This is incorrect as it
puts a CEDE state on par with the snooze state.
This patch fixes this by rounding up the result obtained while
converting the latency value from tb-ticks to microseconds. It also
prints a warning in case we discover an extended-cede state with
wakeup latency to be 0. In such a case, ensure that CEDE(0) has a
non-zero wakeup latency.
Fixes: d947fb4c96 ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1599125247-28488-1-git-send-email-ego@linux.vnet.ibm.com
There are 2 problems with it:
1. "<" vs expected "<<"
2. the shift number is an IOMMU page number mask, not an address
mask as the IOMMU page shift is missing.
This did not hit us before f1565c24b5 ("powerpc: use the generic
dma_ops_bypass mode") because we had additional code to handle bypass
mask so this chunk (almost?) never executed.However there were
reports that aacraid does not work with "iommu=nobypass".
After f1565c24b5, aacraid (and probably others which call
dma_get_required_mask() before setting the mask) was unable to enable
64bit DMA and fall back to using IOMMU which was known not to work,
one of the problems is double free of an IOMMU page.
This fixes DMA for aacraid, both with and without "iommu=nobypass" in
the kernel command line. Verified with "stress-ng -d 4".
Fixes: 6a5c7be5e4 ("powerpc: Override dma_get_required_mask by platform hook and ops")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200908015106.79661-1-aik@ozlabs.ru
This reverts commit f2af201002.
It added a usage of cc-ldoption, but cc-ldoption was removed in commit
055efab312 ("kbuild: drop support for cc-ldoption").
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, using llvm-objtool, this script just silently succeeds without
actually do the intended checking. So this updates it to work properly.
Firstly, llvm-objdump does not add target symbol names to the end
of branches in its asm output, so we have to drop the branch to
__start_initialization_multiplatform using its address.
Secondly, v9 and 10 specify branch targets as .+<offset>, so we convert
those to actual addresses.
Thirdly, v10 and 11 error out on a vmlinux if given the -R option
complaining that it is "not a dynamic object". The -R does not make
any difference to the asm output, so remove it.
Lastly, v11 produces asm that is very similar to Gnu objtool (at least
as far as branches are concerned), so no further changes are necessary
to make it work.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200812081036.7969-3-sfr@canb.auug.org.au