On systems without any specific PMU driver support registered, running
'perf record' with —intr-regs will crash ( perf record -I <workload> ).
The relevant portion from crash logs and Call Trace:
Unable to handle kernel paging request for data at address 0x00000068
Faulting instruction address: 0xc00000000013eb18
Oops: Kernel access of bad area, sig: 11 [#1]
CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1
NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80
REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le)
NIP [c00000000013eb18] is_sier_available+0x18/0x30
LR [c000000000139f2c] perf_reg_value+0x6c/0xb0
Call Trace:
[c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable)
[c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0
[c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0
[c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0
[c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0
[c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480
[c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520
[c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0
[c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120
When perf record session is started with "-I" option, capturing registers
on each sample calls is_sier_available() to check for the
SIER (Sample Instruction Event Register) availability in the platform.
This function in core-book3s accesses 'ppmu->flags'. If a platform specific
PMU driver is not registered, ppmu is set to NULL and accessing its
members results in a crash. Fix the crash by returning false in
is_sier_available() if ppmu is not set.
Fixes: 333804dc3b ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
Use bcl 20,31,0f rather than plain bl to avoid unbalancing the link
stack.
Update the code to use REL16 relocs, available for ppc64 in 2009 (and
ppc32 in 2005).
Signed-off-by: Alan Modra <amodra@gmail.com>
[mpe: Incorporate more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The clang toolchain treats inline assembly a bit differently than
straight assembly code. In particular, inline assembly doesn't have
the complete context available to resolve expressions. This is
intentional to avoid divergence in the resulting assembly code.
We can work around this issue by borrowing a workaround done for ARM,
i.e. not directly testing the labels themselves, but by moving the
current output pointer by a value that should always be zero. If this
value is not null, then we will trigger a backward move, which is
explicitly forbidden.
Signed-off-by: Bill Wendling <morbo@google.com>
[mpe: Put it in a macro and only do the workaround for clang]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-4-morbo@google.com
The "-z notext" flag disables reporting an error if DT_TEXTREL is set.
ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
symbol: _start in readonly segment; recompile object files with
-fPIC or pass '-Wl,-z,notext' to allow text relocations in the
output
>>> defined in
>>> referenced by crt0.o:(.text+0x8) in archive arch/powerpc/boot/wrapper.a
The BFD linker disables this by default (though it's configurable in
current versions). LLD enables this by default. So we add the flag to
keep LLD from emitting the error.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-2-morbo@google.com
Normally all read-only sections precede SHF_WRITE sections. .dynamic
and .got have the SHF_WRITE flag; .dynamic probably because of
DT_DEBUG. LLD emits an error when this happens, so use "-z rodynamic"
to mark .dynamic as read-only.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201118223910.2711337-1-morbo@google.com
gpr_get() does membuf_write() twice to override pt_regs->msr in
between. We can call membuf_write() once and change ->msr in the
kernel buffer, this simplifies the code and the next fix.
The patch adds a new simple helper, membuf_at(offs), it returns the
new membuf which can be safely used after membuf_write().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[mpe: Fixup some minor whitespace issues noticed by Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.
Otherwise the build fails with eg:
arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
Fixes: 9a32a7e78b ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and
we just added entry and uaccess flushes. So the name is not very accurate
any more. In both platforms we then also immediately setup the STF flush.
Rename them to _setup_security_mitigations and fold the STF flush in.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For simplicity in backporting, the original entry_flush test contained
a lot of duplicated code from the rfi_flush test. De-duplicate that code.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a test modelled on the RFI flush test which counts the number
of L1D misses doing a simple syscall with the entry flush on and off.
For simplicity of backporting, this test duplicates a lot of code from
rfi_flush. We clean that up in the next patch.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In kup.h we currently include kup-radix.h for all 64-bit builds, which
includes Book3S and Book3E. The latter doesn't make sense, Book3E
never uses the Radix MMU.
This has worked up until now, but almost by accident, and the recent
uaccess flush changes introduced a build breakage on Book3E because of
the bad structure of the code.
So disentangle things so that we only use kup-radix.h for Book3S. This
requires some more stubs in kup.h and fixing an include in
syscall_64.c.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are about to add an entry flush. The rfi (exit) flush test measures
the number of L1D flushes over a syscall with the RFI flush enabled and
disabled. But if the entry flush is also enabled, the effect of enabling
and disabling the RFI flush is masked.
If there is a debugfs entry for the entry flush, disable it during the RFI
flush and restore it later.
Reported-by: Spoorthy S <spoorts2@in.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, such that they will definitely not get touched - e.g.,
when hibernating. When freeing memory, try to revert what we did.
The original idea was discussed in:
https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
architectures, whereby only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.
We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO
that we want to use __GFP_ZERO for clearing once alloc_contig_pages()
understands that.
Tested with in QEMU/TCG with 10 GiB of main memory:
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
[ 145.049019][ T1080] memtrace: Freed trace memory back on node 0
[ 145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
[ 213.613855][ T1080] memtrace: Freed trace memory back on node 0
[ 214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
[ 234.886974][ T1080] memtrace: Freed trace memory back on node 0
[ 234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
I also made sure allocated memory is properly zeroed.
Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
pages are not movable. However, as we don't run with any memory
hot(un)plug mechanism around, we could make an exception to
increase the chance of allocations succeeding.
Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
along PG_reserved in hibernation code will always return "true"
on powerpc, resulting in the pages getting touched. It's too
generic - e.g., indicates boot allocations.
Note 3: For now, we keep using memory_block_size_bytes() as minimum
granularity.
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
This code currently relies on mem_hotplug_begin()/mem_hotplug_done() -
create_section_mapping()/remove_section_mapping() implementations
cannot tollerate getting called concurrently.
Let's prepare for callers (memtrace) not holding any such locks (and
don't force them to mess with memory hotplug locks).
Other parts in these functions don't seem to rely on external locking.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-5-david@redhat.com
We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the linear mapping manually.
Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-4-david@redhat.com
It's very easy to crash the kernel right now by simply trying to
enable memtrace concurrently, hammering on the "enable" interface
loop.sh:
#!/bin/bash
dmesg --console-off
while true; do
echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
done
[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &
Resulting quickly in a kernel crash. Let's properly protect using a
mutex.
Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org# v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
In power10 DD1, there is an issue where the SIAR (Sampled Instruction
Address Register) is not latching to the sampled address during random
sampling. This results in value of 0s in the SIAR. Add a check to use
regs->nip when SIAR is zero.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
While setting the processor mode for any sample, perf_get_misc_flags()
expects the privilege level to differentiate the userspace and kernel
address. On power10 DD1, there is an issue that causes MSR_HV MSR_PR
bits of Sampled Instruction Event Register (SIER) not to be set for
marked events. Hence add a check to use the address in SIAR (Sampled
Instruction Address Register) to identify the privilege level.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
In power10 DD1, there is an issue that causes the SIAR_VALID bit of
the SIER (Sampled Instruction Event Register) to not be set. But the
SIAR_VALID bit is used for fetching the instruction address from the
SIAR (Sampled Instruction Address Register), and marked events are
sampled only if the SIAR_VALID bit is set. So drop the check for
SIAR_VALID and return true always incase of power10 DD1.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
This reverts commit a0ff72f9f5.
Since the commit b015f6bc95 ("powerpc/pseries: Add cpu DLPAR
support for drc-info property"), the 'cpu_drcs' wouldn't be double
freed when the 'cpus' node not found.
So we needn't apply this patch, otherwise, the memory will be leaked.
Fixes: a0ff72f9f5 ("powerpc/pseries/hotplug-cpu: Remove double free in error path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[mpe: Caused by me applying a patch to a function that had changed in the interim]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
read_user_stack_slow that walks user address translation by hand is
only required on hash, because a hash fault can not be serviced from
"NMI" context (to avoid re-entering the hash code) so the user stack
can be mapped into Linux page tables but not accessible by the CPU.
Radix MMU mode does not have this restriction. A page fault failure
would indicate the page is not accessible via get_user_pages either,
so avoid this on radix.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111120151.3150658-1-npiggin@gmail.com
On 8xx, we get the following features:
[ 0.000000] cpu_features = 0x0000000000000100
[ 0.000000] possible = 0x0000000000000120
[ 0.000000] always = 0x0000000000000000
This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all
other configurations, the three lines should be equal.
The problem is due to CPU_FTRS_GENERIC_32 which is taken when
CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is
pointless because there is no generic configuration supporting
all 32 bits but book3s/32.
Remove this pointless generic features definition to unbreak the
calculation of 'possible' features and 'always' features.
Fixes: 76bc080ef5 ("[POWERPC] Make default cputable entries reflect selected CPU family")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu
Commit 7053f80d96 ("powerpc/64: Prevent stack protection in early
boot") introduced a couple of uses of __attribute__((optimize)) with
function scope, to disable the stack protector in some early boot
code.
Unfortunately, and this is documented in the GCC man pages [0],
overriding function attributes for optimization is broken, and is only
supported for debug scenarios, not for production: the problem appears
to be that setting GCC -f flags using this method will cause it to
forget about some or all other optimization settings that have been
applied.
So the only safe way to disable the stack protector is to disable it
for the entire source file.
[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Fixes: 7053f80d96 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
The eeh-basic test got its own 60 seconds timeout (defined in commit
414f50434a "selftests/eeh: Bump EEH wait time to 60s") per breakable
device.
And we have discovered that the number of breakable devices varies
on different hardware. The device recovery time ranges from 0 to 35
seconds. In our test pool it will take about 30 seconds to run on a
Power8 system that with 5 breakable devices, 60 seconds to run on a
Power9 system that with 4 breakable devices.
Extend the timeout setting in the kselftest framework to 5 minutes
to give it a chance to finish.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023024539.9512-1-po-hsu.lin@canonical.com
Currently the clang build of corenet64_smp_defconfig fails with:
arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error:
attribute declaration must precede definition
machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);
Fix it by moving the initcall definition prior to the machine
definition, and directly below the function it calls, which is the
usual style anyway.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
powerpc used to set the PTE specific flags in set_pte_at(). That is
different from other architectures. To be consistent with other
architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit
379c926d63 ("powerpc/mm: move setting pte specific flags to
pfn_pte")
That commit didn't do the same for pfn_pmd() because we expect
pmd_mkhuge() to do that. But as per Linus that is a bad rule:
The rule that you must use "pmd_mkhuge()" seems _completely_ wrong.
The only valid use to ever make a pmd out of a pfn is to make a
huge-page.
Hence update pfn_pmd() to set _PAGE_PTE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.
Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.
The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:
00000388 <testfls>:
388: 2c 03 00 00 cmpwi r3,0
38c: 41 82 00 10 beq 39c <testfls+0x14>
390: 7c 63 00 34 cntlzw r3,r3
394: 20 63 00 20 subfic r3,r3,32
398: 4e 80 00 20 blr
39c: 38 60 00 00 li r3,0
3a0: 4e 80 00 20 blr
000003b0 <testfls64>:
3b0: 2c 03 00 00 cmpwi r3,0
3b4: 40 82 00 1c bne 3d0 <testfls64+0x20>
3b8: 2f 84 00 00 cmpwi cr7,r4,0
3bc: 38 60 00 00 li r3,0
3c0: 4d 9e 00 20 beqlr cr7
3c4: 7c 83 00 34 cntlzw r3,r4
3c8: 20 63 00 20 subfic r3,r3,32
3cc: 4e 80 00 20 blr
3d0: 7c 63 00 34 cntlzw r3,r3
3d4: 20 63 00 40 subfic r3,r3,64
3d8: 4e 80 00 20 blr
When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.
For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:
00000388 <testfls>:
388: 7c 63 00 34 cntlzw r3,r3
38c: 20 63 00 20 subfic r3,r3,32
390: 4e 80 00 20 blr
000003a0 <testfls64>:
3a0: 2c 03 00 00 cmpwi r3,0
3a4: 40 82 00 10 bne 3b4 <testfls64+0x14>
3a8: 7c 83 00 34 cntlzw r3,r4
3ac: 20 63 00 20 subfic r3,r3,32
3b0: 4e 80 00 20 blr
3b4: 7c 63 00 34 cntlzw r3,r3
3b8: 20 63 00 40 subfic r3,r3,64
3bc: 4e 80 00 20 blr
Fixes: 2fcff790dc ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
The only thing keeping the cpu_setup() and cpu_restore() functions
used in the cputable entries for Power7, Power8, Power9 and Power10 in
assembly was cpu_restore() being called before there was a stack in
generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel
stack for secondaries before cpu_restore()") means that it is now
possible to use C.
Rewrite the functions in C so they are a little bit easier to read.
This is not changing their functionality.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Tweak copyright and authorship notes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com