The common interrupt handler prologue macro and the bad_stack
trampolines include consecutive sequences of register saves, and some
register clears. Neaten such instances by expanding use of the SAVE_GPRS
macro and employing the ZEROIZE_GPR macro when appropriate.
Also simplify an invocation of SAVE_GPRS targetting all non-volatile
registers to SAVE_NVGPRS.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-7-rmclure@linux.ibm.com
Restoring the register state of the interrupted thread involves issuing
a large number of predictable loads to the kernel stack frame. Issue the
REST_GPR{,S} macros to clearly signal when this is happening, and bunch
together restores at the end of the interrupt handler where the saved
value is not consumed earlier in the handler code.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-6-rmclure@linux.ibm.com
Use the convenience macros for saving/clearing/restoring gprs in keeping
with syscall calling conventions. The plural variants of these macros
can store a range of registers for concision.
This works well when the user gpr value we are hoping to save is still
live. In the syscall interrupt handlers, user register state is
sometimes juggled between registers. Hold-off from issuing the SAVE_GPR
macro for applicable neighbouring lines to highlight the delicate
register save logic.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-5-rmclure@linux.ibm.com
Provide register zeroing macros, following the same convention as
existing register stack save/restore macros, to be used in later
change to concisely zero a sequence of consecutive gprs.
The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, keeping
with the naming of the accompanying restore and save macros, and usage
of zeroize to describe this operation elsewhere in the kernel.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-4-rmclure@linux.ibm.com
This reverts commit 8875f47b76 ("powerpc/syscall: Save r3 in regs->orig_r3
").
Save caller's original r3 state to the kernel stackframe before entering
system_call_exception. This allows for user registers to be cleared by
the time system_call_exception is entered, reducing the influence of
user registers on speculation within the kernel.
Prior to this commit, orig_r3 was saved at the beginning of
system_call_exception. Instead, save orig_r3 while the user value is
still live in r3.
Also replicate this early save in 32-bit. A similar save was removed in
commit 6f76a01173 ("powerpc/syscall: implement system call entry/exit
logic in C for PPC32") when 32-bit adopted system_call_exception. Revert
its removal of orig_r3 saves.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-3-rmclure@linux.ibm.com
The asmlinkage macro has no special meaning in powerpc, and prior to
this patch is used sporadically on some syscall handler definitions. On
architectures that do not define asmlinkage, it resolves to extern "C"
for C++ compilers and a nop otherwise. The current invocations of
asmlinkage provide far from complete support for C++ toolchains, and so
the macro serves no purpose in powerpc.
Remove all invocations of asmlinkage in arch/powerpc. These incidentally
only occur in syscall definitions and prototypes.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-2-rmclure@linux.ibm.com
cpu_specs[] is full of #ifdefs depending on the different
types of CPU.
CPUs are mutually exclusive, it is therefore possible to split
cpu_specs[] into smaller more readable pieces.
Create cpu_specs_XXX.h that will each be dedicated on one
of the following mutually exclusive families:
- 40x
- 44x
- 47x
- 8xx
- e500
- book3s/32
- book3s/64
In book3s/32, the block for 603 has been moved in front in order
to not have two 604 blocks.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_47x to be CONFIG_PPC_47x, tweak some formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a44b865e0318286155273b10cdf524ab697928c1.1663606875.git.christophe.leroy@csgroup.eu
The only 64-bit Book3E CPUs we support require the selection
of CONFIG_PPC_E500MC.
However our Kconfig allows configurating a kernel that has 64-bit
Book3E support, but without CONFIG_PPC_E500MC enabled. Such a kernel
would never boot, it doesn't know about any CPUs.
To fix this, force CONFIG_PPC_E500MC to be selected whenever we are
building a 64-bit Book3E kernel.
And add a test to detect future situations where cpu_specs is empty.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae5d8b8b3ccc346e61d2ec729767f92766273f0b.1663606875.git.christophe.leroy@csgroup.eu
Partition partition@20000 contains generic kernel image and it does not
have to be used only for rescue purposes. Partition partition@1c0000
contains bootable rescue system and partition partition@340000 contains
factory image/data for restoring to NAND. So change partition labels to
better fit their purpose by removing possible misleading substring "rootfs"
from these labels.
Fixes: 54c15ec3b7 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220830225500.8856-1-pali@kernel.org
Currently powerpc selects HAVE_EFFICIENT_UNALIGNED_ACCESS in all cases
but one. The exception is if the kernel is being built little endian and
explicitly targeted for Power7.
The combination of Power7 and little endian was never commercially
supported, or widely used. It was only ever possible on bare metal
machines, using unofficial firmware, or in qemu guests hosted on those
machines.
The bare metal firmware support for Power7 was removed in 2019, see
skiboot commit 16b7ae64 ("Remove POWER7 and POWER7+ support").
Little endian kernel builds were switched to target Power8 or later in
2018, in commit a73657ea19 ("powerpc/64: Add GENERIC_CPU support for
little endian"). Since then it's only been possible to boot a Power7/LE
kernel by explicitly building for Power7.
So drop the exception and always select HAVE_EFFICIENT_UNALIGNED_ACCESS.
If anyone does still have a Power7/LE machine it should hopefully
continue to boot, just with some performance penality, and if not they
can report a bug.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916131523.319123-1-mpe@ellerman.id.au
CONFIG_ARCH_HAS_HUGEPD is used to tell core mm when huge page
directories are used.
When they are not used, no need to provide hugepd_t or is_hugepd(),
just rely on the core mm fallback definition.
For that, change core mm behaviour so that CONFIG_ARCH_HAS_HUGEPD
is used instead of indirect is_hugepd macro existence.
powerpc being the only user of huge page directories, there is no
impact on other architectures.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/da81462d93069bb90fe5e762dd3283a644318937.1662543243.git.christophe.leroy@csgroup.eu
Because 64-bit Book3S uses pgtable-nop4d.h, the P4D is folded into the
PGD. So P4D entries are actually PGD entries, or vice versa.
The other way to think of it is that the P4D is a single entry page
table below the PGD. Zero bits of the address are needed to index into
the P4D, therefore a P4D entry maps the same size address space as a PGD
entry.
As explained in the previous commit, there are no huge page sizes
supported directly at the PGD level on 64-bit Book3S, so there are also
no huge page sizes supported at the P4D level.
Therefore p4d_is_leaf() can never be true, so drop the definition and
fallback to the default implementation that always returns false.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220903123640.719846-2-mpe@ellerman.id.au
On powerpc there are two ways for huge pages to be represented in the
top level page table, aka PGD (Page Global Directory).
If the address space mapped by an individual PGD entry does not
correspond to a given huge page size, then the PGD entry points to a
non-standard page table, known as a "hugepd" (Huge Page Directory).
The hugepd contains some number of huge page PTEs sufficient to map the
address space with the given huge page size.
On the other hand, if the address space mapped by an individual PGD
entry does correspond exactly to a given huge page size, that PGD entry
is used to directly encode the huge page PTE in place. In this case the
pgd_huge() wrapper indicates to generic code that the PGD entry is
actually a huge page PTE.
This commit deals with the pgd_huge() case only, it does nothing with
respect to the hugepd case.
Over time the size of the virtual address space supported on powerpc has
increased several times, which means the location at which huge pages
can sit in the tree has also changed. There have also been new huge page
sizes added, with the introduction of the Radix MMU.
On Power9 and later with the Radix MMU, the largest huge page size in
any implementation is 1GB.
Since the introduction of Radix, 1GB entries have been supported at the
PUD level, with both 4K and 64K base page size. Radix has never had a
supported huge page size at the PGD level.
On Power8 or earlier, which uses the Hash MMU, or Power9 or later with
the Hash MMU enabled, the largest huge page size is 16GB.
Using the Hash MMU and a base page size of 4K, 16GB has never been a
supported huge page size at the PGD level, due to the geometry being
incompatible. The two supported huge page sizes (16M & 16GB) both use
the hugepd format.
Using the Hash MMU and a base page size of 64K, 16GB pages were
supported in the past at the PGD level.
However in commit ba95b5d035 ("powerpc/mm/book3s/64: Rework page table
geometry for lower memory usage") the page table layout was reworked to
shrink the size of the PGD.
As a result the 16GB page size now fits at the PUD level when using 64K
base page size.
Therefore there are no longer any supported configurations where
pgd_huge() can be true, so drop the definitions for pgd_huge(), and
fallback to the generic definition which is always false.
Fixes: ba95b5d035 ("powerpc/mm/book3s/64: Rework page table geometry for lower memory usage")
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220903123640.719846-1-mpe@ellerman.id.au
In interrupt_64.S, formerly entry_64.S, there are two toc entries
created for sys_call_table and compat_sys_call_table.
These are no longer used, since the system call entry was converted from
asm to C, so remove them.
Fixes: 68b34588e2 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913124545.2817825-1-mpe@ellerman.id.au
Const function pointers by convention live in .data.rel.ro if they need
to be relocated. Now that .data.rel.ro is linked into the read-only
region, put them in the right section. This doesn't make much practical
difference, but it will make the C conversion of sys_call_table a
smaller change as far as linking goes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916040755.2398112-8-npiggin@gmail.com
powerpc has a number of read-only sections and tables that are put after
RO_DATA(). Move the __end_rodata symbol to cover these as well.
Setting memory to read-only at boot is done using __init_begin, change
that to use __end_rodata.
This makes is_kernel_rodata() exactly cover the read-only region, as
well as other things using __end_rodata (e.g., kernel/dma/debug.c).
Boot dmesg also prints the rodata size more accurately.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916040755.2398112-2-npiggin@gmail.com
Currently __init_begin is used as the boundary for strict RWX between
executable/read-only text and data, and non-executable (after boot) code
and data.
But that's a little subtle, so add an explicit symbol to document that
the SRWX boundary lies there, and add a comment making it clear that
__init_begin must also begin there.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916131422.318752-2-mpe@ellerman.id.au