For historical reasons, we leave the top 16 bytes of our task and IRQ
stacks unused, a practice used to ensure that the SP can always be
masked to find the base of the current stack (historically, where
thread_info could be found).
However, this is not necessary, as:
* When an exception is taken from a task stack, we decrement the SP by
S_FRAME_SIZE and stash the exception registers before we compare the
SP against the task stack. In such cases, the SP must be at least
S_FRAME_SIZE below the limit, and can be safely masked to determine
whether the task stack is in use.
* When transitioning to an IRQ stack, we'll place a dummy frame onto the
IRQ stack before enabling asynchronous exceptions, or executing code
we expect to trigger faults. Thus, if an exception is taken from the
IRQ stack, the SP must be at least 16 bytes below the limit.
* We no longer mask the SP to find the thread_info, which is now found
via sp_el0. Note that historically, the offset was critical to ensure
that cpu_switch_to() found the correct stack for new threads that
hadn't yet executed ret_from_fork().
Given that, this initial offset serves no purpose, and can be removed.
This brings us in-line with other architectures (e.g. x86) which do not
rely on this masking.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[Mark: rebase, kill THREAD_START_SP, commit msg additions]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Our __die() implementation tries to dump the stack memory, in addition
to a backtrace, which is problematic.
For contemporary 16K stacks, this can be a lot of data, which can take a
long time to dump, and can push other useful context out of the kernel's
printk ringbuffer (and/or a user's scrollback buffer on an attached
console).
Additionally, the code implicitly assumes that the SP is on the task's
stack, and tries to dump everything between the SP and the highest task
stack address. When the SP points at an IRQ stack (or is corrupted),
this makes the kernel attempt to dump vast amounts of VA space. With
vmap'd stacks, this may result in erroneous accesses to peripherals.
This patch removes the memory dump, leaving us to rely on the backtrace,
and other means of dumping stack memory such as kdump.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Add NAND controller node to LD4, Pro4, sLD8, Pro5, and PXs2.
Set up pinctrl to enable 2 chip select lines except Pro4. The CS1
for Pro4 is multiplexed with other peripherals such as UART2, so
I did not enable it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Add support for the iommu_device_register interface to make
the s390 hardware iommus visible to the iommu core and in
sysfs.
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull "Allwinner defconfig changes for 4.14" from Chen-Yu Tsai:
Sunxi_defconfig is refreshed and various power supply and ADC drivers
of the AXP PMICs have been enabled.
* tag 'sunxi-defconfig-for-4.14' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm: sunxi: Add AXP20X_ADC
arm: sunxi: Add additional power supplies
arm: sunxi: refresh the defconfig
Pull "ARM64: hisilicon: defconfig updates for 4.14" from Wei Xu:
- Enable Kirin PCIe host, hi6421v530 mfd and regulator,
syscon reboot mode, serdev bus, OP-TEE and K3 DMA support
for hikey and hikey960
- Enable pcie based sas controller support for hip08 SoC
* tag 'hisi-defconfig-for-4.14' of git://github.com/hisilicon/linux-hisi:
arm64: defconfig: enable DMA driver for hi3660
arm64: defconfig: enable OP-TEE
arm64: defconfig: enable support for serial port connected device
arm64: defconfig: enable CONFIG_SYSCON_REBOOT_MODE
arm64: defconfig: enable support hi6421v530 PMIC
arm64: defconfig: enable Kirin PCIe
arm64: defconfig: enable SCSI_HISI_SAS_PCI
Add the arm64 crypto drivers that have been added over the past
couple of kernel releases to its defconfig as modules.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull "mvebu arm64 for 4.14 (part 1)" from Gregory CLEMENT:
Enabling nop-xceiv PHY driver in the defconfig, needed for USB support
on A8K SoC based board.
Enabling fine-grained task level IRQ time accounting for ARMv8 as it was
already done for x86
* tag 'mvebu-arm64-4.14-1' of git://git.infradead.org/linux-mvebu:
arm64: defconfig: enable fine-grained task level IRQ time accounting
arm64: defconfig: enable nop-xceiv PHY driver
Pull "Renesas ARM64 Based SoC Defconfig Updates for v4.14" from Simon Horman:
* compile ak4613 and renesas sound as modules
This is intended to reduce the size of a kernel image compiled
using the defconfig. This is timely as it brings the kernel image
back below the size that can be booted in my environment, a limit
it crept over in v4.13-rc1.
* tag 'renesas-arm64-defconfig-for-v4.14' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
arm64: defconfig: compile ak4613 and renesas sound as modules
commit cee22a1505 ("workqueues: Introduce new flag WQ_POWER_EFFICIENT
for power oriented workqueues") introduced the concept of power
efficient workqueues (4 years back), but it was never enabled in
upstream kernel configs.
Power efficient workqueues are simply marked as "unbound," so that jobs
queued to them can run on any CPU in the system. It leaves the target
CPU selection to the scheduler, which is the best place for such
decision making. This improves power efficiency for workqueues which are
otherwise pinned to a CPU.
Enable it for ARM64 platforms as ARM platforms were the main target for
the introduction of power efficient workqueues.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull "Renesas ARM Based SoC Defconfig Updates for v4.14" from Simon Horman:
* Enable DMA for Renesas serial ports
Geert Uytterhoeven says, "DMA for (H)SCIF(A|B) serial ports on Renesas
R-Car Gen2 and RZ/G1 SoCs is considered stable, hence enable it by
default.".
* Enable Ethernet AVB
For the iWave RZ/G1M Q7 SOM
* Replace DRM_RCAR_HDMI by generic bridge options
* Replace SND_SOC_RSRC_CARD by SND_SIMPLE_SCU_CARD
* Replace USB_XHCI_RCAR by USB_XHCI_PLATFORM
* Enable missing PCIE_RCAR dependency
Defconfig updates for various Kconfig updates covering
renamed Kconfig symbols, now missing dependancies and so on.
* tag 'renesas-defconfig-for-v4.14' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: multi_v7_defconfig: Enable DMA for Renesas serial ports
ARM: multi_v7_defconfig: Replace DRM_RCAR_HDMI by generic bridge options
ARM: multi_v7_defconfig: Replace SND_SOC_RSRC_CARD by SND_SIMPLE_SCU_CARD
ARM: shmobile: defconfig: Refresh
ARM: shmobile: defconfig: Enable DMA for serial ports
ARM: shmobile: defconfig: Replace DRM_RCAR_HDMI by generic bridge options
ARM: shmobile: defconfig: Replace SND_SOC_RSRC_CARD by SND_SIMPLE_SCU_CARD
ARM: shmobile: defconfig: Replace USB_XHCI_RCAR by USB_XHCI_PLATFORM
ARM: shmobile: defconfig: Enable missing PCIE_RCAR dependency
ARM: shmobile: defconfig: Enable Ethernet AVB
Pull "Bunch of ARM defconfig cleanups for v4.14" from Krzysztof Kozłowski:
Cleanup ARMv7 defconfigs from options not existing anymore.
* tag 'samsung-defconfig-arm-cleanups-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: qcom_defconfig: Cleanup from non-existing options
ARM: ezx_defconfig: Cleanup from non-existing options
ARM: vexpress_defconfig: Cleanup from non-existing options
ARM: ixp4xx_defconfig: Cleanup from non-existing options
ARM: multi_v7_defconfig: Cleanup from non-existing options
of_irq_get() may return 0 as well as a nagative error number on failure,
(and never on success), however omap44xx_prm_late_init() regards 0 as a
valid IRQ -- fix this.
Fixes: a8f83aefcd ("ARM: OMAP4+: PRM: register interrupt information from DT")
Fixes: c5b3955828 ("ARM: OMAP4: Fix legacy code clean-up regression")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
of_irq_get() may return 0 as well as a nagative error number on failure
(and never on success), however omap3xxx_prm_late_init() regards 0 as a
valid IRQ -- fix this.
Fixes: 1e037794f7 ("ARM: OMAP3+: PRM: register interrupt information from DT")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull "Samsung defconfig changes for v4.14" from Krzysztof Kozłowski:
1. Enable some drivers useful on our boards (communication: Bluetooth,
WiFi, NFC, USB; codepages and crypto algorithms).
2. Enable debugging and lock testing options. These might have impact on
performance but we use the exynos_defconfig a lot during development
so they should bring benefits of detecting early locking issues.
* tag 'samsung-defconfig-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: exynos_defconfig: Enable locking test options
ARM: exynos_defconfig: Enable NLS_UTF8 and some crypto algorithms
ARM: exynos_defconfig: Enable Bluetooth, mac80211, NFC and more USB drivers
Pull "DT fixes for 4.13" from Alexandre Belloni:
- Fix NAND flash support for sama5d2
* tag 'at91-ab-4.13-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
ARM: dts: at91: sama5d2: fix EBI/NAND controllers declaration
ARM: dts: at91: sama5d2: use sama5d2 compatible string for SMC
Pull "i.MX fixes for 4.13, round 2" from Shawn Guo:
- Add missing 'ranges' property for i.MX25 device tree TSCADC node, so
that it's child nodes ADC and TSC device can be probed by kernel.
- Fix i.MX GPCv2 power domain driver to request regulator after power
domain initialization, since regulator could defer probing and
therefore cause power domain initialized twice.
* tag 'imx-fixes-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: i.MX25: add ranges to tscadc
soc: imx: gpcv2: fix regulator deferred probe
Pull "i.MX fixes for 4.13" from Shawn Guo:
- A fix for imx7d-sdb board to move pinctrl_spi4 pins from low power
iomux controller to normal iomuxc node, as the pins belong to normal
iomuxc rather than low power one.
* tag 'imx-fixes-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx7d-sdb: Put pinctrl_spi4 in the correct location
gup_hugepte() checks if pages are present and readable, and
when 'write' is set, also checks if the pages are writable.
Initially this was done by checking if _PAGE_PRESENT and
_PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
accesses.
The problem is that we have to handle the three following cases:
1/ The target defines __PAGE_READ and __PAGE_WRITE
2/ The target defines __PAGE_RW
3/ The target defines __PAGE_RO
In case 1/, this is obvious
In case 2/, __PAGE_READ is defined as 0 and __PAGE_WRITE as __PAGE_RW
so it works as well.
But in case 3, __PAGE_RW is defined as 0, which means __PAGE_WRITE is 0
and then the test returns true (page writable) in all cases.
A first correction was attempted in commit 6b8cb66a6a ("powerpc: Fix
usage of _PAGE_RO in hugepage"), but that fix is wrong:
instead of checking that the page is writable when write is requested,
it checks that the page is NOT writable when write is NOT requested.
This patch adds a new pte_read() helper to check whether a page is
readable or not. This avoids handling all possible cases in
gup_hugepte().
Then gup_hugepte() is modified to use pte_present(), pte_read()
and pte_write() instead of the raw flags.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
__set_fixmap() uses __fix_to_virt() then does the boundary checks
by it self. Instead, we can use fix_to_virt() which does the
verification at build time. For this, we need to use it inline
so that GCC can see the real value of idx at buildtime.
In the meantime, we remove the 'fixmaps' variable.
This variable is set but has never been used from the beginning
(commit 2c419bdeca ("[POWERPC] Port fixmap from x86 and use
for kmap_atomic"))
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
get_pteptr() and __mapin_ram_chunk() are only used locally,
so define them static
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch implements STRICT_KERNEL_RWX on PPC32.
As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
in order to allow page protection setup at the level of each page.
As BAT/LTLB mappings are deactivated, there might be a performance
impact.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As seen below, allthough the init sections have been freed, the
associated memory area is still marked as executable in the
page tables.
~ dmesg
[ 5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000)
~ cat /sys/kernel/debug/kernel_page_tables
---[ Start of kernel VM ]---
0xc0000000-0xc0497fff 4704K rw X present dirty accessed shared
0xc0498000-0xc056ffff 864K rw present dirty accessed shared
0xc0570000-0xc059ffff 192K rw X present dirty accessed shared
0xc05a0000-0xc7ffffff 125312K rw present dirty accessed shared
---[ vmalloc() Area ]---
This patch fixes that.
The implementation is done by reusing the change_page_attr()
function implemented for CONFIG_DEBUG_PAGEALLOC
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
__change_page_attr() uses flush_tlb_page().
flush_tlb_page() uses tlbie instruction, which also invalidates
pinned TLBs, which is not what we expect.
This patch modifies the implementation to use flush_tlb_kernel_range()
instead. This will make use of tlbia which will preserve pinned TLBs.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reduces the DTLB miss handler hot path (user address path)
by one instruction by preserving r10.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
setup_initial_memory_limit() is only called during init.
mmu_patch_cmp_limit() is only called from 8xx_mmu.c
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pinning TLBs bypasses STRICT_KERNEL_RWX or DEBUG_PAGEALLOC protections
so it should only be allowed when those are not selected
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As stated in a comment in head_8xx.S, today we "Always pin the first
8 MB ITLB to prevent ITLB misses while mucking around with SRR0/SRR1
in asm".
This issue has just been cleared by the preceding patch, therefore
we can make this pinning optional (on by default) and independent
of DATA pinning.
This patch also makes pinning of IMMR independent of pinning of DATA.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
By default, the 8xx pins an ITLB on the first 8M of memory in order
to avoid any ITLB miss on kernel code.
However, with some debug functions like DEBUG_PAGEALLOC and
DEBUG_RODATA, pinning TLBs is contradictory.
In order to avoid any ITLB miss in a critical section without pinning
TLBs, we have to ensure that there is no page boundary crossed between
the setup of a new value in SRR0/SRR1 and the associated RFI.
The functions modifying srr0/srr1 are all located in setup_32.S.
They are spread over almost 4kbytes.
The patch forces a 12 bits (4kbytes) alignment for those
functions. This garanties that the functions remain in a
single 4k page.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The macro to check if an address is a kernel address or not is
not used anymore in DTLBmiss handler. It is used in ITLB miss handler
and in DTLB error handler. DTLB error handler is not a hot path, it
doesn't need such optimisation.
In order to simplify a following patch which will rework ITLB miss
handler, we remove the macros and reintroduce them inside the handler.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On the 8xx, the RAM mapped with LTLBs must be seen as block mapped,
just like areas mapped with BATs on standard PPC32.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
SoC device attributes are registered with a call to
soc_device_register() from the machine .init_late() operation, which is
called from the late initcall, after all drivers built-in drivers have
been probed. This results in the impossibility for drivers to use SoC
device matching in their probe function.
The omap_soc_device_init() function is safe to call from the machine
.init() operation, as all data it depends on is initialized from the
.init_early() operation. Move SoC device attribute registration to
machine .init() like on all other ARM platforms.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Normally the values in the resource field and the argument to ARRAY_SIZE
in the num_resources are the same. In this case, the value in the reousrce
field is the same as the one in the previous platform_device structure, and
appears to be a copy-paste error. Replace the value in the resource field
with the argument to the local call to ARRAY_SIZE.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes another invalid use of register expressions.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Larry reported a CPU hotplug lock recursion in the MTRR code.
============================================
WARNING: possible recursive locking detected
systemd-udevd/153 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
but task is already holding lock:
(cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470
....
cpus_read_lock+0x48/0x90
stop_machine+0x16/0x30
mtrr_add_page+0x18b/0x470
mtrr_add+0x3e/0x70
mtrr_add_page() holds the hotplug rwsem already and calls stop_machine()
which acquires it again.
Call stop_machine_cpuslocked() instead.
Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
In iommu_range_alloc() we generate a mask by right shifting ~0,
however if the specified alignment is 0 then we right shift by 64,
which is undefined. UBSAN tells us so:
UBSAN: Undefined behaviour in ../arch/powerpc/kernel/iommu.c:193:35
shift exponent 64 is too large for 64-bit type 'long unsigned int'
We can avoid it by instead generating the mask with:
align_mask = (1ull << align_order) - 1;
That will also generate an undefined shift if align_order is 64 or
greater, but that shouldn't be a problem for a while.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a multi node system with discontiguous node ids, nest event values
are not showing up properly. eg. lscpu output:
NUMA node0 CPU(s): 0-15
NUMA node8 CPU(s): 16-31
Nest event values on such systems can be counted on CPUs <= 15:
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 0-14 -I 1000 sleep 1000
# time counts unit events
1.000294577 30,17,24,42,880 nest_powerbus0_imc/PM_PB_CYC/
But not on CPUs >= 16:
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
# time counts unit events
1.000049902 <not supported> nest_powerbus0_imc/PM_PB_CYC/
This is because, when fetching the reference count, the node id (which
may be sparse) is used as the array index, not the node number (which
is 0 based and contiguous).
Fix it by using the node number as the array index.
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
# time counts unit events
1.000241961 26,12,35,28,704 nest_powerbus0_imc/PM_PB_CYC/
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[mpe: Change log tweaks for clarity and brevity]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In some obscure Book3E configs (randconfig) we can end up missing a
definition for PGALLOC_GFP in pgtable_64.c.
Fix it by moving the definition to asm/pgalloc.h.
Fixes: de3b87611d ("powerpc/mm/book(e)(3s)/64: Add page table accounting")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Exclude core xmon files from ftrace (along with an xmon xive helper
outside of xmon/) to minimize impact of ftrace while within xmon.
Before:
/sys/kernel/debug/tracing# grep -ci xmon available_filter_functions
26
After:
/sys/kernel/debug/tracing# grep -ci xmon available_filter_functions
0
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use $(subst ..) on KBUILD_CFLAGS rather than CFLAGS_REMOVE_xxx]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>