Commit b6cb20fdc2 ("powerpc/book3e: Fix set_memory_x() and
set_memory_nx()") implemented a more elaborated version of
pte_mkwrite() suitable for both kernel and user pages. That was
needed because set_memory_x() was using pte_mkwrite(). But since
commit a4c182ecf3 ("powerpc/set_memory: Avoid spinlock recursion
in change_page_attr()") pte_mkwrite() is not used anymore by
set_memory_x() so pte_mkwrite() can be simplified as it is only
used for user pages.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/cdc822322fe2ff4b0f5ecfde71d09d950b1c7557.1695659959.git.christophe.leroy@csgroup.eu
yield_cpu is a sample of a preempted lock holder that gets propagated
back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).
The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.
This can result in yielding to a CPU that no longer holds the lock,
which is bad, but particularly if it is currently in H_CEDE (idle),
then it appears to be preempted and some hypervisors (PowerVM) can
cause very long H_CONFER latencies waiting for H_CEDE wakeup. This
results in latency spikes and hard lockups on oversubscribed
partitions with lock contention.
This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out of it is not.
Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.
Fixes: 28db61e207 ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Debugged-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com
Sachin reported a warning when running the inject-ra-err selftest:
# selftests: powerpc/mce: inject-ra-err
Disabling lock debugging due to kernel taint
MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
MCE: CPU19: Initiator CPU
MCE: CPU19: Unknown
------------[ cut here ]------------
WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
...
NIP radix__tlb_flush+0x160/0x180
LR radix__tlb_flush+0x104/0x180
Call Trace:
radix__tlb_flush+0xf4/0x180 (unreliable)
tlb_finish_mmu+0x15c/0x1e0
exit_mmap+0x1a0/0x510
__mmput+0x60/0x1e0
exit_mm+0xdc/0x170
do_exit+0x2bc/0x5a0
do_group_exit+0x4c/0xc0
sys_exit_group+0x28/0x30
system_call_exception+0x138/0x330
system_call_vectored_common+0x15c/0x2ec
And bisected it to commit e43c0a0c3c ("powerpc/64s/radix: combine
final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
in radix__tlb_flush() if mm->context.copros is still elevated.
However it's possible for the copros count to be elevated if a process
exits without first closing file descriptors that are associated with a
copro, eg. VAS.
If the process exits with a VAS file still open, the release callback
is queued up for exit_task_work() via:
exit_files()
put_files_struct()
close_files()
filp_close()
fput()
And called via:
exit_task_work()
____fput()
__fput()
file->f_op->release(inode, file)
coproc_release()
vas_user_win_ops->close_win()
vas_deallocate_window()
mm_context_remove_vas_window()
mm_context_remove_copro()
But that is after exit_mm() has been called from do_exit() and triggered
the warning.
Fix it by dropping the warning, and always calling __flush_all_mm().
In the normal case of no copros, that will result in a call to
_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
does.
If the copros count is elevated then it will cause a global flush, which
should flush translations from any copros. Note that the process table
entry was cleared in arch_exit_mmap(), so copros should not be able to
fetch any new translations.
Fixes: e43c0a0c3c ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://msgid.link/20231017121527.1574104-1-mpe@ellerman.id.au
Christophe reported that the change to ARCH_FORCE_MAX_ORDER to limit the
range to 10 had broken his ability to configure hugepages:
# echo 1 > /sys/kernel/mm/hugepages/hugepages-8192kB/nr_hugepages
sh: write error: Invalid argument
Several of the powerpc defconfigs previously set the
ARCH_FORCE_MAX_ORDER value to 12, via the definition in
arch/powerpc/configs/fsl-emb-nonhw.config, used by:
mpc85xx_defconfig
mpc85xx_smp_defconfig
corenet32_smp_defconfig
corenet64_smp_defconfig
mpc86xx_defconfig
mpc86xx_smp_defconfig
Fix it by increasing the allowed range to 12 to restore the previous
behaviour.
Fixes: 358e526a16 ("powerpc/mm: Reinstate ARCH_FORCE_MAX_ORDER ranges")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Closes: https://lore.kernel.org/all/8011d806-5b30-bf26-2bfe-a08c39d57e20@csgroup.eu/
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230824122849.942072-1-mpe@ellerman.id.au
Eddie reported that newer kernels were crashing during boot on his 476
FSP2 system:
kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xb7ee2000
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K FSP-2
Modules linked in:
CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 #1
Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2
NIP: b7ee2000 LR: 8c008000 CTR: 00000000
REGS: bffebd83 TRAP: 0400 Not tainted (6.1.55-d23900f.ppcnf-fs p2)
MSR: 00000030 <IR,DR> CR: 00001000 XER: 20000000
GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000
GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000
GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0
GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0
NIP [b7ee2000] 0xb7ee2000
LR [8c008000] 0x8c008000
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
The problem is in ret_from_syscall where the check for
icache_44x_need_flush is done. When the flush is needed the code jumps
out-of-line to do the flush, and then intends to jump back to continue
the syscall return.
However the branch back to label 1b doesn't return to the correct
location, instead branching back just prior to the return to userspace,
causing bogus register values to be used by the rfi.
The breakage was introduced by commit 6f76a01173
("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which
inadvertently removed the "1" label and reused it elsewhere.
Fix it by adding named local labels in the correct locations. Note that
the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n
compiles.
Fixes: 6f76a01173 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Eddie James <eajames@linux.ibm.com>
Tested-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au
PowerPC has a 'btext' font used for the console which is almost identical
to the shared font_sun8x16, so use it rather than duplicating the data.
They were actually identical until about a decade ago when
commit bcfbeecea1 ("drivers: console: font_: Change a glyph from
"broken bar" to "vertical line"")
which changed the | in the shared font to be a solid
bar rather than a broken bar. That's the only difference.
This was originally spotted by the PMF source code analyser, which
noticed that sparc does the same thing with the same data, and they
also share a bunch of functions to manipulate the data. I've previously
posted a near identical patch for sparc.
Tested very lightly with a boot without FS in qemu.
Signed-off-by: "Dr. David Alan Gilbert" <linux@treblig.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230825142754.1487900-1-linux@treblig.org
In powerpc pseries system, below behaviour is observed while
enabling tracing on hcall:
# cd /sys/kernel/debug/tracing/
# cat events/powerpc/hcall_exit/enable
0
# echo 1 > events/powerpc/hcall_exit/enable
# ls
-bash: fork: Bad address
Above is from power9 lpar with latest kernel. Past this, softlockup
is observed. Initially while attempting via perf_event_open to
use "PERF_TYPE_TRACEPOINT", kernel panic was observed.
perf config used:
================
memset(&pe[1],0,sizeof(struct perf_event_attr));
pe[1].type=PERF_TYPE_TRACEPOINT;
pe[1].size=96;
pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */
pe[1].sample_type=0; /* 0 */
pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */
pe[1].inherit=1;
pe[1].precise_ip=0; /* arbitrary skid */
pe[1].wakeup_events=0;
pe[1].bp_type=HW_BREAKPOINT_EMPTY;
pe[1].config1=0x1ULL;
Kernel panic logs:
==================
Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000008
Faulting instruction address: 0xc0000000004c2814
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1
Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries
NIP page_remove_rmap+0x44/0x320
LR wp_page_copy+0x384/0xec0
Call Trace:
0xc00000001416e400 (unreliable)
wp_page_copy+0x384/0xec0
__handle_mm_fault+0x9d4/0xfb0
handle_mm_fault+0xf0/0x350
___do_page_fault+0x48c/0xc90
hash__do_page_fault+0x30/0x70
do_hash_fault+0x1a4/0x330
data_access_common_virt+0x198/0x1f0
--- interrupt: 300 at 0x7fffae971abc
git bisect tracked this down to below commit:
'commit baa49d81a9 ("powerpc/pseries: hvcall stack frame overhead")'
This commit changed STACK_FRAME_OVERHEAD (112 ) to
STACK_FRAME_MIN_SIZE (32 ) since 32 bytes is the minimum size
for ELFv2 stack. With the latest kernel, when running on ELFv2,
STACK_FRAME_MIN_SIZE is used to allocate stack size.
During plpar_hcall_trace, first call is made to HCALL_INST_PRECALL
which saves the registers and allocates new stack frame. In the
plpar_hcall_trace code, STK_PARAM is accessed at two places.
1. To save r4: std r4,STK_PARAM(R4)(r1)
2. To access r4 back: ld r12,STK_PARAM(R4)(r1)
HCALL_INST_PRECALL precall allocates a new stack frame. So all
the stack parameter access after the precall, needs to be accessed
with +STACK_FRAME_MIN_SIZE. So the store instruction should be:
std r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will
end up with overwriting stack contents and cause corruption.
But instead of updating 'std', we can instead remove it since
HCALL_INST_PRECALL already saves it to the correct location.
similarly load instruction should be:
ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
Fix the load instruction to correctly access the stack parameter
with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the
precall saves it correctly.
Cc: stable@vger.kernel.org # v6.2+
Fixes: baa49d81a9 ("powerpc/pseries: hvcall stack frame overhead")
Co-developed-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230929172337.7906-1-atrajeev@linux.vnet.ibm.com
In order to use run_kselftest.sh the list of tests must be emitted to
populate kselftest-list.txt.
The powerpc Makefile is written to use EMIT_TESTS. But support for
EMIT_TESTS was dropped in commit d4e59a536f ("selftests: Use runner.sh
for emit targets"). Although prior to that commit a548de0fe8
("selftests: lib.mk: add test execute bit check to EMIT_TESTS") had
already broken run_kselftest.sh for powerpc due to the executable check
using the wrong path.
It can be fixed by replacing the EMIT_TESTS definitions with actual
emit_tests rules in the powerpc Makefiles. This makes run_kselftest.sh
able to run powerpc tests:
$ cd linux
$ export ARCH=powerpc
$ export CROSS_COMPILE=powerpc64le-linux-gnu-
$ make headers
$ make -j -C tools/testing/selftests install
$ grep -c "^powerpc" tools/testing/selftests/kselftest_install/kselftest-list.txt
182
Fixes: d4e59a536f ("selftests: Use runner.sh for emit targets")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230921072623.828772-1-mpe@ellerman.id.au
The changes to copy_thread() made in commit eed7c420aa ("powerpc:
copy_thread differentiate kthreads and user mode threads") inadvertently
broke arch_stack_walk_reliable() because it has knowledge of the stack
layout.
Fix it by changing the condition to match the new logic in
copy_thread(). The changes make the comments about the stack layout
incorrect, rather than rephrasing them just refer the reader to
copy_thread().
Also the comment about the stack backchain is no longer true, since
commit edbd0387f3 ("powerpc: copy_thread add a back chain to the
switch stack frame"), so remove that as well.
Fixes: eed7c420aa ("powerpc: copy_thread differentiate kthreads and user mode threads")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230921232441.1181843-1-mpe@ellerman.id.au
ReiserFS has been deprecated for a year and a half, yet is still built
as part of a defconfig kernel.
According to commit eb103a5164 ("reiserfs: Deprecate reiserfs"), the
filesystem is slated to be removed in 2025. Remove it from the defconfig
profiles now, as part of its deprecation process.
Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230918175529.19011-3-peter@n8pjl.ca
Upstream Linux never had a "README.legal" file, but it was present
in early source releases of Linux/m68k. It contained a simple copyright
notice and a link to a version of the "COPYING" file that predated the
addition of the "only valid GPL version is v2" clause.
Get rid of the references to non-existent files by replacing the
boilerplate with SPDX license identifiers.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d91725ff1ed5d4b6ba42474e2ebfeebe711cba23.1695031668.git.geert@linux-m68k.org
Syzkaller reported a sleep in atomic context bug relating to the HASHCHK
handler logic:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/traps.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 25040, name: syz-executor
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
no locks held by syz-executor/25040.
irq event stamp: 34
hardirqs last enabled at (33): [<c000000000048b38>] prep_irq_for_enabled_exit arch/powerpc/kernel/interrupt.c:56 [inline]
hardirqs last enabled at (33): [<c000000000048b38>] interrupt_exit_user_prepare_main+0x148/0x600 arch/powerpc/kernel/interrupt.c:230
hardirqs last disabled at (34): [<c00000000003e6a4>] interrupt_enter_prepare+0x144/0x4f0 arch/powerpc/include/asm/interrupt.h:176
softirqs last enabled at (0): [<c000000000281954>] copy_process+0x16e4/0x4750 kernel/fork.c:2436
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 15 PID: 25040 Comm: syz-executor Not tainted 6.5.0-rc5-00001-g3ccdff6bb06d #3
Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_021) hv:phyp pSeries
Call Trace:
[c0000000a8247ce0] [c00000000032b0e4] __might_resched+0x3b4/0x400 kernel/sched/core.c:10189
[c0000000a8247d80] [c0000000008c7dc8] __might_fault+0xa8/0x170 mm/memory.c:5853
[c0000000a8247dc0] [c00000000004160c] do_program_check+0x32c/0xb20 arch/powerpc/kernel/traps.c:1518
[c0000000a8247e50] [c000000000009b2c] program_check_common_virt+0x3bc/0x3c0
To determine if a trap was caused by a HASHCHK instruction, we inspect
the user instruction that triggered the trap. However this may sleep
if the page needs to be faulted in (get_user_instr() reaches
__get_user(), which calls might_fault() and triggers the bug message).
Move the HASHCHK handler logic to after we allow IRQs, which is fine
because we are only interested in HASHCHK if it's a user space trap.
Fixes: 5bcba4e6c1 ("powerpc/dexcr: Handle hashchk exception")
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230915034604.45393-1-bgray@linux.ibm.com
It used to be impossible to select CONFIG_CPM2 without selecting
CONFIG_FSL_SOC at the same time because CONFIG_CPM2 was dependent
on CONFIG_8260 and CONFIG_8260 was selecting CONFIG_FSL_SOC.
But after commit eb5aa21372 ("powerpc/82xx: Remove CONFIG_8260
and CONFIG_8272") CONFIG_CPM2 depends on CONFIG_PPC_82xx instead
but CONFIG_PPC_82xx doesn't directly selects CONFIG_FSL_SOC.
Fix it by forcing CONFIG_PPC_82xx to select CONFIG_FSL_SOC just
like already done by PPC_8xx, PPC_MPC512x, PPC_83xx, PPC_86xx.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: eb5aa21372 ("powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7ab513546148ebe33ddd4b0ea92c7bfd3cce3ad7.1694705016.git.christophe.leroy@csgroup.eu
We recently added support for -fpatchable-function-entry and it is
enabled by default on ppc32 (ppc64 needs gcc v13.1.0). When building the
kernel for ppc32 and also enabling CONFIG_LD_DEAD_CODE_DATA_ELIMINATION,
we see the below build error with older gcc versions:
powerpc-linux-gnu-ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
This error is thrown since __patchable_function_entries section would be
garbage collected with --gc-sections since it does not reference any
other kept sections. This has subsequently been fixed with:
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=b7d072167715829eed0622616f6ae0182900de3e
Disable LD_DEAD_CODE_DATA_ELIMINATION for gcc versions before v11.1.0 if
using -fpatchable-function-entry to avoid this bug.
Fixes: 0f71dcfb4a ("powerpc/ftrace: Add support for -fpatchable-function-entry")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230913134129.2782088-1-naveen@kernel.org
This is called in an atomic context, so is not allowed to sleep if a
user page needs to be faulted in and has nowhere it can be deferred to.
The pagefault_disabled() function is documented as preventing user
access methods from sleeping.
In practice the page will be mapped in nearly always because we are
reading the instruction that just triggered the watchpoint trap.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-3-bgray@linux.ibm.com
thread_change_pc() uses CPU local data, so must be protected from
swapping CPUs while it is reading the breakpoint struct.
The error is more noticeable after 1e60f3564b ("powerpc/watchpoints:
Track perf single step directly on the breakpoint"), which added an
unconditional __this_cpu_read() call in thread_change_pc(). However the
existing __this_cpu_read() that runs if a breakpoint does need to be
re-inserted has the same issue.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-2-bgray@linux.ibm.com
Valid domain value is in range 1 to HV_PERF_DOMAIN_MAX. Current code has
check for domain value greater than or equal to HV_PERF_DOMAIN_MAX. But
the check for domain value 0 is missing.
Fix this issue by adding check for domain value 0.
Before:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for
event (hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/).
/bin/dmesg | grep -i perf may provide additional information.
Result from dmesg:
[ 37.819387] hv-24x7: hcall failed: [0 0x60040000 0x100 0] => ret
0xfffffffffffffffc (-4) detail=0x2000000 failing ix=0
After:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Warning:
hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ event is not supported by the kernel.
failed to read counter hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/
Fixes: ebd4a5a3eb ("powerpc/perf/hv-24x7: Minor improvements")
Reported-by: Krishan Gopal Sarawast <krishang@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230825055601.360083-1-kjain@linux.ibm.com
Add more config options that wouldn't be done by the generic debug
config in kernel/configs/debug.config
CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
Adds an initialized check on each (cpu|mmu)_has_feature()
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
Adds some extra checks around IRQ mask manipulation
CONFIG_PPC_KUAP_DEBUG
Adds some extra KAUP checks around interrupts/context switching
CONFIG_PPC_RFI_SRR_DEBUG
Adds some extra SSR checks around interrupts/syscalls
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230830044238.578840-1-bgray@linux.ibm.com
Currently, is_kdump_kernel() returns true in crash dump capture kernel
for both kdump and fadump crash dump capturing methods, as both these
methods set elfcorehdr_addr. Some restrictions enforced for crash dump
capture kernel, based on is_kdump_kernel(), are specifically meant for
kdump case and not desirable for fadump - eg. IO queues restriction in
device drivers. So, define is_kdump_kernel() to return false when f/w
assisted dump is active.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230912082950.856977-2-hbathini@linux.ibm.com
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since, is_kdump_kernel() is used to restrict resources in
crash dump capture kernel and such restrictions may not be desirable
for fadump, allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore is
usable. So, remove unnecessary dependency with is_kdump_kernel(), for
exporting vmcore.
[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230912082950.856977-1-hbathini@linux.ibm.com
Presently, while reading a vmcore, makedumpfile uses
`cur_cpu_spec.mmu_features` to decide whether the crashed system had
RADIX MMU or not.
Currently, makedumpfile fails to get the `cur_cpu_spec` symbol (unless
a vmlinux is passed with the `-x` flag to makedumpfile), and hence
assigns offsets and shifts (such as pgd_offset_l4) incorrecly considering
MMU to be hash MMU.
Add `cur_cpu_spec` symbol and offset of `mmu_features` in the
`cpu_spec` struct, to VMCOREINFO, so that the symbol address and offset
is accessible to makedumpfile, without needing the vmlinux file
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230911091409.415662-1-adityag@linux.ibm.com