The pseries Shared Processor Logical Partition(SPLPAR) machines
can retrieve a log of dispatch and preempt events from the
hypervisor using data from Disptach Trace Log(DTL) buffer.
With this information, user can retrieve when and why each dispatch &
preempt has occurred. Added an interface to expose the Virtual Processor
Area(VPA) DTL counters via perf.
The following events are available and exposed in sysfs:
vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
vpa_dtl/dtl_preempt/ - Trace time slice preempts
vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)
Added interface defines supported event list, config fields for the
event attributes and their corresponding bit values which are exported
via sysfs. User could use the standard perf tool to access perf events
exposed via vpa-dtl pmu.
The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, the kernel needs to poll the counters, added
hrtimer code to do that. The timer interval can be provided by user via
sample_period field in nano seconds. There is one hrtimer added per
vpa-dtl pmu thread.
To ensure there are no other conflicting dtl users (example: debugfs dtl
or /proc/powerpc/vcpudispatch_stats), interface added code to use
"down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
is defined in dtl.h file. Also added global reference count variable called
"dtl_global_refc", to ensure dtl data can be captured per-cpu. Code also
added global lock called "dtl_global_lock" to avoid race condition.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Tejas Manhas <tejas05@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915102947.26681-3-atrajeev@linux.ibm.com
PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
kernel text has to be mapped read-only or read-write based on build
time options.
But nowadays, with functionnalities like jump_labels, static links,
etc ... more only less all kernels need to be read-write at some
point, and some combinations of configs failed to work due to
innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
access to kernel modifications.
Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
possible options that may imply kernel text modification, always
set kernel text read-write at startup and rely on
CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
Do this by passing PAGE_KERNEL_X to map_kernel_page() in
__maping_ram_chunk() instead of passing PAGE_KERNEL_TEXT. Once
this is done, the only remaining user of PAGE_KERNEL_TEXT is
mmu_mark_initmem_nx() which uses it in a call to setibat().
As setibat() ignores the RW/RO, we can seamlessly replace
PAGE_KERNEL_TEXT by PAGE_KERNEL_X here as well and get rid of
PAGE_KERNEL_TEXT completely.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8e2d793abf87ae3efb8f6dce10f974ac0eda61b8.1757412205.git.christophe.leroy@csgroup.eu
Since commit 4346ba1604 ("fprobe: Rewrite fprobe on function-graph
tracer"), FPROBE depends on HAVE_FUNCTION_GRAPH_FREGS. With previous
patch adding HAVE_FUNCTION_GRAPH_FREGS for powerpc, FPROBE can be
enabled on powerpc. But with the commit b5fa903b7f ("fprobe: Add
fprobe_header encoding feature"), asm/fprobe.h header is needed to
define arch dependent encode/decode macros. The fprobe header MSB
pattern on powerpc is not 0xf. So, define FPROBE_HEADER_MSB_PATTERN
expected on powerpc.
Also, commit 762abbc0d0 ("fprobe: Use ftrace_regs in fprobe exit
handler") introduced HAVE_FTRACE_REGS_HAVING_PT_REGS for archs that
have pt_regs in ftrace_regs. Advertise that on powerpc to reuse
common definitions like ftrace_partial_regs().
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916044035.29033-2-adityab1@linux.ibm.com
commit a1be9ccc57 ("function_graph: Support recording and printing the
return value of function") introduced support for function graph return
value tracing.
Additionally, commit a3ed4157b7 ("fgraph: Replace fgraph_ret_regs with
ftrace_regs") further refactored and optimized the implementation,
making `struct fgraph_ret_regs` unnecessary.
This patch enables the above modifications for powerpc all, ensuring that
function graph return value tracing is available on this architecture.
In this patch we have redefined two functions:
- 'ftrace_regs_get_return_value()' - the existing implementation on
ppc returns -ve of return value based on some conditions not
relevant to our patch.
- 'ftrace_regs_get_frame_pointer()' - always returns 0 in current code .
We also allocate stack space to equivalent of 'SWITCH_FRAME_SIZE',
allowing us to directly use predefined offsets like 'GPR3' and 'GPR4'
this keeps code clean and consistent with already defined offsets .
After this patch, v6.14+ kernel can also be built with FPROBE on powerpc
but there are a few other build and runtime dependencies for FPROBE to
work properly. The next patch addresses them.
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Aditya Bodkhe <adityab1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916044035.29033-1-adityab1@linux.ibm.com
The logic for allocating ppc64_stub_entry trampolines in the .stubs
section relies on an inline sentinel, where a NULL .funcdata member
indicates an available slot.
While preceding commits fixed the initialization bugs that led to ftrace
stub corruption, the sentinel-based approach remains fragile: it depends
on an implicit convention between subsystems modifying different
struct types in the same memory area.
Replace the sentinel with an explicit counter, module->arch.num_stubs.
Instead of iterating through memory to find a NULL marker, the module
loader uses this counter as the boundary for the next free slot.
This simplifies the allocation code, hardens it against future changes
to stub structures, and removes the need for an extra relocation slot
previously reserved to terminate the sentinel search.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-4-joe.lawrence@redhat.com
CONFIG_PPC_FTRACE_OUT_OF_LINE introduced setup_ftrace_ool_stubs() to
extend the ppc64le module .stubs section with an array of
ftrace_ool_stub structures for each patchable function.
Fix its ppc64_stub_entry stub reservation loop to properly write across
all of the num_stubs used and not just the first entry.
Fixes: eec37961a5 ("powerpc64/ftrace: Move ftrace sequence out of line")
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-3-joe.lawrence@redhat.com
When an ftrace call site is converted to a NOP, its corresponding
dyn_ftrace record should have its ftrace_ops pointer set to
ftrace_nop_ops.
Correct the powerpc implementation to ensure the
ftrace_rec_set_nop_ops() helper is called on all successful NOP
initialization paths. This ensures all ftrace records are consistent
before being handled by the ftrace core.
Fixes: eec37961a5 ("powerpc64/ftrace: Move ftrace sequence out of line")
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-2-joe.lawrence@redhat.com
Commit 82ef440f9a ("powerpc/603: Copy kernel PGD entries into all
PGDIRs and preallocate execmem page tables") was supposed to extend
to powerpc 603 the copy of kernel PGD entries into all PGDIRs
implemented in a previous patch on the 8xx. But 603 is book3s/32 and
uses a duplicate of pgd_alloc() defined in another header.
So really do the copy at the correct place for the 603.
Fixes: 82ef440f9a ("powerpc/603: Copy kernel PGD entries into all PGDIRs and preallocate execmem page tables")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/752ab7514cae089a2dd7cc0f3d5e35849f76adb9.1755757797.git.christophe.leroy@csgroup.eu
Commit ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") removed the test that needed the valeur in SPRN_EPN but
failed to remove the read.
Remove it.
And remove related comments, including the very same comment
in InstructionTLBMiss that should have been removed by
commit 33c527522f ("powerpc/8xx: Inconditionally use task PGDIR in
ITLB misses").
Also update the comment about absence of a second level table which
has been handled implicitely since commit 5ddb75cee5 ("powerpc/8xx:
remove tests on PGDIR entry validity").
Fixes: ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/5811c8d1d6187f280ad140d6c0ad6010e41eeaeb.1755361995.git.christophe.leroy@csgroup.eu
The hypervisor assigns one pipe per partition for all sources and
assigns new pipe after migration. Also the partition ID that is
used by source as its target ID may be changed after the migration.
So disable hvpipe during SUSPEND event with ‘hvpipe enable’ system
parameter value = 0 and enable it after migration during RESUME
event with hvpipe enable’ system parameter value = 1.
The user space calls such as ioctl()/ read() / write() / poll()
returns -ENXIO between SUSPEND and RESUME events. The user space
process can close FD and reestablish connection with new FD after
migration if needed (Example: source IDs are changed).
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-10-haren@linux.ibm.com
The partition uses “Hypervisor Pipe OS Enablement Notification”
system parameter token (value = 64) to enable / disable hvpipe in
the hypervisor. Once hvpipe is enabled, the hypervisor notifies
OS if the payload is pending for that partition from any source.
This system parameter token takes 1 byte length of data with
1 = Enable and 0 = Disable.
Enable hvpipe in the hypervisor with ibm,set-system-parameter
RTAS after registering hvpipe event source interrupt.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-9-haren@linux.ibm.com
The hypervisor signals the OS via a Hypervisor Pipe Event external
interrupt when data is available to be received from the pipe.
Then the OS should call RTAS check-exception and provide the input
Event Mask as defined for the ‘ibm,hvpipe-msg-events’. In response,
check-exception will return an event log containing an Pipe Events
message. This message contains the source ID for which this
message is intended to and the pipe status such as whether the
payload is pending in the hypervisor or pipe to source is closed.
If there is any user space process waiting in the wait_queue for
the payload from this source ID, wake up that process which can
issue read() to obtain payload with ibm,receive-hvpipe-msg RTAS
or close FD if the pipe to source is closed.
The hypervisor has one pipe per partition for all sources and it
will not deliver another hvpipe event message until the partition
reads the payload for the previous hvpipe event. So if the source
ID is not found in the source list, issue the dummy
ibm,receive-hvpipe-msg RTAS so that pipe will not blocked.
Register hvpipe event source interrupt based on entries from
/proc/device-tree//event-sources/ibm,hvpipe-msg-events property.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-8-haren@linux.ibm.com
The user space polls on the wait_queue for the payload from the
specific source. The hypervisor interrupts the OS when the pipe
status for the specific source is changed such as payload is
available for the partition or pipe to the source is closed. The
OS retrieves the HVPIPE event message with check-exception RTAS
and event message contains the source ID and the pipe status.
Then wakes up all FDs waiting on the wait_queue so that the user
space can read the payload or close the FD if the pipe to source
in the hypervisor is closed.
The hypervisor assigns one pipe per partition for all sources.
Hence issue ibm,receive-hvpipe-msg() to read the pending
payload during release() before closing FD so that pipe to the
partition will not be blocked.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-7-haren@linux.ibm.com
ibm,receive-hvpipe-msg RTAS call is used to receive data from the
source (Ex: Hardware Management Console) over the hypervisor
pipe. The hypervisor will signal the OS via a Hypervisor Pipe
Event external interrupt when data is available to be received
from the pipe and the event message has the source ID and the
message type such as payload or closed pipe to the specific
source. The hypervisor will not generate another interrupt for
the next payload until the partition reads the previous payload.
It means the hvpipe is blocked and will not deliver other events
for any source. The maximum data length of 4048 bytes is
supported with this RTAS call right now.
The user space uses read() to receive data from HMC which issues
ibm,receive-hvpipe-msg RTAS and the kernel returns the buffer
length (including papr_hvpipe_hdr length) to the user space for
success or RTAS failure error. If the message is regarding the
pipe closed, kernel just returns the papr_hvpipe_hdr with
flags = HVPIPE_LOST_CONNECTION and expects the user space to
close FD for the corresponding source.
ibm,receive-hvpipe-msg RTAS call passes the buffer and returns
the source ID from where this payload is received and the
payload length.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-6-haren@linux.ibm.com
ibm,send-hvpipe-msg RTAS call is used to send data to the source
(Ex: Hardware Management Console) over the hypervisor pipe. The
maximum data length of 4048 bytes is supported with this RTAS call
right now. The user space uses write() to send this payload which
invokes this RTAS. Then the write returns the buffer length
(including papr_hvpipe_hdr length) to the user space for success
or RTAS failure error.
ibm,send-hvpipe-msg call takes source ID as target and the buffer
in the form of buffer list. The buffer list format consists of
work area of size 4K to hold buffer list and number of 4K work
areas depends on buffers is as follows:
Length of Buffer List in bytes
Address of 4K buffer 1
Length of 4K buffer 1 used
...
Address of 4K buffer n
Length of 4K buffer n used
Only one buffer is used right now because of max payload size is
4048 bytes. writev() can be used in future when supported more
than one buffer.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-5-haren@linux.ibm.com
The hypervisor provides ibm,send-hvpipe-msg and
ibm,receive-hvpipe-msg RTAS calls which can be used by the
partition to communicate through an inband hypervisor channel with
different external sources such as Hardware Management Console
(HMC). The information exchanged, whether it be messages, raw or
formatted data, etc., is only known to between applications in the
OS and the source (HMC). This patch adds papr-hvpipe character
driver and provides the standard interfaces such as open / ioctl/
read / write to user space for exchanging information with HMC
using send/recevive HVPIPE RTAS functions.
PAPR (7.3.32 Hypervisor Pipe Information Exchange) defines the
HVPIPE usage:
- The hypervisor has one HVPIPE per partition for all sources.
- OS can determine this feature’s availability by detecting the
“ibm,hypervisor-pipe-capable” property in the /rtas node of the
device tree.
- Each source is represented by the source ID which is used in
send / recv HVPIPE RTAS. (Ex: source ID is the target for the
payload in send RTAS).
- Return status of ibm,send-hvpipe-msg can be considered as
delivered the payload.
- Return status of ibm,receive-hvpipe-msg can be considered as
ACK to source.
- The hypervisor generates hvpipe message event interrupt when
the partition has the payload to receive.
Provide the interfaces to the user space with /dev/papr-hvpipe
character device using the following programming model:
int devfd = open("/dev/papr-hvpipe")
int fd = ioctl(devfd, PAPR_HVPIPE_IOC_CREATE_HANDLE, &srcID);
- Restrict the user space to use the same source ID and do not
expect more than one process access with the same source.
char *buf = malloc(size);
- SIZE should be 4K and the buffer contains header and the
payload.
length = write(fd, buf, size);
- OS issues ibm,send-hvpipe-msg RTAS and returns the RTAS status
to the user space.
ret = poll(fd,...)
- The HVPIPE event message IRQ wakes up for any waiting FDs.
length = read(fd, buf, size);
- OS issues ibm,receive-hvpipe-msg to receive payload from the
hypervisor.
release(fd);
- OS issues ibm,receive-hvpipe-msg if any payload is pending so
that pipe is not blocked.
The actual implementation of these calls are added in the
next patches.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-4-haren@linux.ibm.com
PowerPC FW introduced HVPIPE RTAS calls such as
ibm,send-hvpipe-msg and ibm,receive-hvpipe-msg for the user space
to exchange information with different sources such as Hardware
Management Consoles (HMC).
HVPIPE_IOC_CREATE_HANDLE is defined to use /dev/papr-hvpipe
interface for ibm,send-hvpipe-msg and ibm,receive-hvpipe-msg
RTAS calls.
Also defined papr_hvpipe_hdr which will added in the payload
that is passed between the kernel and the user space.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-2-haren@linux.ibm.com
Remove legacy-of-mm-gpiochip.h header file. The above mentioned
file provides an OF API that's deprecated. There is no agnostic
alternatives to it and we have to open code the logic which was
hidden behind of_mm_gpiochip_add_data(). Note, most of the GPIO
drivers are using their own labeling schemas and resource retrieval
that only a few may gain of the code deduplication, so whenever
alternative is appear we can move drivers again to use that one.
[text copied from commit 34064c8267 ("powerpc/8xx:
Drop legacy-of-mm-gpiochip.h header")]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/9652736ef05b94d9113ea5ce7899734ef82343d1.1755520794.git.christophe.leroy@csgroup.eu
Remove legacy-of-mm-gpiochip.h header file. The above mentioned
file provides an OF API that's deprecated. There is no agnostic
alternatives to it and we have to open code the logic which was
hidden behind of_mm_gpiochip_add_data(). Note, most of the GPIO
drivers are using their own labeling schemas and resource retrieval
that only a few may gain of the code deduplication, so whenever
alternative is appear we can move drivers again to use that one.
As a side effect this change fixes a potential memory leak on
an error path, if of_mm_gpiochip_add_data() fails.
[Text copied from commit 34064c8267 ("powerpc/8xx: Drop
legacy-of-mm-gpiochip.h header")]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2f8f16eac72b9ec202b6e593939b44308891a661.1755519343.git.christophe.leroy@csgroup.eu
Remove legacy-of-mm-gpiochip.h header file. The above mentioned
file provides an OF API that's deprecated. There is no agnostic
alternatives to it and we have to open code the logic which was
hidden behind of_mm_gpiochip_add_data(). Note, most of the GPIO
drivers are using their own labeling schemas and resource retrieval
that only a few may gain of the code deduplication, so whenever
alternative is appear we can move drivers again to use that one.
As a side effect this change fixes a potential memory leak on
an error path, if of_mm_gpiochip_add_data() fails.
[text copied from commit 34064c8267 ("powerpc/8xx:
Drop legacy-of-mm-gpiochip.h header")]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2662f24c539db393f11b27f0feae2dc14bb2f08f.1755518891.git.christophe.leroy@csgroup.eu
SPRN_M_TWB contains the address of task PGD minus an offset which
compensates the offset required when accessing the kernel PGDIR.
However, since commit ac9f97ff8b ("powerpc/8xx: Inconditionally use
task PGDIR in DTLB misses") and commit 33c527522f ("powerpc/8xx:
Inconditionally use task PGDIR in ITLB misses") kernel PGDIR is not
used anymore in hot paths.
Remove this offset which was added by
commit fde5a9057f ("powerpc/8xx: Optimise access to swapper_pg_dir")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[maddy: Fixed checkpatch.pl warning for "pathes"]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/9710d960b512996e64beebfd368cfeaadb28b3ba.1755509047.git.christophe.leroy@csgroup.eu
powerpc supports BPF atomic operations using a loop around
Load-And-Reserve(LDARX/LWARX) and Store-Conditional(STDCX/STWCX)
instructions gated by sync instructions to enforce full ordering.
To implement arena_atomics, arena vm start address is added to the
dst_reg to be used for both the LDARX/LWARX and STDCX/STWCX instructions.
Further, an exception table entry is added for LDARX/LWARX
instruction to land after the loop on fault. At the end of sequence,
dst_reg is restored by subtracting arena vm start address.
bpf_jit_supports_insn() is introduced to selectively enable instruction
support as in other architectures like x86 and arm64.
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250904100835.1100423-5-skb99@linux.ibm.com
The existing code for emitting bpf atomic instruction sequences for
atomic operations such as XCHG, CMPXCHG, ADD, AND, OR, and XOR has been
refactored into a reusable function, bpf_jit_emit_ppc_atomic_op().
It also computes the jump offset and tracks the instruction index for jited
LDARX/LWARX to be used in case it causes a fault.
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250904100835.1100423-4-skb99@linux.ibm.com
LLVM generates bpf_addr_space_cast instruction while translating
pointers between native (zero) address space and
__attribute__((address_space(N))). The addr_space=0 is reserved as
bpf_arena address space.
rY = addr_space_cast(rX, 0, 1) is processed by the verifier and
converted to normal 32-bit move: wX = wY.
rY = addr_space_cast(rX, 1, 0) : used to convert a bpf arena pointer to
a pointer in the userspace vma. This has to be converted by the JIT.
PPC_RAW_RLDICL_DOT, a variant of PPC_RAW_RLDICL is introduced to set
condition register as well.
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250904100835.1100423-3-skb99@linux.ibm.com
Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW]
instructions. They are similar to PROBE_MEM instructions with the
following differences:
- PROBE_MEM32 supports store.
- PROBE_MEM32 relies on the verifier to clear upper 32-bit of the
src/dst register
- PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in _R26
in the prologue). Due to bpf_arena constructions such _R26 + reg +
off16 access is guaranteed to be within arena virtual range, so no
address check at run-time.
- PROBE_MEM32 allows STX and ST. If they fault the store is a nop. When
LDX faults the destination register is zeroed.
To support these on powerpc, we do tmp1 = _R26 + src/dst reg and then use
tmp1 as the new src/dst register. This allows us to reuse most of the
code for normal [LDX | STX | ST].
Additionally, bpf_jit_emit_probe_mem_store() is introduced to emit
instructions for storing memory values depending on the size (byte,
halfword, word, doubleword).
Stack layout is adjusted to introduce a new NVR (_R26) and to make
BPF_PPC_STACKFRAME quadword aligned (local_tmp_var is increased by
8 bytes).
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250904100835.1100423-2-skb99@linux.ibm.com
In the upstream commit 214c0eea43
("kbuild: add $(objtree)/ prefix to some in-kernel build artifacts")
artifacts required for building out-of-tree kernel modules had
$(objtree) prepended to them to prepare for building in other
directories.
When building external modules for powerpc,
arch/powerpc/lib/crtsavres.o is required for certain
configurations. This artifact is missing the prepended $(objtree).
Fixes: 13b25489b6 ("kbuild: change working directory to external module directory with M=")
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Tested-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250218-buildfix-extmod-powerpc-v2-1-1e78fcf12b56@efficios.com
The calculations for operand/opcode/macro numbers are done in an
identical manner to the already existing ARRAY_SIZE macro in
linux/array_size.h
This patch replaces the sizeof calculations with the macro to make the
code cleaner and more immediately obvious what it is doing.
Signed-off-by: Ruben Wauters <rubenru09@aol.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250719225225.2132-2-rubenru09@aol.com
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembler code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This is bad since macros starting with two underscores are names
that are reserved by the C language. It can also be very confusing
for the developers when switching between userspace and kernelspace
coding, or when dealing with uapi headers that rather should use
__ASSEMBLER__ instead. So let's standardize now on the __ASSEMBLER__
macro that is provided by the compilers.
This is almost a completely mechanical patch (done with a simple
"sed -i" statement), apart from tweaking two comments manually in
arch/powerpc/include/asm/bug.h and arch/powerpc/include/asm/kasan.h
(which did not have proper underscores at the end) and fixing a
checkpatch error about spaces in arch/powerpc/include/asm/spu_csa.h.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250801082007.32904-3-thuth@redhat.com
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for uapi headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembler
code.
This is a completely mechanical patch (done with a simple "sed -i"
statement).
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250801082007.32904-2-thuth@redhat.com
Pull x86 fixes from Borislav Petkov:
- Convert the SSB mitigation to the attack vector controls which got
forgotten at the time
- Prevent the CPUID topology hierarchy detection on AMD from
overwriting the correct initial APIC ID
- Fix the case of a machine shipping without microcode in the BIOS, in
the AMD microcode loader
- Correct the Pentium 4 model range which has a constant TSC
* tag 'x86_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Add attack vector controls for SSB
x86/cpu/topology: Use initial APIC ID from XTOPOLOGY leaf on AMD/HYGON
x86/microcode/AMD: Handle the case of no BIOS microcode
x86/cpu/intel: Fix the constant_tsc model check for Pentium 4
Pull scheduler fixes from Borislav Petkov:
- Fix a stall on the CPU offline path due to mis-counting a deadline
server task twice as part of the runqueue's running tasks count
- Fix a realtime tasks starvation case where failure to enqueue a timer
whose expiration time is already in the past would cause repeated
attempts to re-enqueue a deadline server task which leads to starving
the former, realtime one
- Prevent a delayed deadline server task stop from breaking the
per-runqueue bandwidth tracking
- Have a function checking whether the deadline server task has
stopped, return the correct value
* tag 'sched_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Don't count nr_running for dl_server proxy tasks
sched/deadline: Fix RT task potential starvation when expiry time passed
sched/deadline: Always stop dl-server before changing parameters
sched/deadline: Fix dl_server_stopped()
Pull irq fixes from Borislav Petkov:
- Remove unnecessary and noisy WARN_ONs in gic-v5's init path
- Avoid a kmemleak false positive for the gic-v5's L2 IST table entries
- Fix a retval check in mvebu-gicp's probe function
- Fix a wrong conversion to guards in atmel-aic[5] irqchip
* tag 'irq_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v5: Remove undue WARN_ON()s in the IRS affinity parsing
irqchip/gic-v5: Fix kmemleak L2 IST table entries false positives
irqchip/mvebu-gicp: Fix an IS_ERR() vs NULL check in probe()
irqchip/atmel-aic[5]: Fix incorrect lock guard conversion
Pull hardening fixes from Kees Cook:
- ARM: stacktrace: include asm/sections.h in asm/stacktrace.h (Arnd
Bergmann)
- ubsan: Fix incorrect hand-side used in handle (Junhui Pei)
- hardening: Require clang 20.1.0 for __counted_by (Nathan Chancellor)
* tag 'hardening-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
hardening: Require clang 20.1.0 for __counted_by
ARM: stacktrace: include asm/sections.h in asm/stacktrace.h
ubsan: Fix incorrect hand-side used in handle
Pull gpio fixes from Bartosz Golaszewski:
- fix an off-by-one bug in interrupt handling in gpio-timberdale
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Change Altera-PIO driver maintainer
gpio: timberdale: fix off-by-one in IRQ type boundary check
Pull arm64 fixes from Catalin Marinas:
- CFI failure due to kpti_ng_pgd_alloc() signature mismatch
- Underallocation bug in the SVE ptrace kselftest
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature
In fp-trace when allocating a buffer to write SVE register data we open
code the addition of the header size to the VL depeendent register data
size, which lead to an underallocation bug when we cut'n'pasted the code
for FPSIMD format writes. Use the SVE_PT_SIZE() macro that the kernel
UAPI provides for this.
Fixes: b84d2b2795 ("kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250812-arm64-fp-trace-macro-v1-1-317cfff986a5@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>