Commit Graph

234440 Commits

Author SHA1 Message Date
Clément Léger
ca1a66cdd6 riscv: uaccess: do not do misaligned accesses in get/put_user()
Doing misaligned access to userspace memory would make a trap on
platform where it is emulated. Latest fixes removed the kernel
capability to do unaligned accesses to userspace memory safely since
interrupts are kept disabled at all time during that. Thus doing so
would crash the kernel.

Such behavior was detected with GET_UNALIGN_CTL() that was doing
a put_user() with an unsigned long* address that should have been an
unsigned int*. Reenabling kernel misaligned access emulation is a bit
risky and it would also degrade performances. Rather than doing that,
we will try to avoid any misaligned accessed by using copy_from/to_user()
which does not do any misaligned accesses. This can be done only for
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and thus allows to only generate
a bit more code for this config.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:39:17 -07:00
Linus Torvalds
e9e668cd27 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "We've got a couple of build fixes when using LLD, a missing TLB
  invalidation and a workaround for broken firmware on SoCs with CPUs
  that implement MPAM:

   - Disable problematic linker assertions for broken versions of LLD

   - Work around sporadic link failure with LLD and various randconfig
     builds

   - Fix missing invalidation in the TLB batching code when reclaim
     races with mprotect() and friends

   - Add a command-line override for MPAM to allow booting on systems
     with broken firmware"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add override for MPAM
  arm64/mm: Close theoretical race where stale TLB entry remains valid
  arm64: Work around convergence issue with LLD linker
  arm64: Disable LLD linker ASSERT()s for the time being
2025-06-05 11:39:17 -07:00
Clément Léger
020667d661 riscv: process: use unsigned int instead of unsigned long for put_user()
The specification of prctl() for GET_UNALIGN_CTL states that the value is
returned in an unsigned int * address passed as an unsigned long. Change
the type to match that and avoid an unaligned access as well.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:39:16 -07:00
Alexandre Ghiti
a434854633 riscv: make unsafe user copy routines use existing assembly routines
The current implementation is underperforming and in addition, it
triggers misaligned access traps on platforms which do not handle
misaligned accesses in hardware.

Use the existing assembly routines to solve both problems at once.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:39:15 -07:00
Linus Torvalds
aef7457540 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:

 - Fix arch_memremap_can_ram_remap() which incorrectly passed a PFN to
   memblock_is_map_memory rather than the actual address.

 - Disallow kernel mode NEON when IRQs are disabled

   Explanation:

     "To avoid having to preserve/restore kernel mode NEON state when
      such a softirq is taken softirqs are now disabled when using the
      NEON from task context."

   should explain that it's nested kernel mode.

   In other words, softirqs from user mode are fine, because the context
   will be preserved. softirqs from kernel mode may be from a context
   that has already saved the user NEON state, and thus we would need to
   preserve the NEON state for the parent kernel mode context, and this
   we don't allow.

   The problem occurs when the kernel context disables hard IRQs, and
   then uses NEON. When it's finished, and restores the userspace NEON
   state, we call local_bh_enable() with hard IRQs disabled, which
   causes a warning.

   This commit addresses that by disallowing the use of NEON with hard
   IRQs disabled.

	https://lore.kernel.org/all/20250516231858.27899-4-ebiggers@kernel.org/T/#m104841b6e9346b1814c8b0fb9f2340551b0cd3e8

   has some further context

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9446/1: Disallow kernel mode NEON when IRQs are disabled
  ARM: 9447/1: arm/memremap: fix arch_memremap_can_ram_remap()
2025-06-05 11:33:09 -07:00
Alexandre Ghiti
415a8c81da riscv: hwprobe: export Zabha extension
Export Zabha through the hwprobe syscall.

Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250421141413.394444-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:10:18 -07:00
Tiezhu Yang
d7e0cce103 riscv: Make regs_irqs_disabled() more clear
The return value of regs_irqs_disabled() is true or false, so change
its type to reflect that and also make it always inline.

Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422113156.25742-1-yangtiezhu@loongson.cn
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:10:17 -07:00
Palmer Dabbelt
9eb9ea31ff Merge patch series "riscv: kexec_file: Support loading Image binary file"
Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

Hi!

For over a year ago, Daniel and I was testing the V2 of Song's series.
I also promised to take the V2, that had been sitting on the lists for
too long, to rebase it on a new kernel, and re-test it.

One year later, here's the V3! ;-)

There are no changes from V2 other, than some simple checkpatch
cleanups.

Song's original cover:
  | This series makes the kexec_file_load() syscall support to load
  | Image binary file. At the same time, corresponding support for
  | kexec-tools had been pushed to my repo[2].
  |
  | Now, we can leverage that kexec-tools and this series to use the
  | kexec_load() or kexec_file_load() syscall to boot both vmlinux and
  | Image file, as seen in these combo tests:
  |
  | ```
  | 1. kexec -l vmlinux
  | 2. kexec -l Image
  | 3. kexec -s -l vmlinux
  | 4. kexec -s -l Image
  | ```

Notably, kexec-tools has still not made it upstream. I've prepared a
branch on my GH [3], that I indend to post ASAP. That branch is a
collection of fixes/features, including Song's userland Image loading.

The V2 is here [2], and V1 [1].

I've tested the kexec-file/Image on qemu-rv64, with following
combinations:
 * ACPI/UEFI
 * DT/UEFI
 * DT

both "regular" kexec (-s + -e), and crashkernels (-p).

Note that there are two purgatory patches that has to be present (part
of -rc1, so all good):
  commit 28093cfef5 ("riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator")
  commit 3f7023171d ("riscv/purgatory: 4B align purgatory_start")

[1] https://lore.kernel.org/linux-riscv/20230914020044.1397356-1-songshuaishuai@tinylab.org/
[2] https://lore.kernel.org/linux-riscv/20231016092006.3347632-1-songshuaishuai@tinylab.org/
[3] https://github.com/bjoto/kexec-tools/tree/rv-on-master

* patches from https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org:
  riscv: kexec_file: Support loading Image binary file
  riscv: kexec_file: Split the loading of kernel and others

Link: https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:10:11 -07:00
谢致邦 (XIE Zhibang)
48d9aabf2d RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
It is the built-in command line appended to the bootloader command line,
not the bootloader command line appended to the built-in command line.

Fixes: 3aed8c4326 ("RISC-V: Update Kconfig to better handle CMDLINE")
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Link: https://lore.kernel.org/r/tencent_A93C7FB46BFD20054AD2FEF4645913FF550A@qq.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:44 -07:00
Samuel Holland
be17c0df67 riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.

Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919 ("arm64/module: Optimize module load time by
optimizing PLT counting").

Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:43 -07:00
Alexandre Ghiti
881dadf079 Merge patch series "riscv: ftrace: atmoic patching and preempt improvements"
Andy Chiu <andybnac@gmail.com> says:

This series makes atomic code patching in ftrace possible and eliminates
the need of the stop_machine dance. The major difference of this version
is that we merge the CALL_OPS support from Puranjay [1] and make direct
calls available for practical uses such as BPF. Thanks for the time
reviewing the series and suggestions, we hope this version gets a step
closer to happening in the upstream.

Please reference the link to v3 below for more introductory view of the
implementation [2]

Added patch: 2, 4, 10, 11, 12
Modified patch: 5, 6
Unchanged patch: 1, 3, 7, 8, 9
(1, 8 has commit msg modified)

Special thanks to Björn for his efforts on testing and guiding the
series!

[1]: https://lore.kernel.org/lkml/20240306165904.108141-1-puranjay12@gmail.com/
[2]: https://lore.kernel.org/linux-riscv/20241127172908.17149-1-andybnac@gmail.com/

* patches from https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com:
  riscv: Documentation: add a description about dynamic ftrace
  riscv: ftrace: support direct call using call_ops
  riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  riscv: ftrace: support PREEMPT
  riscv: add a data fence for CMODX in the kernel mode
  riscv: vector: Support calling schedule() for preemptible Vector
  riscv: ftrace: do not use stop_machine to update code
  riscv: ftrace: prepare ftrace for atomic code patching
  kernel: ftrace: export ftrace_sync_ipi
  riscv: ftrace: align patchable functions to 4 Byte boundary
  riscv: ftrace factor out code defined by !WITH_ARG
  riscv: ftrace: support fastcc in Clang for WITH_ARGS

Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-06-05 11:09:41 -07:00
Alexandre Ghiti
c3cc2a4a3a riscv: Add support for PUD THP
Add the necessary page table functions to deal with PUD THP, this
enables the use of PUD pfnmap.

Link: https://lore.kernel.org/r/20250321123954.225097-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:40 -07:00
Guo Ren
eb87e56d65 riscv: xchg: Prefetch the destination word for sc.w
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.

This patch is inspired by commit 0ea366f5e1 ("arm64: atomics:
prefetch the destination word for write prior to stxr").

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-4-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-5-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:39 -07:00
Guo Ren
a5f947c731 riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
Enable Linux prefetch and prefetchw primitives using Zicbop.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-3-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-4-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:38 -07:00
Alexandre Ghiti
8d496b5a98 riscv: Add support for Zicbop
Zicbop introduces cache blocks prefetching instructions, add the
necessary support for the kernel to use it in the coming commits.

Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-3-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:37 -07:00
Alexandre Ghiti
f0f4e64b9e riscv: Introduce Zicbop instructions
The S-type instructions are first introduced and then used to define the
encoding of the Zicbop prefetching instructions.

Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-2-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:36 -07:00
Yao Zi
850d7b14c8 riscv/kexec_file: Fix comment in purgatory relocator
Apparently sec_base doesn't mean relocated symbol value, which seems a
copy-pasting error in the comment. Assigned with the address of section
indexed by sym->st_shndx, it should represent base address of the
relevant section. Let's fix the comment to avoid possible confusion.

Fixes: 838b3e2848 ("RISC-V: Load purgatory in kexec_file")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250326073450.57648-2-ziyao@disroot.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:35 -07:00
Song Shuai
809a11eea8 riscv: kexec_file: Support loading Image binary file
This patch creates image_kexec_ops to load Image binary file
for kexec_file_load() syscall.

Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-3-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:34 -07:00
Song Shuai
1df45f8a9f riscv: kexec_file: Split the loading of kernel and others
This is the preparative patch for kexec_file_load Image support.

It separates the elf_kexec_load() as two parts:
- the first part loads the vmlinux (or Image)
- the second part loads other segments (e.g. initrd,fdt,purgatory)

And the second part is exported as the load_extra_segments() function
which would be used in both kexec-elf.c and kexec-image.c.

No functional change intended.

Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:33 -07:00
Andy Chiu
b21cdb9523 riscv: ftrace: support direct call using call_ops
jump to FTRACE_ADDR if distance is out of reach

Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:31 -07:00
Puranjay Mohan
c217157bcd riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.

The idea and most of the implementation is taken from the ARM64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, which can be recovered by the common ftrace trampoline.

We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting 2 4 byte or 4 2 byte  nops depending on the presence of
CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer
to the associated ftrace_ops for that callsite. Functions are aligned to
8 bytes to make sure that the accesses to this literal are atomic.

This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.

We've benchamrked this with the ftrace_ops sample module on Spacemit K1
Jupiter:

Without this patch:

baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
|  Number of tracers    | Total time (ns) | Per-call average time      |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |        1357958 |          13 |             - |
|        0 |          1 |        1302375 |          13 |             - |
|        0 |          2 |        1302375 |          13 |             - |
|        0 |         10 |        1379084 |          13 |             - |
|        0 |        100 |        1302458 |          13 |             - |
|        0 |        200 |        1302333 |          13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |       13677833 |         136 |           123 |
|        1 |          1 |       18500916 |         185 |           172 |
|        1 |          2 |       22856459 |         228 |           215 |
|        1 |         10 |       58824709 |         588 |           575 |
|        1 |        100 |      505141584 |        5051 |          5038 |
|        1 |        200 |     1580473126 |       15804 |         15791 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |       13561000 |         135 |           122 |
|        2 |          0 |       19707292 |         197 |           184 |
|       10 |          0 |       67774750 |         677 |           664 |
|      100 |          0 |      714123125 |        7141 |          7128 |
|      200 |          0 |     1918065668 |       19180 |         19167 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

With this patch:

v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
|  Number of tracers    | Total time (ns) | Per-call average time      |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |         1459917 |         14 |             - |
|        0 |          1 |         1408000 |         14 |             - |
|        0 |          2 |         1383792 |         13 |             - |
|        0 |         10 |         1430709 |         14 |             - |
|        0 |        100 |         1383791 |         13 |             - |
|        0 |        200 |         1383750 |         13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5238041 |         52 |            38 |
|        1 |          1 |         5228542 |         52 |            38 |
|        1 |          2 |         5325917 |         53 |            40 |
|        1 |         10 |         5299667 |         52 |            38 |
|        1 |        100 |         5245250 |         52 |            39 |
|        1 |        200 |         5238459 |         52 |            39 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5239083 |         52 |            38 |
|        2 |          0 |        19449417 |        194 |           181 |
|       10 |          0 |        67718584 |        677 |           663 |
|      100 |          0 |       709840708 |       7098 |          7085 |
|      200 |          0 |      2203580626 |      22035 |         22022 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

As can be seen from the above:

 a) Whenever there is a single relevant tracer function associated with a
    tracee, the overhead of invoking the tracer is constant, and does not
    scale with the number of tracers which are *not* associated with that
    tracee.

 b) The overhead for a single relevant tracer has dropped to ~1/3 of the
    overhead prior to this series (from 122ns to 38ns). This is largely
    due to permitting calls to dynamically-allocated ftrace_ops without
    going through ftrace_ops_list_func.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>

[update kconfig, asm, refactor]

Signed-off-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-10-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:30 -07:00
Andy Chiu
d0262e907e riscv: ftrace: support PREEMPT
Now, we can safely enable dynamic ftrace with kernel preemption.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-9-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:29 -07:00
Andy Chiu
ca358692de riscv: add a data fence for CMODX in the kernel mode
RISC-V spec explicitly calls out that a local fence.i is not enough for
the code modification to be visble from a remote hart. In fact, it
states:

To make a store to instruction memory visible to all RISC-V harts, the
writing hart also has to execute a data FENCE before requesting that all
remote RISC-V harts execute a FENCE.I.

Although current riscv drivers for IPI use ordered MMIO when sending IPIs
in order to synchronize the action between previous csd writes, riscv
does not restrict itself to any particular flavor of IPI. Any driver or
firmware implementation that does not order data writes before the IPI
may pose a risk for code-modifying race.

Thus, add a fence here to order data writes before making the IPI.

Signed-off-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-8-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:28 -07:00
Andy Chiu
d1049fc0de riscv: vector: Support calling schedule() for preemptible Vector
Each function entry implies a call to ftrace infrastructure. And it may
call into schedule in some cases. So, it is possible for preemptible
kernel-mode Vector to implicitly call into schedule. Since all V-regs
are caller-saved, it is possible to drop all V context when a thread
voluntarily call schedule(). Besides, we currently don't pass argument
through vector register, so we don't have to save/restore V-regs in
ftrace trampoline.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:27 -07:00
Andy Chiu
5aa4ef9558 riscv: ftrace: do not use stop_machine to update code
Now it is safe to remove dependency from stop_machine() for us to patch
code in ftrace.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-6-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:26 -07:00
Andy Chiu
b2137c3b6d riscv: ftrace: prepare ftrace for atomic code patching
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 byte at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to eable/disable ftrcae from a
function. This limits the reach of each ftrace entry to +-2KB displacing
from ftrace_caller.

Starting from the trampoline, we add a level of indirection for it to
reach ftrace caller target. Now, it loads the target address from a
memory location, then perform the jump. This enable the kernel to update
the target atomically.

The new don't-stop-the-world text patching on change only one RISC-V
instruction:

  |  -8: &ftrace_ops of the associated tracer function.
  | <ftrace enable>:
  |   0: auipc  t0, hi(ftrace_caller)
  |   4: jalr   t0, lo(ftrace_caller)
  |
  |  -8: &ftrace_nop_ops
  | <ftrace disable>:
  |   0: auipc  t0, hi(ftrace_caller)
  |   4: nop

This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.

[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]

Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:25 -07:00
Andy Chiu
c41bf4326c riscv: ftrace: align patchable functions to 4 Byte boundary
We are changing ftrace code patching in order to remove dependency from
stop_machine() and enable kernel preemption. This requires us to align
functions entry at a 4-B align address.

However, -falign-functions on older versions of GCC alone was not strong
enoungh to align all functions. In fact, cold functions are not aligned
after turning on optimizations. We consider this is a bug in GCC and
turn off guess-branch-probility as a workaround to align all functions.

GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

The option -fmin-function-alignment is able to align all functions
properly on newer versions of gcc. So, we add a cc-option to test if
the toolchain supports it.

Suggested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-3-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:23 -07:00
Andy Chiu
54ecbc8d85 riscv: ftrace factor out code defined by !WITH_ARG
DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS and mcount-dyn.S in
riscv, so we can remove ifdef jargons of WITH_ARG when it is known that
DYNAMIC_FTRACE is true.

Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-2-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:22 -07:00
Andy Chiu
f8693f6dff riscv: ftrace: support fastcc in Clang for WITH_ARGS
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.

- [1]: https://reviews.llvm.org/D68559

Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa976546 ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 11:09:21 -07:00
Linus Torvalds
bfdf35c5dc Merge tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
 "A fairly small update for the dmaengine subsystem. This has a new ARM
  dmaengine driver and couple of new device support and few driver
  changes:

  New support:
   - Renesas RZ/V2H(P) dma support for r9a09g057
   - Arm DMA-350 driver
   - Tegra Tegra264 ADMA support

  Updates:
   - AMD ptdma driver code removal and optimizations
   - Freescale edma error interrupt handler support"

* tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (27 commits)
  dmaengine: idxd: Remove unused pointer and macro
  arm64: dts: renesas: r9a09g057: Add DMAC nodes
  dmaengine: sh: rz-dmac: Add RZ/V2H(P) support
  dmaengine: sh: rz-dmac: Allow for multiple DMACs
  irqchip/renesas-rzv2h: Add rzv2h_icu_register_dma_req()
  dt-bindings: dma: rz-dmac: Document RZ/V2H(P) family of SoCs
  dt-bindings: dma: rz-dmac: Restrict properties for RZ/A1H
  dmaengine: idxd: Narrow the restriction on BATCH to ver. 1 only
  dmaengine: ti: Add NULL check in udma_probe()
  fsldma: Set correct dma_mask based on hw capability
  dmaengine: idxd: Check availability of workqueue allocated by idxd wq driver before using
  dmaengine: xilinx_dma: Set dma_device directions
  dmaengine: tegra210-adma: Add Tegra264 support
  dt-bindings: Document Tegra264 ADMA support
  dmaengine: dw-edma: Add HDMA NATIVE map check
  dmaegnine: fsl-edma: add edma error interrupt handler
  dt-bindings: dma: fsl-edma: increase maxItems of interrupts and interrupt-names
  dmaengine: ARM_DMA350 should depend on ARM/ARM64
  dt-bindings: dma: qcom,bam: Document dma-coherent property
  dmaengine: Add Arm DMA-350 driver
  ...
2025-06-05 08:49:30 -07:00
Marc Zyngier
b5fa1f91e1 KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
Now that we don't have any use of __vcpu_sys_reg() as a lvalue,
remove the in-place update, and directly return the sanitised
value.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
Marc Zyngier
b61fae4ee1 KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
We are about to prevent the use of __vcpu_sys_reg() as a lvalue,
and getting the address of a rvalue is not a thing.

Update the couple of places where we do this to use the __ctxt_sys_reg()
accessor, which return the address of a register.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
Marc Zyngier
8800b7c4bb KVM: arm64: Add RMW specific sysreg accessor
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.

Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
Marc Zyngier
6678791ee3 KVM: arm64: Add assignment-specific sysreg accessor
Assigning a value to a system register doesn't do what it is
supposed to be doing if that register is one that has RESx bits.

The main problem is that we use __vcpu_sys_reg(), which can be used
both as a lvalue and rvalue. When used as a lvalue, the bit masking
occurs *before* the new value is assigned, meaning that we (1) do
pointless work on the old cvalue, and (2) potentially assign an
invalid value as we fail to apply the masks to it.

Fix this by providing a new __vcpu_assign_sys_reg() that does
what it says on the tin, and sanitises the *new* value instead of
the old one. This comes with a significant amount of churn.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:17:32 +01:00
Heiko Carstens
11709abccf s390/mm: Fix in_atomic() handling in do_secure_storage_access()
Kernel user spaces accesses to not exported pages in atomic context
incorrectly try to resolve the page fault.
With debug options enabled call traces like this can be seen:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1523
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 419074, name: qemu-system-s39
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000383ea47cfa2>] copy_page_from_iter_atomic+0xa2/0x8a0
CPU: 12 UID: 0 PID: 419074 Comm: qemu-system-s39
Tainted: G        W           6.16.0-20250531.rc0.git0.69b3a602feac.63.fc42.s390x+debug #1 PREEMPT
Tainted: [W]=WARN
Hardware name: IBM 3931 A01 703 (LPAR)
Call Trace:
 [<00000383e990d282>] dump_stack_lvl+0xa2/0xe8
 [<00000383e99bf152>] __might_resched+0x292/0x2d0
 [<00000383eaa7c374>] down_read+0x34/0x2d0
 [<00000383e99432f8>] do_secure_storage_access+0x108/0x360
 [<00000383eaa724b0>] __do_pgm_check+0x130/0x220
 [<00000383eaa842e4>] pgm_check_handler+0x114/0x160
 [<00000383ea47d028>] copy_page_from_iter_atomic+0x128/0x8a0
([<00000383ea47d016>] copy_page_from_iter_atomic+0x116/0x8a0)
 [<00000383e9c45eae>] generic_perform_write+0x16e/0x310
 [<00000383e9eb87f4>] ext4_buffered_write_iter+0x84/0x160
 [<00000383e9da0de4>] vfs_write+0x1c4/0x460
 [<00000383e9da123c>] ksys_write+0x7c/0x100
 [<00000383eaa7284e>] __do_syscall+0x15e/0x280
 [<00000383eaa8417e>] system_call+0x6e/0x90
INFO: lockdep is turned off.

It is not allowed to take the mmap_lock while in atomic context. Therefore
handle such a secure storage access fault as if the accessed page is not
mapped: the uaccess function will return -EFAULT, and the caller has to
deal with this. Usually this means that the access is retried in process
context, which allows to resolve the page fault (or in this case export the
page).

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250603134936.1314139-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-06-05 12:29:22 +02:00
Benjamin Berg
e56a50ff7c um: remove "extern" from implementation of sigchld_handler
There is no need to mark the function as extern in the implementation.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506051226.X8r7X5aa-lkp@intel.com/
Fixes: 8420e08fe3 ("um: Track userspace children dying in SECCOMP mode")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250605050325.1077208-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-05 11:12:13 +02:00
Benjamin Berg
27a041040f um: fix unused variable warning
The code was updated to access the PID of the userspace stub process in
a different way, making the local cpu variable obsolete. Remove it.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506050008.AwXLNxQX-lkp@intel.com/
Fixes: 406d17c6c3 ("um: Implement kernel side of SECCOMP based process handling")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250605050325.1077208-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-05 11:12:13 +02:00
Clément Léger
7977448bf3 riscv: misaligned: add a function to check misalign trap delegability
Checking for the delegability of the misaligned access trap is needed
for the KVM FWFT extension implementation. Add a function to get the
delegability of the misaligned trap exception.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-11-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:07 -07:00
Clément Léger
4eaaa65e30 riscv: misaligned: move emulated access uniformity check in a function
Split the code that check for the uniformity of misaligned accesses
performance on all cpus from check_unaligned_access_emulated_all_cpus()
to its own function which will be used for delegation check. No
functional changes intended.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-10-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:06 -07:00
Clément Léger
1317045a7d riscv: misaligned: declare misaligned_access_speed under CONFIG_RISCV_MISALIGNED
While misaligned_access_speed was defined in a file compile with
CONFIG_RISCV_MISALIGNED, its definition was under
CONFIG_RISCV_SCALAR_MISALIGNED. This resulted in compilation problems
when using it in a file compiled with CONFIG_RISCV_MISALIGNED.

Move the declaration under CONFIG_RISCV_MISALIGNED so that it can be
used unconditionnally when compiled with that config and remove the check
for that variable in traps_misaligned.c.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-9-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:05 -07:00
Clément Léger
9f9f6fdd1d riscv: misaligned: use on_each_cpu() for scalar misaligned access probing
schedule_on_each_cpu() was used without any good reason while documented
as very slow. This call was in the boot path, so better use
on_each_cpu() for scalar misaligned checking. Vector misaligned check
still needs to use schedule_on_each_cpu() since it requires irqs to be
enabled but that's less of a problem since this code is ran in a kthread.
Add a comment to explicit that.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-8-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:04 -07:00
Clément Léger
cf5a8abc65 riscv: misaligned: request misaligned exception from SBI
Now that the kernel can handle misaligned accesses in S-mode, request
misaligned access exception delegation from SBI. This uses the FWFT SBI
extension defined in SBI version 3.0.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-7-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:03 -07:00
Clément Léger
c4a50db1e1 riscv: sbi: add SBI FWFT extension calls
Add FWFT extension calls. This will be ratified in SBI V3.0 hence, it is
provided as a separate commit that can be left out if needed.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-6-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:02 -07:00
Clément Léger
6d6d0641dc riscv: sbi: add FWFT extension interface
This SBI extensions enables supervisor mode to control feature that are
under M-mode control (For instance, Svadu menvcfg ADUE bit, Ssdbltrp
DTE, etc). Add an interface to set local features for a specific cpu
mask as well as for the online cpu mask.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-5-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:01 -07:00
Clément Léger
99cf5b7c73 riscv: sbi: add new SBI error mappings
A few new errors have been added with SBI V3.0, maps them as close as
possible to errno values.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:11:00 -07:00
Clément Léger
a7cd450f0e riscv: sbi: remove useless parenthesis
A few parenthesis in check for SBI version/extension were useless,
remove them.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:10:59 -07:00
Clément Léger
cf8651f731 riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
The Firmware Features extension (FWFT) was added as part of the SBI 3.0
specification. Add SBI definitions to use this extension.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-04 15:10:58 -07:00
Linus Torvalds
3719a04a80 Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Print the actual delay time in pci_bridge_wait_for_secondary_bus()
     instead of assuming it was 1000ms (Wilfred Mallawa)

   - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
     devices', which broke resume from system sleep on AMD platforms and
     has been fixed by other commits (Lukas Wunner)

  Resource management:

   - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
     and unnecessary (Philipp Stanner)

   - Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
     and related flags since all uses have been removed (Philipp
     Stanner)

   - Rework devres 'request' functions so they are no longer 'hybrid',
     i.e., their behavior no longer depends on whether
     pcim_enable_device or pci_enable_device() was used, and remove
     related code (Philipp Stanner)

   - Warn (not BUG()) about failure to assign optional resources (Ilpo
     Järvinen)

  Error handling:

   - Log the DPC Error Source ID only when it's actually valid (when
     ERR_FATAL or ERR_NONFATAL was received from a downstream device)
     and decode into bus/device/function (Bjorn Helgaas)

   - Determine AER log level once and save it so all related messages
     use the same level (Karolina Stolarek)

   - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
     Errors (Karolina Stolarek)

   - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
     controls on interval and burst count, to avoid flooding logs and
     RCU stall warnings (Jon Pan-Doh)

  Power management:

   - Increment PM usage counter when probing reset methods so we don't
     try to read config space of a powered-off device (Alex Williamson)

   - Set all devices to D0 during enumeration to ensure ACPI opregion is
     connected via _REG (Mario Limonciello)

  Power control:

   - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
     the filename paths. Retain old deprecated symbols for
     compatibility, except for the pwrctrl slot driver
     (PCI_PWRCTRL_SLOT) (Johan Hovold)

   - When unregistering pwrctrl, cancel outstanding rescan work before
     cleaning up data structures to avoid use-after-free issues (Brian
     Norris)

  Bandwidth control:

   - Simplify link bandwidth controller by replacing the count of Link
     Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
     flag (Ilpo Järvinen)

   - Update the Link Speed after retraining, since the Link Speed may
     have changed (Ilpo Järvinen)

  PCIe native device hotplug:

   - Ignore Presence Detect Changed caused by DPC.

     pciehp already ignores Link Down/Up events caused by DPC, but on
     slots using in-band presence detect, DPC causes a spurious Presence
     Detect Changed event (Lukas Wunner)

   - Ignore Link Down/Up caused by Secondary Bus Reset.

     On hotplug ports using in-band presence detect, the reset causes a
     Presence Detect Changed event, which mistakenly caused teardown and
     re-enumeration of the device. Drivers may need to annotate code
     that resets their device (Lukas Wunner)

  Virtualization:

   - Add an ACS quirk for Loongson Root Ports that don't advertise ACS
     but don't allow peer-to-peer transactions between Root Ports; the
     quirk allows each Root Port to be in a separate IOMMU group (Huacai
     Chen)

  Endpoint framework:

   - For fixed-size BARs, retain both the actual size and the possibly
     larger size allocated to accommodate iATU alignment requirements
     (Jerome Brunet)

   - Simplify ctrl/SPAD space allocation and avoid allocating more space
     than needed (Jerome Brunet)

   - Correct MSI-X PBA offset calculations for DesignWare and Cadence
     endpoint controllers (Niklas Cassel)

   - Align the return value (number of interrupts) encoding for
     pci_epc_get_msi()/pci_epc_ops::get_msi() and
     pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)

   - Align the nr_irqs parameter encoding for
     pci_epc_set_msi()/pci_epc_ops::set_msi() and
     pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)

  Common host controller library:

   - Convert pci-host-common to a library so platforms that don't need
     native host controller drivers don't need to include these helper
     functions (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Extract ECAM bridge creation helper from pci_host_common_probe() to
     separate driver-specific things like MSI from PCI things (Marc
     Zyngier)

   - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with
     varying capabilities (Marc Zyngier)

   - Skip ports disabled in DT when setting up ports (Janne Grunau)

   - Add t6020 compatible string (Alyssa Rosenzweig)

   - Add T602x PCIe support (Hector Martin)

   - Directly set/clear INTx mask bits because T602x dropped the
     accessors that could do this without locking (Marc Zyngier)

   - Move port PHY registers to their own reg items to accommodate
     T602x, which moves them around; retain default offsets for existing
     DTs that lack phy%d entries with the reg offsets (Hector Martin)

   - Stop polling for core refclk, which doesn't work on T602x and the
     bootloader has already done anyway (Hector Martin)

   - Use gpiod_set_value_cansleep() when asserting PERST# in probe
     because we're allowed to sleep there (Hector Martin)

  Cadence PCIe controller driver:

   - Drop a runtime PM 'put' to resolve a runtime atomic count underflow
     (Hans Zhang)

   - Make the cadence core buildable as a module (Kishon Vijay Abraham I)

   - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
     loadable drivers when they are removed (Siddharth Vadapalli)

  Freescale i.MX6 PCIe controller driver:

   - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
     (Richard Zhu)

   - Remove redundant dw_pcie_wait_for_link() from
     imx_pcie_start_link(); since the DWC core does this, imx6 only
     needs it when retraining for a faster link speed (Richard Zhu)

   - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)

   - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
     some cases, the controller can't exit 'L23 Ready' through Beacon or
     PERST# deassertion (Richard Zhu)

   - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
     controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
     GT/s, causing timeouts in L1 (Richard Zhu)

   - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)

   - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)

  Mobiveil PCIe controller driver:

   - Return bool (not int) for link-up check in
     mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
     Zhang)

  NVIDIA Tegra194 PCIe controller driver:

   - Create debugfs directory for 'aspm_state_cnt' only when
     CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
     Zhang)

  Qualcomm PCIe controller driver:

   - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
     equalization presets (Krishna Chaitanya Chundru)

   - Read Maximum Link Width from the Link Capabilities register if DT
     lacks 'num-lanes' property (Krishna Chaitanya Chundru)

   - Add Physical Layer 64 GT/s Capability ID and register offsets for
     8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
     Chundru)

   - Add generic dwc support for configuring lane equalization presets
     (Krishna Chaitanya Chundru)

   - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)

  Renesas R-Car PCIe controller driver:

   - Describe endpoint BAR 4 as being fixed size (Jerome Brunet)

   - Document how to obtain R-Car V4H (r8a779g0) controller firmware
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Reorder rockchip_pci_core_rsts because
     reset_control_bulk_deassert() deasserts in reverse order, to fix a
     link training regression (Jensen Huang)

   - Mark RK3399 as being capable of raising INTx interrupts (Niklas
     Cassel)

  Rockchip DesignWare PCIe controller driver:

   - Check only PCIE_LINKUP, not LTSSM status, to determine whether the
     link is up (Shawn Lin)

   - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
     for Root Complex and Endpoint modes (Shawn Lin)

   - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
     of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
     resets non-sticky registers (Shawn Lin)

   - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
     (Diederik de Haas)

  Synopsys DesignWare PCIe controller driver:

   - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
     more robust; this will not affect the intended link width if all
     lanes are functional (Wenbin Yao)

   - Return bool (not int) for link-up check in dw_pcie_ops.link_up()
     and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
     keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
     tegra194, uniphier, visconti (Hans Zhang)

   - Add debugfs support for exposing DWC device-specific PTM context
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Make j721e buildable as a loadable and removable module (Siddharth
     Vadapalli)

   - Fix j721e host/endpoint dependencies that result in link failures
     in some configs (Arnd Bergmann)

  Device tree bindings:

   - Add qcom DT binding for 'global' interrupt (PCIe controller and
     link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
     sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan
     Sadhasivam)

   - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
     ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)

   - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)

   - Correct indentation and style of examples in brcm,stb-pcie,
     cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
     microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
     (Krzysztof Kozlowski)

   - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
     armada8k from text to schema DT bindings (Rob Herring)

   - Remove obsolete .txt DT bindings for content that has been moved to
     schemas (Rob Herring)

   - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
     and IPQ9574 (Varadarajan Narayanan)

   - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)

   - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
     PolarFire may be configured that way (Conor Dooley)

  Miscellaneous:

   - Drop 'pci' suffix from intel_mid_pci.c filename to match similar
     files (Andy Shevchenko)

   - All platforms with PCI have an MMU, so add PCI Kconfig dependency
     on MMU to simplify build testing and avoid inadvertent build
     regressions (Arnd Bergmann)

   - Update Krzysztof Wilczyński's email address in MAINTAINERS
     (Krzysztof Wilczyński)

   - Update Manivannan Sadhasivam's email address in MAINTAINERS
     (Manivannan Sadhasivam)"

* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
  MAINTAINERS: Update Manivannan Sadhasivam email address
  PCI: j721e: Fix host/endpoint dependencies
  PCI: j721e: Add support to build as a loadable module
  PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
  PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
  PCI: cadence: Add support to build pcie-cadence library as a kernel module
  MAINTAINERS: Update Krzysztof Wilczyński email address
  PCI: Remove unnecessary linesplit in __pci_setup_bridge()
  PCI: WARN (not BUG()) when we fail to assign optional resources
  PCI: Remove unused pci_printk()
  PCI: qcom: Replace PERST# sleep time with proper macro
  PCI: dw-rockchip: Replace PERST# sleep time with proper macro
  PCI: host-common: Convert to library for host controller drivers
  PCI/ERR: Remove misleading TODO regarding kernel panic
  PCI: cadence: Remove duplicate message code definitions
  PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
  PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
  PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
  PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
  PCI: cadence-ep: Correct PBA offset in .set_msix() callback
  ...
2025-06-04 11:26:17 -07:00
Bjorn Helgaas
3de914864c Merge branch 'pci/misc'
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar files
  (Andy Shevchenko)

- All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU
  to simplify build testing and avoid inadvertent build regressions (Arnd
  Bergmann)

- Update driver path in PCI NVMe function documentation (Rick Wertenbroek)

- Remove unused pci_printk() (Ilpo Järvinen)

- Warn (not BUG()) about failure to assign optional resources (Ilpo
  Järvinen)

- Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof
  Wilczyński)

- Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan
  Sadhasivam)

* pci/misc:
  MAINTAINERS: Update Manivannan Sadhasivam email address
  MAINTAINERS: Update Krzysztof Wilczyński email address
  PCI: Remove unnecessary linesplit in __pci_setup_bridge()
  PCI: WARN (not BUG()) when we fail to assign optional resources
  PCI: Remove unused pci_printk()
  Documentation: Fix path for NVMe PCI endpoint target driver
  PCI: Add CONFIG_MMU dependency
  x86/PCI: Drop 'pci' suffix from intel_mid_pci.c
2025-06-04 10:50:45 -05:00
Benjamin Berg
942349413a um: fix SECCOMP 32bit xstate register restore
There was a typo that caused the extended FP state to be copied into the
wrong location on 32 bit. On 32 bit we only store the xstate internally
as that already contains everything. However, for compatibility, the
mcontext on 32 bit first contains the legacy FP state and then the
xstate.

The code copied the xstate on top of the legacy FP state instead of
using the correct offset. This offset was already calculated in the
xstate_* variables, so simply switch to those to fix the problem.

With this SECCOMP mode works on 32 bit, so lift the restriction.

Fixes: b1e1bd2e69 ("um: Add helper functions to get/set state for SECCOMP")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250604081705.934112-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-04 11:40:36 +02:00