Commit Graph

1324266 Commits

Author SHA1 Message Date
Nathan Chancellor
aef25be35d hexagon: Disable constant extender optimization for LLVM prior to 19.1.0
The Hexagon-specific constant extender optimization in LLVM may crash on
Linux kernel code [1], such as fs/bcache/btree_io.c after
commit 32ed4a620c ("bcachefs: Btree path tracepoints") in 6.12:

  clang: llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp:745: bool (anonymous namespace)::HexagonConstExtenders::ExtRoot::operator<(const HCE::ExtRoot &) const: Assertion `ThisB->getParent() == OtherB->getParent()' failed.
  Stack dump:
  0.      Program arguments: clang --target=hexagon-linux-musl ... fs/bcachefs/btree_io.c
  1.      <eof> parser at end of file
  2.      Code generation
  3.      Running pass 'Function Pass Manager' on module 'fs/bcachefs/btree_io.c'.
  4.      Running pass 'Hexagon constant-extender optimization' on function '@__btree_node_lock_nopath'

Without assertions enabled, there is just a hang during compilation.

This has been resolved in LLVM main (20.0.0) [2] and backported to LLVM
19.1.0 but the kernel supports LLVM 13.0.1 and newer, so disable the
constant expander optimization using the '-mllvm' option when using a
toolchain that is not fixed.

Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/issues/99714 [1]
Link: 68df06a0b2 [2]
Link: 2ab8d93061 [3]
Reviewed-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-17 14:07:53 -08:00
Linus Torvalds
5529876063 Merge tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:

 - Always try to initialize the idle functions when graph tracer starts

   A bug was found that when a CPU is offline when graph tracing starts
   and then comes online, that CPU is not traced. The fix to that was to
   move the initialization of the idle shadow stack over to the hot plug
   online logic, which also handle onlined CPUs. The issue was that it
   removed the initialization of the shadow stack when graph tracing
   starts, but the callbacks to the hot plug logic do nothing if graph
   tracing isn't currently running. Although that fix fixed the onlining
   of a CPU during tracing, it broke the CPUs that were already online.

 - Have microblaze not try to get the "true parent" in function tracing

   If function tracing and graph tracing are both enabled at the same
   time the parent of the functions traced by the function tracer may
   sometimes be the graph tracing trampoline. The graph tracing hijacks
   the return pointer of the function to trace it, but that can
   interfere with the function tracing parent output.

   This was fixed by using the ftrace_graph_ret_addr() function passing
   in the kernel stack pointer using the ftrace_regs_get_stack_pointer()
   function. But Al Viro reported that Microblaze does not implement the
   kernel_stack_pointer(regs) helper function that
   ftrace_regs_get_stack_pointer() uses and fails to compile when
   function graph tracing is enabled.

   It was first thought that this was a microblaze issue, but the real
   cause is that this only works when an architecture implements
   HAVE_DYNAMIC_FTRACE_WITH_ARGS, as a requirement for that config is to
   have ftrace always pass a valid ftrace_regs to the callbacks. That
   also means that the architecture supports
   ftrace_regs_get_stack_pointer()

   Microblaze does not set HAVE_DYNAMIC_FTRACE_WITH_ARGS nor does it
   implement ftrace_regs_get_stack_pointer() which caused it to fail to
   build. Only implement the "true parent" logic if an architecture has
   that config set"

* tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not set
  fgraph: Still initialize idle shadow stacks when starting
2024-12-17 09:14:31 -08:00
Linus Torvalds
a241d7f0d3 Merge tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:

 - Fix DirectMap accounting in /proc/meminfo file

 - Fix strscpy() return code handling that led to "unsigned 'len' is
   never less than zero" warning

 - Fix the calculation determining whether to use three- or four-level
   paging: account KMSAN modules metadata

* tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Consider KMSAN modules metadata for paging levels
  s390/ipl: Fix never less than zero warning
  s390/mm: Fix DirectMap accounting
2024-12-17 09:09:32 -08:00
Linus Torvalds
ed90ed56e4 Merge tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
 "The first one fixes a syzbot UAF report caused by a commit introduced
  in this cycle, but it also addresses a longstanding memory leak. The
  second one resolves a PSI memstall mis-accounting issue.

  The remaining patches switch file-backed mounts to use buffered I/Os
  by default instead of direct I/Os, since the page cache of underlay
  files is typically valid and maybe even dirty. This change also aligns
  with the default policy of loopback devices. A mount option has been
  added to try to use direct I/Os explicitly.

  Summary:

   - Fix (pcluster) memory leak and (sbi) UAF after umounting

   - Fix a case of PSI memstall mis-accounting

   - Use buffered I/Os by default for file-backed mounts"

* tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use buffered I/O for file-backed mounts by default
  erofs: reference `struct erofs_device_info` for erofs_map_dev
  erofs: use `struct erofs_device_info` for the primary device
  erofs: add erofs_sb_free() helper
  MAINTAINERS: erofs: update Yue Hu's email address
  erofs: fix PSI memstall accounting
  erofs: fix rare pcluster memory leak after unmounting
2024-12-17 09:04:42 -08:00
Linus Torvalds
1f13c38a85 Merge tag 'hardening-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fix from Kees Cook:
 "Silence a GCC value-range warning that is being ironically triggered
  by bounds checking"

* tag 'hardening-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fortify: Hide run-time copy size from value range tracking
2024-12-17 08:45:40 -08:00
Linus Torvalds
59dbb9d81a Merge tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
 "Fix xen netfront crash (XSA-465) and avoid using the hypercall page
  that doesn't do speculation mitigations (XSA-466)"

* tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove hypercall page
  x86/xen: use new hypercall functions instead of hypercall page
  x86/xen: add central hypercall functions
  x86/xen: don't do PV iret hypercall through hypercall page
  x86/static-call: provide a way to do very early static-call updates
  objtool/x86: allow syscall instruction
  x86: make get_cpu_vendor() accessible from Xen code
  xen/netfront: fix crash when removing device
2024-12-17 08:29:58 -08:00
Juergen Gross
7fa0da5373 x86/xen: remove hypercall page
The hypercall page is no longer needed. It can be removed, as from the
Xen perspective it is optional.

But, from Linux's perspective, it removes naked RET instructions that
escape the speculative protections that Call Depth Tracking and/or
Untrain Ret are trying to achieve.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2024-12-17 08:23:42 +01:00
Juergen Gross
b1c2cb86f4 x86/xen: use new hypercall functions instead of hypercall page
Call the Xen hypervisor via the new xen_hypercall_func static-call
instead of the hypercall page.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2024-12-17 08:23:41 +01:00
Juergen Gross
b4845bb638 x86/xen: add central hypercall functions
Add generic hypercall functions usable for all normal (i.e. not iret)
hypercalls. Depending on the guest type and the processor vendor
different functions need to be used due to the to be used instruction
for entering the hypervisor:

- PV guests need to use syscall
- HVM/PVH guests on Intel need to use vmcall
- HVM/PVH guests on AMD and Hygon need to use vmmcall

As PVH guests need to issue hypercalls very early during boot, there
is a 4th hypercall function needed for HVM/PVH which can be used on
Intel and AMD processors. It will check the vendor type and then set
the Intel or AMD specific function to use via static_call().

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
2024-12-17 08:23:29 +01:00
Kees Cook
239d87327d fortify: Hide run-time copy size from value range tracking
GCC performs value range tracking for variables as a way to provide better
diagnostics. One place this is regularly seen is with warnings associated
with bounds-checking, e.g. -Wstringop-overflow, -Wstringop-overread,
-Warray-bounds, etc. In order to keep the signal-to-noise ratio high,
warnings aren't emitted when a value range spans the entire value range
representable by a given variable. For example:

	unsigned int len;
	char dst[8];
	...
	memcpy(dst, src, len);

If len's value is unknown, it has the full "unsigned int" range of [0,
UINT_MAX], and GCC's compile-time bounds checks against memcpy() will
be ignored. However, when a code path has been able to narrow the range:

	if (len > 16)
		return;
	memcpy(dst, src, len);

Then the range will be updated for the execution path. Above, len is
now [0, 16] when reading memcpy(), so depending on other optimizations,
we might see a -Wstringop-overflow warning like:

	error: '__builtin_memcpy' writing between 9 and 16 bytes into region of size 8 [-Werror=stringop-overflow]

When building with CONFIG_FORTIFY_SOURCE, the fortified run-time bounds
checking can appear to narrow value ranges of lengths for memcpy(),
depending on how the compiler constructs the execution paths during
optimization passes, due to the checks against the field sizes. For
example:

	if (p_size_field != SIZE_MAX &&
	    p_size != p_size_field && p_size_field < size)

As intentionally designed, these checks only affect the kernel warnings
emitted at run-time and do not block the potentially overflowing memcpy(),
so GCC thinks it needs to produce a warning about the resulting value
range that might be reaching the memcpy().

We have seen this manifest a few times now, with the most recent being
with cpumasks:

In function ‘bitmap_copy’,
    inlined from ‘cpumask_copy’ at ./include/linux/cpumask.h:839:2,
    inlined from ‘__padata_set_cpumasks’ at kernel/padata.c:730:2:
./include/linux/fortify-string.h:114:33: error: ‘__builtin_memcpy’ reading between 257 and 536870904 bytes from a region of size 256 [-Werror=stringop-overread]
  114 | #define __underlying_memcpy     __builtin_memcpy
      |                                 ^
./include/linux/fortify-string.h:633:9: note: in expansion of macro ‘__underlying_memcpy’
  633 |         __underlying_##op(p, q, __fortify_size);                        \
      |         ^~~~~~~~~~~~~
./include/linux/fortify-string.h:678:26: note: in expansion of macro ‘__fortify_memcpy_chk’
  678 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
      |                          ^~~~~~~~~~~~~~~~~~~~
./include/linux/bitmap.h:259:17: note: in expansion of macro ‘memcpy’
  259 |                 memcpy(dst, src, len);
      |                 ^~~~~~
kernel/padata.c: In function ‘__padata_set_cpumasks’:
kernel/padata.c:713:48: note: source object ‘pcpumask’ of size [0, 256]
  713 |                                  cpumask_var_t pcpumask,
      |                                  ~~~~~~~~~~~~~~^~~~~~~~

This warning is _not_ emitted when CONFIG_FORTIFY_SOURCE is disabled,
and with the recent -fdiagnostics-details we can confirm the origin of
the warning is due to FORTIFY's bounds checking:

../include/linux/bitmap.h:259:17: note: in expansion of macro 'memcpy'
  259 |                 memcpy(dst, src, len);
      |                 ^~~~~~
  '__padata_set_cpumasks': events 1-2
../include/linux/fortify-string.h:613:36:
  612 |         if (p_size_field != SIZE_MAX &&
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
  613 |             p_size != p_size_field && p_size_field < size)
      |             ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
      |                                    |
      |                                    (1) when the condition is evaluated to false
      |                                    (2) when the condition is evaluated to true
  '__padata_set_cpumasks': event 3
  114 | #define __underlying_memcpy     __builtin_memcpy
      |                                 ^
      |                                 |
      |                                 (3) out of array bounds here

Note that the cpumask warning started appearing since bitmap functions
were recently marked __always_inline in commit ed8cd2b3bd ("bitmap:
Switch from inline to __always_inline"), which allowed GCC to gain
visibility into the variables as they passed through the FORTIFY
implementation.

In order to silence these false positives but keep otherwise deterministic
compile-time warnings intact, hide the length variable from GCC with
OPTIMIZE_HIDE_VAR() before calling the builtin memcpy.

Additionally add a comment about why all the macro args have copies with
const storage.

Reported-by: "Thomas Weißschuh" <linux@weissschuh.net>
Closes: https://lore.kernel.org/all/db7190c8-d17f-4a0d-bc2f-5903c79f36c2@t-8ch.de/
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Closes: https://lore.kernel.org/all/20241112124127.1666300-1-nilay@linux.ibm.com/
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-16 16:23:07 -08:00
Steven Rostedt
166438a432 ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not set
When function tracing and function graph tracing are both enabled (in
different instances) the "parent" of some of the function tracing events
is "return_to_handler" which is the trampoline used by function graph
tracing. To fix this, ftrace_get_true_parent_ip() was introduced that
returns the "true" parent ip instead of the trampoline.

To do this, the ftrace_regs_get_stack_pointer() is used, which uses
kernel_stack_pointer(). The problem is that microblaze does not implement
kerenl_stack_pointer() so when function graph tracing is enabled, the
build fails. But microblaze also does not enabled HAVE_DYNAMIC_FTRACE_WITH_ARGS.
That option has to be enabled by the architecture to reliably get the
values from the fregs parameter passed in. When that config is not set,
the architecture can also pass in NULL, which is not tested for in that
function and could cause the kernel to crash.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jeff Xie <jeff.xie@linux.dev>
Link: https://lore.kernel.org/20241216164633.6df18e87@gandalf.local.home
Fixes: 60b1f578b5 ("ftrace: Get the true parent ip for function tracer")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-16 17:22:26 -05:00
Steven Rostedt
cc252bb592 fgraph: Still initialize idle shadow stacks when starting
A bug was discovered where the idle shadow stacks were not initialized
for offline CPUs when starting function graph tracer, and when they came
online they were not traced due to the missing shadow stack. To fix
this, the idle task shadow stack initialization was moved to using the
CPU hotplug callbacks. But it removed the initialization when the
function graph was enabled. The problem here is that the hotplug
callbacks are called when the CPUs come online, but the idle shadow
stack initialization only happens if function graph is currently
active. This caused the online CPUs to not get their shadow stack
initialized.

The idle shadow stack initialization still needs to be done when the
function graph is registered, as they will not be allocated if function
graph is not registered.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241211135335.094ba282@batman.local.home
Fixes: 2c02f7375e ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks")
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Closes: https://lore.kernel.org/all/CACRpkdaTBrHwRbbrphVy-=SeDz6MSsXhTKypOtLrTQ+DgGAOcQ@mail.gmail.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-16 16:03:33 -05:00
Linus Torvalds
f44d154d6e Merge tag 'soc-fixes-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
 "Three small fixes for the soc tree:

   - devicetee fix for the Arm Juno reference machine, to allow more
     interesting PCI configurations

   - build fix for SCMI firmware on the NXP i.MX platform

   - fix for a race condition in Arm FF-A firmware"

* tag 'soc-fixes-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: fvp: Update PCIe bus-range property
  firmware: arm_ffa: Fix the race around setting ffa_dev->properties
  firmware: arm_scmi: Fix i.MX build dependency
2024-12-16 10:10:53 -08:00
Linus Torvalds
dc690bc256 Merge tag 'platform-drivers-x86-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:

 - alienware-wmi:
    - Add support for Alienware m16 R1 AMD
    - Do not setup legacy LED control with X and G Series

 - intel/ifs: Clearwater Forest support

 - intel/vsec: Panther Lake support

 - p2sb: Do not hide the device if BIOS left it unhidden

 - touchscreen_dmi: Add SARY Tab 3 tablet information

* tag 'platform-drivers-x86-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel/vsec: Add support for Panther Lake
  platform/x86/intel/ifs: Add Clearwater Forest to CPU support list
  platform/x86: touchscreen_dmi: Add info for SARY Tab 3 tablet
  p2sb: Do not scan and remove the P2SB device when it is unhidden
  p2sb: Move P2SB hide and unhide code to p2sb_scan_and_cache()
  p2sb: Introduce the global flag p2sb_hidden_by_bios
  p2sb: Factor out p2sb_read_from_cache()
  alienware-wmi: Adds support to Alienware m16 R1 AMD
  alienware-wmi: Fix X Series and G Series quirks
2024-12-16 10:01:57 -08:00
Gao Xiang
6422cde1b0 erofs: use buffered I/O for file-backed mounts by default
For many use cases (e.g. container images are just fetched from remote),
performance will be impacted if underlay page cache is up-to-date but
direct i/o flushes dirty pages first.

Instead, let's use buffered I/O by default to keep in sync with loop
devices and add a (re)mount option to explicitly give a try to use
direct I/O if supported by the underlying files.

The container startup time is improved as below:
[workload] docker.io/library/workpress:latest
                                     unpack        1st run  non-1st runs
EROFS snapshotter buffered I/O file  4.586404265s  0.308s   0.198s
EROFS snapshotter direct I/O file    4.581742849s  2.238s   0.222s
EROFS snapshotter loop               4.596023152s  0.346s   0.201s
Overlayfs snapshotter                5.382851037s  0.206s   0.214s

Fixes: fb17675026 ("erofs: add file-backed mount support")
Cc: Derek McGowan <derek@mcg.dev>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com
2024-12-16 21:02:07 +08:00
Gao Xiang
f8d920a402 erofs: reference struct erofs_device_info for erofs_map_dev
Record `m_sb` and `m_dif` to replace `m_fscache`, `m_daxdev`, `m_fp`
and `m_dax_part_off` in order to simplify the codebase.

Note that `m_bdev` is still left since it can be assigned from
`sb->s_bdev` directly.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241212235401.2857246-1-hsiangkao@linux.alibaba.com
2024-12-16 21:02:06 +08:00
Gao Xiang
7b00af2c54 erofs: use struct erofs_device_info for the primary device
Instead of just listing each one directly in `struct erofs_sb_info`
except that we still use `sb->s_bdev` for the primary block device.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com
2024-12-16 21:01:59 +08:00
Linus Torvalds
78d4f34e21 Linux 6.13-rc3 v6.13-rc3 2024-12-15 15:58:23 -08:00
Linus Torvalds
42a19aa170 Merge tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:

 - Sundry build and misc fixes

* tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: build: Try to guess GCC variant of cross compiler
  ARC: bpf: Correct conditional check in 'check_jmp_32'
  ARC: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
  ARC: build: Use __force to suppress per-CPU cmpxchg warnings
  ARC: fix reference of dependency for PAE40 config
  ARC: build: disallow invalid PAE40 + 4K page config
  arc: rename aux.h to arc_aux.h
2024-12-15 15:38:12 -08:00
Linus Torvalds
7031a38ab7 Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Limit EFI zboot to GZIP and ZSTD before it comes in wider use

 - Fix inconsistent error when looking up a non-existent file in
   efivarfs with a name that does not adhere to the NAME-GUID format

 - Drop some unused code

* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/esrt: remove esre_attribute::store()
  efivarfs: Fix error on non-existent file
  efi/zboot: Limit compression options to GZIP and ZSTD
2024-12-15 15:33:41 -08:00
Linus Torvalds
151167d85a Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
  missing a sentinel, and RIIC was missing rounding up"

* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: riic: Always round-up when calculating bus period
  i2c: nomadik: Add missing sentinel to match table
  i2c: pnx: Fix timeout in wait functions
2024-12-15 15:29:07 -08:00
Vasily Gorbik
282da38b46 s390/mm: Consider KMSAN modules metadata for paging levels
The calculation determining whether to use three- or four-level paging
didn't account for KMSAN modules metadata. Include this metadata in the
virtual memory size calculation to ensure correct paging mode selection
and avoiding potentially unnecessary physical memory size limitations.

Fixes: 65ca73f9fb ("s390/mm: define KMSAN metadata for vmalloc and modules")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15 23:35:09 +01:00
Linus Torvalds
dccbe2047a Merge tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:

 - Make sure amd64_edac loads successfully on certain Zen4 memory
   configurations

* tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Simplify ECC check on unified memory controllers
2024-12-15 10:01:10 -08:00
Linus Torvalds
f7c7a1ba22 Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Disable the secure programming interface of the GIC500 chip in the
   RK3399 SoC to fix interrupt priority assignment and even make a dead
   machine boot again when the gic-v3 driver enables pseudo NMIs

 - Correct the declaration of a percpu variable to fix several sparse
   warnings

* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Work around insecure GIC integrations
  irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
2024-12-15 09:58:27 -08:00
Linus Torvalds
acd855a949 Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Prevent incorrect dequeueing of the deadline dlserver helper task and
   fix its time accounting

 - Properly track the CFS runqueue runnable stats

 - Check the total number of all queued tasks in a sched fair's runqueue
   hierarchy before deciding to stop the tick

 - Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
   preventing those from being delayed

* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/dlserver: Fix dlserver time accounting
  sched/dlserver: Fix dlserver double enqueue
  sched/eevdf: More PELT vs DELAYED_DEQUEUE
  sched/fair: Fix sched_can_stop_tick() for fair tasks
  sched/fair: Fix NEXT_BUDDY
2024-12-15 09:38:03 -08:00
Linus Torvalds
81576a9a27 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
     SPE/TRBE initialization

   - Align nested page table walker with the intended memory attribute
     combining rules of the architecture

   - Prevent userspace from constraining the advertised ASID width,
     avoiding horrors of guest TLBIs not matching the intended context
     in hardware

   - Don't leak references on LPIs when insertion into the translation
     cache fails

  RISC-V:

   - Replace csr_write() with csr_set() for HVIEN PMU overflow bit

  x86:

   - Cache CPUID.0xD XSTATE offsets+sizes during module init

     On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
     are a lot of leaves under 0xD. Getting rid of the CPUIDs during
     nested VM-Enter and VM-Exit is planned for the next release, for
     now just cache them: even on Skylake that is 40% faster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
  RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
  KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
  KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
  KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
  arm64: Fix usage of new shifted MDCR_EL2 values
2024-12-15 09:26:13 -08:00
Linus Torvalds
2d8308bf5b Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
 "Single one-line fix in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
2024-12-14 15:53:02 -08:00
Linus Torvalds
35f301dd45 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:

 - Fix a bug in the BPF verifier to track changes to packet data
   property for global functions (Eduard Zingerman)

 - Fix a theoretical BPF prog_array use-after-free in RCU handling of
   __uprobe_perf_func (Jann Horn)

 - Fix BPF tracing to have an explicit list of tracepoints and their
   arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
   Kartikeya Dwivedi)

 - Fix a logic bug in the bpf_remove_insns code where a potential error
   would have been wrongly propagated (Anton Protopopov)

 - Avoid deadlock scenarios caused by nested kprobe and fentry BPF
   programs (Priya Bala Govindasamy)

 - Fix a bug in BPF verifier which was missing a size check for
   BTF-based context access (Kumar Kartikeya Dwivedi)

 - Fix a crash found by syzbot through an invalid BPF prog_array access
   in perf_event_detach_bpf_prog (Jiri Olsa)

 - Fix several BPF sockmap bugs including a race causing a refcount
   imbalance upon element replace (Michal Luczaj)

 - Fix a use-after-free from mismatching BPF program/attachment RCU
   flavors (Jann Horn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
  selftests/bpf: Add tests for raw_tp NULL args
  bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
  bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
  selftests/bpf: Add test for narrow ctx load for pointer args
  bpf: Check size for BTF-based ctx access of pointer members
  selftests/bpf: extend changes_pkt_data with cases w/o subprograms
  bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
  bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
  bpf: fix potential error return
  selftests/bpf: validate that tail call invalidates packet pointers
  bpf: consider that tail calls invalidate packet pointers
  selftests/bpf: freplace tests for tracking of changes_packet_data
  bpf: check changes_pkt_data property for extension programs
  selftests/bpf: test for changing packet data from global functions
  bpf: track changes_pkt_data property for global functions
  bpf: refactor bpf_helper_changes_pkt_data to use helper number
  bpf: add find_containing_subprog() utility function
  bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
  bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
  ...
2024-12-14 12:58:14 -08:00
Priya Bala Govindasamy
c83508da56 bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
BPF program types like kprobe and fentry can cause deadlocks in certain
situations. If a function takes a lock and one of these bpf programs is
hooked to some point in the function's critical section, and if the
bpf program tries to call the same function and take the same lock it will
lead to deadlock. These situations have been reported in the following
bug reports.

In percpu_freelist -
Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/
In bpf_lru_list -
Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/

Similar bugs have been reported by syzbot.
In queue_stack_maps -
Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/
In lpm_trie -
Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/
In ringbuf -
Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/

Prevent kprobe and fentry bpf programs from attaching to these critical
sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.

The bugs reported by syzbot are due to tracepoint bpf programs being
called in the critical sections. This patch does not aim to fix deadlocks
caused by tracepoint programs. However, it does prevent deadlocks from
occurring in similar situations due to kprobe and fentry programs.

Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-14 09:49:27 -08:00
Linus Torvalds
a0e3919a2d Merge tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes for some reported issues.
  Included in here are:

   - typec driver bugfixes

   - u_serial gadget driver bugfix for much reported and discussed issue

   - dwc2 bugfixes

   - midi gadget driver bugfix

   - ehci-hcd driver bugfix

   - other small bugfixes

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix connector status writing past buffer size
  usb: typec: ucsi: Fix completion notifications
  usb: dwc2: Fix HCD port connection race
  usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature
  usb: dwc2: Fix HCD resume
  usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
  usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support
  usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode
  usb: core: hcd: only check primary hcd skip_phy_initialization
  usb: gadget: midi2: Fix interpretation of is_midi1 bits
  usb: dwc3: imx8mp: fix software node kernel dump
  usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe()
  usb: typec: anx7411: fix fwnode_handle reference leak
  usb: host: max3421-hcd: Correctly abort a USB request.
  dt-bindings: phy: imx8mq-usb: correct reference to usb-switch.yaml
  usb: ehci-hcd: fix call balance of clocks handling routines
2024-12-14 09:35:22 -08:00
Linus Torvalds
636110be62 Merge tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
 "Here are two small serial driver fixes for 6.13-rc3.  They are:

   - ioport build fallout fix for the 8250 port driver that should
     resolve Guenter's runtime problems

   - sh-sci driver bugfix for a reported problem

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: Work around warning backtrace in serial8250_set_defaults
  serial: sh-sci: Check if TX data was written to device in .tx_empty()
2024-12-14 09:31:19 -08:00
Linus Torvalds
3de4f6d919 Merge tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are some small staging gpib driver build and bugfixes for issues
  that have been much-reported (should finally fix Guenter's build
  issues). There are more of these coming in later -rc releases, but for
  now this should fix the majority of the reported problems.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: gpib: Fix i386 build issue
  staging: gpib: Fix faulty workaround for assignment in if
  staging: gpib: Workaround for ppc build failure
  staging: gpib: Make GPIB_NI_PCI_ISA depend on HAS_IOPORT
2024-12-14 09:27:07 -08:00
Linus Torvalds
ec2092915d Merge tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "Fix a regression in rsassa-pkcs1 as well as a buffer overrun in
  hisilicon/debugfs"

* tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
  crypto: rsassa-pkcs1 - Copy source data for SG list
2024-12-14 09:15:49 -08:00
Linus Torvalds
7824850768 Merge tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Set bindgen's Rust target version to prevent issues when
     pairing older rustc releases with newer bindgen releases,
     such as bindgen >= 0.71.0 and rustc < 1.82 due to
     unsafe_extern_blocks.

  drm/panic:

   - Remove spurious empty line detected by a new Clippy warning"

* tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux:
  rust: kbuild: set `bindgen`'s Rust target version
  drm/panic: remove spurious empty line to clean warning
2024-12-14 08:51:43 -08:00
Linus Torvalds
115c0cc251 Merge tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:

 - Per-domain device-list locking fixes for the AMD IOMMU driver

 - Fix incorrect use of smp_processor_id() in the NVidia-specific part
   of the ARM-SMMU-v3 driver

 - Intel IOMMU driver fixes:
     - Remove cache tags before disabling ATS
     - Avoid draining PRQ in sva mm release path
     - Fix qi_batch NULL pointer with nested parent domain

* tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/vt-d: Avoid draining PRQ in sva mm release path
  iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  iommu/vt-d: Remove cache tags before disabling ATS
  iommu/amd: Add lockdep asserts for domain->dev_list
  iommu/amd: Put list_add/del(dev_data) back under the domain->lock
  iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
2024-12-14 08:41:40 -08:00
Linus Torvalds
5d97859e17 Merge tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:

 - Fix an OF node reference leak in the sata_highbank driver

* tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
2024-12-14 08:40:05 -08:00
Wolfram Sang
5b6b08af1f Merge tag 'i2c-host-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.13-rc3

- Replaced jiffies with msec for timeout calculations.
- Added a sentinel to the 'of_device_id' array in Nomadik.
- Rounded up bus period calculation in RIIC.
2024-12-14 10:01:46 +01:00
Linus Torvalds
a446e965a1 Merge tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - fix rmmod leak

 - two minor cleanups

 - fix for unlink/rename with pending i/o

* tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: destroy cfid_put_wq on module exit
  cifs: Use str_yes_no() helper in cifs_ses_add_channel()
  cifs: Fix rmdir failure due to ongoing I/O on deleted file
  smb3: fix compiler warning in reparse code
2024-12-13 17:36:02 -08:00
Linus Torvalds
f86135613a Merge tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A few fairly small fixes for v6.13, the most substatial one being
  disabling STIG mode for Cadence QSPI controllers on Altera SoCFPGA
  platforms since it doesn't work"

* tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-cadence-qspi: Disable STIG mode for Altera SoCFPGA.
  spi: rockchip: Fix PM runtime count on no-op cs
  spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
2024-12-13 17:29:19 -08:00
Linus Torvalds
4e1b4861a2 Merge tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "A couple of additional changes, one ensuring we give AXP717 enough
  time to stabilise after changing voltages which fixes serious
  stability issues on some platforms and another documenting the DT
  support required for the Qualcomm WCN6750"

* tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: axp20x: AXP717: set ramp_delay
  regulator: dt-bindings: qcom,qca6390-pmu: document wcn6750-pmu
2024-12-13 17:18:28 -08:00
Linus Torvalds
e72da82d5a Merge tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the weekly fixes pull for drm. Just has i915, xe and amdgpu
  changes in it. Nothing too major in here:

  i915:
   - Don't use indexed register writes needlessly [dsb]
   - Stop using non-posted DSB writes for legacy LUT [color]
   - Fix NULL pointer dereference in capture_engine
   - Fix memory leak by correcting cache object name in error handler

  xe:
   - Fix a KUNIT test error message (Mirsad Todorovac)
   - Fix an invalidation fence PM ref leak (Daniele)
   - Fix a register pool UAF (Lucas)

  amdgpu:
   - ISP hw init fix
   - SR-IOV fixes
   - Fix contiguous VRAM mapping for UVD on older GPUs
   - Fix some regressions due to drm scheduler changes
   - Workload profile fixes
   - Cleaner shader fix

  amdkfd:
   - Fix DMA map direction for migration
   - Fix a potential null pointer dereference
   - Cacheline size fixes
   - Runtime PM fix"

* tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/reg_sr: Remove register pool
  drm/xe: Call invalidation_fence_fini for PT inval fences in error state
  drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
  drm/amdkfd: pause autosuspend when creating pdd
  drm/amdgpu: fix when the cleaner shader is emitted
  drm/amdgpu: Fix ISP HW init issue
  drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
  drm/amdkfd: hard-code cacheline size for gfx11
  drm/amdkfd: Dereference null return value
  drm/i915: Fix memory leak by correcting cache object name in error handler
  drm/i915: Fix NULL pointer dereference in capture_engine
  drm/i915/color: Stop using non-posted DSB writes for legacy LUT
  drm/i915/dsb: Don't use indexed register writes needlessly
  drm/amdkfd: Correct the migration DMA map direction
  drm/amd/pm: Set SMU v13.0.7 default workload type
  drm/amd/pm: Initialize power profile mode
  amdgpu/uvd: get ring reference from rq scheduler
  drm/amdgpu: fix UVD contiguous CS mapping problem
  drm/amdgpu: use sjt mec fw on gfx943 for sriov
  Revert "drm/amdgpu: Fix ISP hw init issue"
2024-12-13 16:58:39 -08:00
Linus Torvalds
974acf9974 Merge tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management documentation fix from Rafael Wysocki:
 "Fix a runtime PM documentation mistake that may mislead someone into
  making a coding mistake (Paul Barker)"

* tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Documentation: PM: Clarify pm_runtime_resume_and_get() return value
2024-12-13 16:55:43 -08:00
Linus Torvalds
c810e8df96 Merge tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix two coding mistakes, one in the ACPI resources handling code
  and one in ACPICA:

   - Relocate the addr->info.mem.caching check in acpi_decode_space() to
     only execute it if the resource is of the correct type (Ilpo
     Järvinen)

   - Don't release a context_mutex that was never acquired in
     acpi_remove_address_space_handler() (Daniil Tatianin)"

* tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
  ACPI: resource: Fix memory resource type union access
2024-12-13 16:51:56 -08:00
Alexei Starovoitov
a8e1a3ddf7 Merge branch 'explicit-raw_tp-null-arguments'
Kumar Kartikeya Dwivedi says:

====================
Explicit raw_tp NULL arguments

This set reverts the raw_tp masking changes introduced in commit
cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") and
replaces it wwith an explicit list of tracepoints and their arguments
which need to be annotated as PTR_MAYBE_NULL. More context on the
fallout caused by the masking fix and subsequent discussions can be
found in [0].

To remedy this, we implement a solution of explicitly defined tracepoint
and define which args need to be marked NULL or scalar (for IS_ERR
case). The commit logs describes the details of this approach in detail.

We will follow up this solution an approach Eduard is working on to
perform automated analysis of NULL-ness of tracepoint arguments. The
current PoC is available here:

- LLVM branch with the analysis:
  https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
- Python script for merging of analysis results:
  https://gist.github.com/eddyz87/e47c164466a60e8d49e6911cff146f47

The idea is to infer a tri-state verdict for each tracepoint parameter:
definitely not null, can be null, unknown (in which case no assumptions
should be made).

Using this information, the verifier in most cases will be able to
precisely determine the state of the tracepoint parameter without any
human effort. At that point, the table maintained manually in this set
can be dropped and replace with this automated analysis tool's result.
This will be kept up to date with each kernel release.

  [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Changelog:
----------
v2 -> v3:
v2: https://lore.kernel.org/bpf/20241213175127.2084759-1-memxor@gmail.com

 * Address Eduard's nits, add Reviewed-by

v1 -> v2:
v1: https://lore.kernel.org/bpf/20241211020156.18966-1-memxor@gmail.com

 * Address comments from Jiri
   * Mark module tracepoints args NULL by default
   * Add more sunrpc tracepoints
   * Unify scalar or null handling
 * Address comments from Alexei
   * Use bitmask approach suggested in review
   * Unify scalar or null handling
   * Drop most tests that rely on CONFIG options
   * Drop scripts to generate tests
====================

Link: https://patch.msgid.link/20241213221929.3495062-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:54 -08:00
Kumar Kartikeya Dwivedi
0da1955b5b selftests/bpf: Add tests for raw_tp NULL args
Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Kumar Kartikeya Dwivedi
838a10bd2e bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL.  However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].

Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.

A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps.  This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].

Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.

Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).

In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].

Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.

While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.

  [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
  [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
  [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
  [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params

Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Kumar Kartikeya Dwivedi
c00d738e16 bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
This patch reverts commit
cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.

However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error.  Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.

More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.

Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.

  [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Linus Torvalds
c30c65f3fe Merge tag 'block-6.13-20241213' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Series from Damien fixing issues with the zoned write plugging

 - Fix for a potential UAF in block cgroups

 - Fix deadlock around queue freezing and the sysfs lock

 - Various little cleanups and fixes

* tag 'block-6.13-20241213' of git://git.kernel.dk/linux:
  block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
  block: Fix queue_iostats_passthrough_show()
  blk-mq: Clean up blk_mq_requeue_work()
  mq-deadline: Remove a local variable
  blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
  block: Make bio_iov_bvec_set() accept pointer to const iov_iter
  block: get wp_offset by bdev_offset_from_zone_start
  blk-cgroup: Fix UAF in blkcg_unpin_online()
  MAINTAINERS: update Coly Li's email address
  block: Prevent potential deadlocks in zone write plug error recovery
  dm: Fix dm-zoned-reclaim zone write pointer alignment
  block: Ignore REQ_NOWAIT for zone reset and zone finish operations
  block: Use a zone write plug BIO work for REQ_NOWAIT BIOs
2024-12-13 15:10:59 -08:00
Linus Torvalds
2a816e4f6a Merge tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "A single fix for a regression introduced in the 6.13 merge window"

* tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux:
  io_uring/rsrc: don't put/free empty buffers
2024-12-13 15:08:55 -08:00
Linus Torvalds
1c021e7908 Merge tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:

 - sysbot fix for out of bounds access

* tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
2024-12-13 15:07:22 -08:00