Commit Graph

1428924 Commits

Author SHA1 Message Date
Alexei Starovoitov
596bef1d71 bpf: Support 32-bit scalar spills in stacksafe()
v1->v2: updated comments
v1: https://lore.kernel.org/bpf/20260322225124.14005-1-alexei.starovoitov@gmail.com/

The commit 6efbde200b ("bpf: Handle scalar spill vs all MISC in stacksafe()")
in stacksafe() only recognized full 64-bit scalar spills when
comparing stack states for equivalence during state pruning and
missed 32-bit scalar spill. When 32-bit scalar is spilled the
check_stack_write_fixed_off() -> save_register_state() calls
mark_stack_slot_misc() for slot[0-3], which preserves STACK_INVALID
and STACK_ZERO (on a fresh stack slot[0-3] remain STACK_INVALID),
sets slot[4-7] = STACK_SPILL, and updates spilled_ptr.

The im=4 path is only reached when im=0 fails: The loop at im=0 already
attempts the 64-bit scalar-spill/all-MISC check. If it matches, i advances
by 7, skipping the entire 8-byte slot. So im=4 is only reached when bytes
0-3 are neither a scalar spill nor all-MISC — they must pass individual
byte-by-byte comparison first. Then bytes 4-7 get the scalar-unit
treatment.

is_spilled_scalar_after(stack, 4): slot_type[4] == STACK_SPILL from a
64-bit spill would have been caught at im=0 (unless it's a pointer spill,
in which case spilled_ptr.type != SCALAR_VALUE -> returns false at im=4
too). A partial overwrite of a 64-bit spill invalidates the entire slot in
check_stack_write_fixed_off().

is_stack_misc_after(stack, 4): Only checks bytes 4-7 are MISC/INVALID,
returns &unbound_reg. Comparing two unbound regs via regsafe() is safe.

Changes to cilium programs:
File             Program                            Insns (A)  Insns (B)  Insns     (DIFF)
_______________  _________________________________  _________  _________  ________________
bpf_host.o       cil_host_policy                        49351      45811    -3540 (-7.17%)
bpf_host.o       cil_to_host                             2384       2270     -114 (-4.78%)
bpf_host.o       cil_to_netdev                         112051     100269  -11782 (-10.51%)
bpf_host.o       tail_handle_ipv4_cont_from_host        61175      60910     -265 (-0.43%)
bpf_host.o       tail_handle_ipv4_cont_from_netdev       9381       8873     -508 (-5.42%)
bpf_host.o       tail_handle_ipv4_from_host             12994       7066   -5928 (-45.62%)
bpf_host.o       tail_handle_ipv4_from_netdev           85015      59875  -25140 (-29.57%)
bpf_host.o       tail_handle_ipv6_cont_from_host        24732      23527    -1205 (-4.87%)
bpf_host.o       tail_handle_ipv6_cont_from_netdev       9463       8953     -510 (-5.39%)
bpf_host.o       tail_handle_ipv6_from_host             12477      11787     -690 (-5.53%)
bpf_host.o       tail_handle_ipv6_from_netdev           30814      30017     -797 (-2.59%)
bpf_host.o       tail_handle_nat_fwd_ipv4                8943       8860      -83 (-0.93%)
bpf_host.o       tail_handle_snat_fwd_ipv4              64716      61625    -3091 (-4.78%)
bpf_host.o       tail_handle_snat_fwd_ipv6              48299      30797  -17502 (-36.24%)
bpf_host.o       tail_ipv4_host_policy_ingress          21591      20017    -1574 (-7.29%)
bpf_host.o       tail_ipv6_host_policy_ingress          21177      20693     -484 (-2.29%)
bpf_host.o       tail_nodeport_nat_egress_ipv4          16588      16543      -45 (-0.27%)
bpf_host.o       tail_nodeport_nat_ingress_ipv4         39200      36116    -3084 (-7.87%)
bpf_host.o       tail_nodeport_nat_ingress_ipv6         50102      48003    -2099 (-4.19%)
bpf_lxc.o        tail_handle_ipv4_cont                 113092      96891  -16201 (-14.33%)
bpf_lxc.o        tail_handle_ipv6                        6727       6701      -26 (-0.39%)
bpf_lxc.o        tail_handle_ipv6_cont                  25567      21805   -3762 (-14.71%)
bpf_lxc.o        tail_ipv4_ct_egress                    28843      15970  -12873 (-44.63%)
bpf_lxc.o        tail_ipv4_ct_ingress                   16691      10213   -6478 (-38.81%)
bpf_lxc.o        tail_ipv4_ct_ingress_policy_only       16691      10213   -6478 (-38.81%)
bpf_lxc.o        tail_ipv4_policy                        6776       6622     -154 (-2.27%)
bpf_lxc.o        tail_ipv4_to_endpoint                   7523       7219     -304 (-4.04%)
bpf_lxc.o        tail_ipv6_ct_egress                    10275       9999     -276 (-2.69%)
bpf_lxc.o        tail_ipv6_ct_ingress                    6466       6438      -28 (-0.43%)
bpf_lxc.o        tail_ipv6_ct_ingress_policy_only        6466       6438      -28 (-0.43%)
bpf_lxc.o        tail_ipv6_policy                        6859       5159   -1700 (-24.78%)
bpf_lxc.o        tail_ipv6_to_endpoint                   7039       4427   -2612 (-37.11%)
bpf_lxc.o        tail_nodeport_ipv6_dsr                  1175       1033    -142 (-12.09%)
bpf_lxc.o        tail_nodeport_nat_egress_ipv4          16318      16292      -26 (-0.16%)
bpf_lxc.o        tail_nodeport_nat_ingress_ipv4         18907      18490     -417 (-2.21%)
bpf_lxc.o        tail_nodeport_nat_ingress_ipv6         14624      14556      -68 (-0.46%)
bpf_lxc.o        tail_nodeport_rev_dnat_ipv4             4776       4588     -188 (-3.94%)
bpf_overlay.o    tail_handle_inter_cluster_revsnat      15733      15498     -235 (-1.49%)
bpf_overlay.o    tail_handle_ipv4                      124682     105717  -18965 (-15.21%)
bpf_overlay.o    tail_handle_ipv6                       16201      15801     -400 (-2.47%)
bpf_overlay.o    tail_handle_snat_fwd_ipv4              21280      19323    -1957 (-9.20%)
bpf_overlay.o    tail_handle_snat_fwd_ipv6              20824      20822       -2 (-0.01%)
bpf_overlay.o    tail_nodeport_ipv6_dsr                  1175       1033    -142 (-12.09%)
bpf_overlay.o    tail_nodeport_nat_egress_ipv4          16293      16267      -26 (-0.16%)
bpf_overlay.o    tail_nodeport_nat_ingress_ipv4         20841      20737     -104 (-0.50%)
bpf_overlay.o    tail_nodeport_nat_ingress_ipv6         14678      14629      -49 (-0.33%)
bpf_sock.o       cil_sock4_connect                       1678       1623      -55 (-3.28%)
bpf_sock.o       cil_sock4_sendmsg                       1791       1736      -55 (-3.07%)
bpf_sock.o       cil_sock6_connect                       3641       3600      -41 (-1.13%)
bpf_sock.o       cil_sock6_recvmsg                       2048       1899     -149 (-7.28%)
bpf_sock.o       cil_sock6_sendmsg                       3755       3721      -34 (-0.91%)
bpf_wireguard.o  tail_handle_ipv4                       31180      27484   -3696 (-11.85%)
bpf_wireguard.o  tail_handle_ipv6                       12095      11760     -335 (-2.77%)
bpf_wireguard.o  tail_nodeport_ipv6_dsr                  1232       1094    -138 (-11.20%)
bpf_wireguard.o  tail_nodeport_nat_egress_ipv4          16071      16061      -10 (-0.06%)
bpf_wireguard.o  tail_nodeport_nat_ingress_ipv4         20804      20565     -239 (-1.15%)
bpf_wireguard.o  tail_nodeport_nat_ingress_ipv6         13490      12224    -1266 (-9.38%)
bpf_xdp.o        tail_lb_ipv4                           49695      42673   -7022 (-14.13%)
bpf_xdp.o        tail_lb_ipv6                          122683      87896  -34787 (-28.36%)
bpf_xdp.o        tail_nodeport_ipv6_dsr                  1833       1862      +29 (+1.58%)
bpf_xdp.o        tail_nodeport_nat_egress_ipv4           6999       6990       -9 (-0.13%)
bpf_xdp.o        tail_nodeport_nat_ingress_ipv4         28903      28780     -123 (-0.43%)
bpf_xdp.o        tail_nodeport_nat_ingress_ipv6        200361     197771    -2590 (-1.29%)
bpf_xdp.o        tail_nodeport_rev_dnat_ipv4             4606       4454     -152 (-3.30%)

Changes to sched-ext:
File                       Program           Insns (A)  Insns (B)  Insns    (DIFF)
_________________________  ________________  _________  _________  _______________
scx_arena_selftests.bpf.o  arena_selftest       236305     236251     -54 (-0.02%)
scx_chaos.bpf.o            chaos_dispatch        12282       8013  -4269 (-34.76%)
scx_chaos.bpf.o            chaos_enqueue         11398       7126  -4272 (-37.48%)
scx_chaos.bpf.o            chaos_init             3854       3828     -26 (-0.67%)
scx_flash.bpf.o            flash_init             1015        979     -36 (-3.55%)
scx_flatcg.bpf.o           fcg_dispatch           1143       1100     -43 (-3.76%)
scx_lavd.bpf.o             lavd_enqueue          35487      35472     -15 (-0.04%)
scx_lavd.bpf.o             lavd_init             21127      21107     -20 (-0.09%)
scx_p2dq.bpf.o             p2dq_enqueue          10210       7854  -2356 (-23.08%)
scx_p2dq.bpf.o             p2dq_init              3233       3207     -26 (-0.80%)
scx_qmap.bpf.o             qmap_init             20285      20230     -55 (-0.27%)
scx_rusty.bpf.o            rusty_select_cpu       1165       1148     -17 (-1.46%)
scxtop.bpf.o               on_sched_switch        2369       2355     -14 (-0.59%)

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260323022410.75444-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 11:59:52 -07:00
Kumar Kartikeya Dwivedi
02bcf8ef26 bpf: Update MAINTAINERS file for general BPF entry
Per discussion with Alexei, add Eduard and myself as maintainers under
BPF [GENERAL]. While at it, drop R entries for reviewers who have been
inactive.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260324152230.2916217-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 09:05:11 -07:00
Varun R Mallya
b43d574c00 selftests/bpf: Add test for struct_ops __ref argument in any position
Add a selftest to verify that the verifier correctly identifies refcounted
arguments in struct_ops programs, even when they are not the first
argument. This ensures that the restriction on tail calls for programs
with __ref arguments is properly enforced regardless of which argument
they appear in.

This test verifies the fix for check_struct_ops_btf_id() proposed by
Keisuke Nishimura [0], which corrected a bug where only the first
argument was checked for the refcounted flag.
The test includes:
- An update to bpf_testmod to add 'test_refcounted_multi', an operator with
  three arguments where the third is tagged with "__ref".
- A BPF program 'test_refcounted_multi' that attempts a tail call.
- A test runner that asserts the verifier rejects the program with
  "program with __ref argument cannot tail call".

[0]: https://lore.kernel.org/bpf/20260320130219.63711-1-keisuke.nishimura@inria.fr/

Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Link: https://lore.kernel.org/r/20260321214038.80479-1-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:51:23 -07:00
Keisuke Nishimura
25e3e1f109 bpf: Fix refcount check in check_struct_ops_btf_id()
The current implementation only checks whether the first argument is
refcounted. Fix this by iterating over all arguments.

Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
Fixes: 38f1e66abd ("bpf: Do not allow tail call in strcut_ops program with __ref argument")
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260320130219.63711-1-keisuke.nishimura@inria.fr
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:50:20 -07:00
Weixie Cui
ad2f7ed0ee bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signature
kvmemdup_bpfptr() returns -EFAULT when the user pointer cannot be
copied, and -ENOMEM on allocation failure. The error path always
returned -ENOMEM, misreporting bad addresses as out-of-memory.

Return PTR_ERR(sig) so user space gets the correct errno.

Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/tencent_C9C5B2B28413D6303D505CD02BFEA4708C07@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:48:51 -07:00
Martin KaFai Lau
280de43e88 bpf: Remove ipv6_bpf_stub usage in test_run
bpf_prog_test_run_skb() uses net->ipv6.ip6_null_entry for
BPF_PROG_TYPE_LWT_XMIT test runs.

It currently checks ipv6_bpf_stub before using ip6_null_entry.
ipv6_bpf_stub will be removed by the CONFIG_IPV6=m support removal
series posted at [1], so switch this check to ipv6_mod_enabled()
instead.

This change depends on that series [1]. Without it, CONFIG_IPV6=m is
still possible, and net->ipv6.ip6_null_entry remains NULL until
the IPv6 module is loaded.

[1] https://lore.kernel.org/netdev/20260320185649.5411-1-fmancera@suse.de/

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://lore.kernel.org/r/20260323225250.1623542-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:47:33 -07:00
Amery Hung
03b7b389fe selftests/bpf: Fix compiler warnings in task_local_data.h
Fix compiler warnings about unused parameter, narrowing non-constant
into a smaller type and comparison between integers of different size.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260323231133.859941-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:46:54 -07:00
Hao Sun
833ef4a954 bpf: Simplify tnum_step()
Simplify tnum_step() from a 10-variable algorithm into a straight
line sequence of bitwise operations.

Problem Reduction:

tnum_step(): Given a tnum `(tval, tmask)` where `tval & tmask == 0`,
and a value `z` with `tval ≤ z < (tval | tmask)`, find the smallest
`r > z`, a tnum-satisfying value, i.e., `r & ~tmask == tval`.

Every tnum-satisfying value has the form tval | s where s is a subset
of tmask bits (s & ~tmask == 0).  Since tval and tmask are disjoint:

    tval | s  =  tval + s

Similarly z = tval + d where d = z - tval, so r > z becomes:

    tval + s  >  tval + d
    s > d

The problem reduces to: find the smallest s, a subset of tmask, such
that s > d.

Notice that `s` must be a subset of tmask, the problem now is simplified.

Algorithm:

The mask bits of `d` form a "counter" that we want to increment by one,
but the counter has gaps at the fixed-bit positions.  A normal +1 would
stop at the first 0-bit it meets; we need it to skip over fixed-bit
gaps and land on the next mask bit.

Step 1 -- plug the gaps:

    d | carry_mask | ~tmask

  - ~tmask fills all fixed-bit positions with 1.
  - carry_mask = (1 << fls64(d & ~tmask)) - 1 fills all positions
    (including mask positions) below the highest non-mask bit of d.

After this, the only remaining 0s are mask bits above the highest
non-mask bit of d where d is also 0 -- exactly the positions where
the carry can validly land.

Step 2 -- increment:

    (d | carry_mask | ~tmask) + 1

Adding 1 flips all trailing 1s to 0 and sets the first 0 to 1.  Since
every gap has been plugged, that first 0 is guaranteed to be a mask bit
above all non-mask bits of d.

Step 3 -- mask:

    ((d | carry_mask | ~tmask) + 1) & tmask

Strip the scaffolding, keeping only mask bits.  Call the result inc.

Step 4 -- result:

    tval | inc

Reattach the fixed bits.

A simple 8-bit example:
    tmask:        1  1  0  1  0  1  1  0
    d:            1  0  1  0  0  0  1  0     (d = 162)
                        ^
                        non-mask 1 at bit 5

With carry_mask = 0b00111111 (smeared from bit 5):

    d|carry|~tm   1  0  1  1  1  1  1  1
    + 1           1  1  0  0  0  0  0  0
    & tmask       1  1  0  0  0  0  0  0

The patch passes my local test: test_verifier, test_progs for
`-t verifier` and `-t reg_bounds`.

CBMC shows the new code is equiv to original one[1], and
a lean4 proof of correctness is available[2]:

theorem tnumStep_correct (tval tmask z : BitVec 64)
    -- Precondition: valid tnum and input z
    (h_consistent : (tval &&& tmask) = 0)
    (h_lo : tval ≤ z)
    (h_hi : z < (tval ||| tmask)) :
    -- Postcondition: r must be:
    --    (1) tnum member
    --    (2) z < r
    --    (3) for any other member w > z, r <= w
    let r := tnumStep tval tmask z
    satisfiesTnum64 r tval tmask ∧
    tval ≤ r ∧ r ≤ (tval ||| tmask) ∧
    z < r ∧
    ∀ w, satisfiesTnum64 w tval tmask → z < w → r ≤ w := by
  -- unfold definition
  unfold tnumStep satisfiesTnum64
  simp only []
  refine ⟨?_, ?_, ?_, ?_, ?_⟩
  -- the solver proves each conjunct
  · bv_decide
  · bv_decide
  · bv_decide
  · bv_decide
  · intro w hw1 hw2; bv_decide

[1] https://github.com/eddyz87/tnum-step-verif/blob/master/main.c
[2] https://pastebin.com/raw/czHKiyY0

Signed-off-by: Hao Sun <hao.sun@inf.ethz.ch>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Link: https://lore.kernel.org/r/20260320162336.166542-1-hao.sun@inf.ethz.ch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:45:29 -07:00
Puranjay Mohan
4ea43a4355 bpftool: Enable aarch64 ISA extensions for JIT disassembly
The LLVM disassembler needs ISA extension features enabled to correctly
decode instructions from those extensions. On aarch64, without these
features, instructions like LSE atomics (e.g. ldaddal) are silently
decoded as incorrect instructions and disassembly is truncated.

Use LLVMCreateDisasmCPUFeatures() with "+all" features for aarch64
targets so that the disassembler can handle any instruction the kernel
JIT might emit.

Before:

int bench_trigger_uprobe(void * ctx):
bpf_prog_538c6a43d1c6b84c_bench_trigger_uprobe:
; int cpu = bpf_get_smp_processor_id();
   0:   mov     x9, x30
   4:   nop
   8:   stp     x29, x30, [sp, #-16]!
   c:   mov     x29, sp
  10:   stp     xzr, x26, [sp, #-16]!
  14:   mov     x26, sp
  18:   mrs     x10, SP_EL0
  1c:   ldr     w7, [x10, #16]
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  20:   and     w7, w7, #0xff
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  24:   lsl     x7, x7, #7
  28:   mov     x0, #-281474976710656
  2c:   movk    x0, #32768, lsl #32
  30:   movk    x0, #35407, lsl #16
  34:   add     x0, x0, x7
  38:   mov     x1, #1
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  3c:   mov     x1, #1

After:

int bench_trigger_uprobe(void * ctx):
bpf_prog_538c6a43d1c6b84c_bench_trigger_uprobe:
; int cpu = bpf_get_smp_processor_id();
   0:   mov     x9, x30
   4:   nop
   8:   stp     x29, x30, [sp, #-16]!
   c:   mov     x29, sp
  10:   stp     xzr, x26, [sp, #-16]!
  14:   mov     x26, sp
  18:   mrs     x10, SP_EL0
  1c:   ldr     w7, [x10, #16]
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  20:   and     w7, w7, #0xff
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  24:   lsl     x7, x7, #7
  28:   mov     x0, #-281474976710656
  2c:   movk    x0, #32768, lsl #32
  30:   movk    x0, #35407, lsl #16
  34:   add     x0, x0, x7
  38:   mov     x1, #1
; __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1);
  3c:   ldaddal x1, x1, [x0]
; return 0;
  40:   mov     w7, #0
  44:   ldp     xzr, x26, [sp], #16
  48:   ldp     x29, x30, [sp], #16
  4c:   mov     x0, x7
  50:   ret
  54:   nop
  58:   ldr     x10, #8
  5c:   br      x10

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Quentin Monnet <qmo@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260318172259.2882792-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:44:29 -07:00
Carlos Llamas
9b0cf064ea bpf: Switch CONFIG_CFI_CLANG to CONFIG_CFI
This was renamed in commit 23ef9d4397 ("kcfi: Rename CONFIG_CFI_CLANG
to CONFIG_CFI") as it is now a compiler-agnostic option. Using the wrong
name results in the code getting compiled out. Meaning the CFI failures
for btf_dtor_kfunc_t would still trigger.

Fixes: 99fde4d062 ("bpf, btf: Enforce destructor kfunc type with CFI")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260312183818.2721750-1-cmllamas@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:42:24 -07:00
Eric Biggers
a02327413a bpf: Remove inclusions of crypto/sha1.h
Since commit 603b441623 ("bpf: Update the bpf_prog_calc_tag to use
SHA256") made BPF program tags use SHA-256 instead of SHA-1, the header
<crypto/sha1.h> no longer needs to be included.  Remove the relevant
inclusions so that they no longer unnecessarily come up in searches for
which kernel code is still using the obsolete SHA-1 algorithm.

Since net/ipv6/addrconf.c was relying on the transitive inclusion of
<crypto/sha1.h> (for an unrelated purpose) via <linux/filter.h>, make it
include <crypto/sha1.h> explicitly in order to keep that file building.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260314214555.112386-1-ebiggers@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24 08:40:45 -07:00
Varun R Mallya
bb6da652c5 selftests/bpf: Improve connect_force_port test reliability
The connect_force_port test fails intermittently in CI because the
hardcoded server ports (60123/60124) may already be in use by other
tests or processes [1].

Fix this by passing port 0 to start_server(), letting the kernel assign
a free port dynamically. The actual assigned port is then propagated to
the BPF programs by writing it into the .bss map's initial value (via
bpf_map__initial_value()) before loading, so the BPF programs use the
correct backend port at runtime.

[1] https://github.com/kernel-patches/bpf/actions/runs/22697676317/job/65808536038

Suggested-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260323081131.65604-1-varunrmallya@gmail.com
2026-03-23 11:37:34 -07:00
Alexei Starovoitov
bfec8e88ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc5
Cross-merge BPF and other fixes after downstream PR.

Minor conflicts in:
  tools/testing/selftests/bpf/progs/exceptions_fail.c
  tools/testing/selftests/bpf/progs/verifier_bounds.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-22 19:33:29 -07:00
Linus Torvalds
c369299895 Linux 7.0-rc5 v7.0-rc5 2026-03-22 14:42:17 -07:00
Mikko Perttunen
ec69c9e883 i2c: tegra: Don't mark devices with pins as IRQ safe
I2C devices with associated pinctrl states (DPAUX I2C controllers)
will change pinctrl state during runtime PM. This requires taking
a mutex, so these devices cannot be marked as IRQ safe.

Add PINCTRL as dependency to avoid build errors.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/all/E1vsNBv-00000009nfA-27ZK@rmk-PC.armlinux.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-03-22 11:37:58 -07:00
Linus Torvalds
d5273fd3ca Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Fix how linked registers track zero extension of subregisters (Daniel
   Borkmann)

 - Fix unsound scalar fork for OR instructions (Daniel Wade)

 - Fix exception exit lock check for subprogs (Ihor Solodrai)

 - Fix undefined behavior in interpreter for SDIV/SMOD instructions
   (Jenny Guanni Qu)

 - Release module's BTF when module is unloaded (Kumar Kartikeya
   Dwivedi)

 - Fix constant blinding for PROBE_MEM32 instructions (Sachin Kumar)

 - Reset register ID for END instructions to prevent incorrect value
   tracking (Yazhou Tang)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation
  bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation
  selftests/bpf: Add tests for maybe_fork_scalars() OR vs AND handling
  bpf: Fix unsound scalar forking in maybe_fork_scalars() for BPF_OR
  selftests/bpf: Add tests for sdiv32/smod32 with INT_MIN dividend
  bpf: Fix undefined behavior in interpreter sdiv/smod for INT_MIN
  selftests/bpf: Add tests for bpf_throw lock leak from subprogs
  bpf: Fix exception exit lock checking for subprogs
  bpf: Release module BTF IDR before module unload
  selftests/bpf: Fix pkg-config call on static builds
  bpf: Fix constant blinding for PROBE_MEM32 stores
  selftests/bpf: Add test for BPF_END register ID reset
  bpf: Reset register ID for BPF_END value tracking
2026-03-22 11:16:06 -07:00
Linus Torvalds
ac57fa9faf Merge tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Revert "tracing: Remove pid in task_rename tracing output"

   A change was made to remove the pid field from the task_rename event
   because it was thought that it was always done for the current task
   and recording the pid would be redundant. This turned out to be
   incorrect and there are a few corner case where this is not true and
   caused some regressions in tooling.

 - Fix the reading from user space for migration

   The reading of user space uses a seq lock type of logic where it uses
   a per-cpu temporary buffer and disables migration, then enables
   preemption, does the copy from user space, disables preemption,
   enables migration and checks if there was any schedule switches while
   preemption was enabled. If there was a context switch, then it is
   considered that the per-cpu buffer could be corrupted and it tries
   again. There's a protection check that tests if it takes a hundred
   tries, it issues a warning and exits out to prevent a live lock.

   This was triggered because the task was selected by the load balancer
   to be migrated to another CPU, every time preemption is enabled the
   migration task would schedule in try to migrate the task but can't
   because migration is disabled and let it run again. This caused the
   scheduler to schedule out the task every time it enabled preemption
   and made the loop never exit (until the 100 iteration test
   triggered).

   Fix this by enabling and disabling preemption and keeping migration
   enabled if the reading from user space needs to be done again. This
   will let the migration thread migrate the task and the copy from user
   space will likely pass on the next iteration.

 - Fix trace_marker copy option freeing

   The "copy_trace_marker" option allows a tracing instance to get a
   copy of a write to the trace_marker file of the top level instance.
   This is managed by a link list protected by RCU. When an instance is
   removed, a check is made if the option is set, and if so
   synchronized_rcu() is called.

   The problem is that an iteration is made to reset all the flags to
   what they were when the instance was created (to perform clean ups)
   was done before the check of the copy_trace_marker option and that
   option was cleared, so the synchronize_rcu() was never called.

   Move the clearing of all the flags after the check of
   copy_trace_marker to do synchronize_rcu() so that the option is still
   set if it was before and the synchronization is performed.

 - Fix entries setting when validating the persistent ring buffer

   When validating the persistent ring buffer on boot up, the number of
   events per sub-buffer is added to the sub-buffer meta page. The
   validator was updating cpu_buffer->head_page (the first sub-buffer of
   the per-cpu buffer) and not the "head_page" variable that was
   iterating the sub-buffers. This was causing the first sub-buffer to
   be assigned the entries for each sub-buffer and not the sub-buffer
   that was supposed to be updated.

 - Use "hash" value to update the direct callers

   When updating the ftrace direct callers, it assigned a temporary
   callback to all the callback functions of the ftrace ops and not just
   the functions represented by the passed in hash. This causes an
   unnecessary slow down of the functions of the ftrace_ops that is not
   being modified. Only update the functions that are going to be
   modified to call the ftrace loop function so that the update can be
   made on those functions.

* tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
  ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
  tracing: Fix trace_marker copy link list updates
  tracing: Fix failure to read user space from system call trace events
  tracing: Revert "tracing: Remove pid in task_rename tracing output"
2026-03-22 11:10:31 -07:00
Linus Torvalds
11ac4ce3f7 Merge tag 'i2c-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:

 - fix broken I2C communication on Armada 3700 with recovery

 - fix device_node reference leak in probe (fsi)

 - fix NULL-deref when serial string is missing (cp2615)

* tag 'i2c-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pxa: defer reset on Armada 3700 when recovery is used
  i2c: fsi: Fix a potential leak in fsi_i2c_probe()
  i2c: cp2615: fix serial string NULL-deref at probe
2026-03-22 11:05:34 -07:00
Linus Torvalds
8d8bd2a5aa Merge tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Improve Qemu MCE-injection behavior by only using AMD SMCA MSRs if
   the feature bit is set

 - Fix the relative path of gettimeofday.c inclusion in vclock_gettime.c

 - Fix a boot crash on UV clusters when a socket is marked as
   'deconfigured' which are mapped to the SOCK_EMPTY node ID by
   the UV firmware, while Linux APIs expect NUMA_NO_NODE.

   The difference being (0xffff [unsigned short ~0]) vs [int -1]

* tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Handle deconfigured sockets
  x86/entry/vdso: Fix path of included gettimeofday.c
  x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
2026-03-22 10:54:12 -07:00
Linus Torvalds
ebfd9b7af2 Merge tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:

 - Fix a PMU driver crash on AMD EPYC systems, caused by
   a race condition in x86_pmu_enable()

 - Fix a possible counter-initialization bug in x86_pmu_enable()

 - Fix a counter inheritance bug in inherit_event() and
   __perf_event_read()

 - Fix an Intel PMU driver branch constraints handling bug
   found by UBSAN

 - Fix the Intel PMU driver's new Off-Module Response (OMR)
   support code for Diamond Rapids / Nova lake, to fix a snoop
   information parsing bug

* tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix OMR snoop information parsing issues
  perf/x86/intel: Add missing branch counters constraint apply
  perf: Make sure to use pmu_ctx->pmu for groups
  x86/perf: Make sure to program the counter value for stopped events on migration
  perf/x86: Move event pointer setup earlier in x86_pmu_enable()
2026-03-22 10:31:51 -07:00
Linus Torvalds
dea622e183 Merge tag 'objtool-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
 "Fix three more livepatching related build environment bugs, and a
  false positive warning with Clang jump tables"

* tag 'objtool-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix Clang jump table detection
  livepatch/klp-build: Fix inconsistent kernel version
  objtool/klp: fix mkstemp() failure with long paths
  objtool/klp: fix data alignment in __clone_symbol()
2026-03-22 10:17:50 -07:00
Linus Torvalds
d56d4a110f Merge tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "Fix a sparse build error regression in <linux/local_lock_internal.h>
  caused by the locking context-analysis changes"

* tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  include/linux/local_lock_internal.h: Make this header file again compatible with sparse
2026-03-22 09:57:20 -07:00
Linus Torvalds
b5fddfad34 Merge tag 'irq-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
 "Fix a mailbox channel leak in the riscv-rpmi-sysmsi irqchip driver"

* tag 'irq-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/riscv-rpmi-sysmsi: Fix mailbox channel leak in rpmi_sysmsi_probe()
2026-03-22 09:55:58 -07:00
Linus Torvalds
d723091c8c Merge tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:

 - Generalize driver_override in the driver core, providing a common
   sysfs implementation and concurrency-safe accessors for bus
   implementations

 - Do not use driver_override as IRQ name in the hwmon axi-fan driver

 - Remove an unnecessary driver_override check in sh platform_early

 - Migrate the platform bus to use the generic driver_override
   infrastructure, fixing a UAF condition caused by accessing the
   driver_override field without proper locking in the platform_match()
   callback

* tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  driver core: platform: use generic driver_override infrastructure
  sh: platform_early: remove pdev->driver_override check
  hwmon: axi-fan: don't use driver_override as IRQ name
  docs: driver-model: document driver_override
  driver core: generalize driver_override in struct device
2026-03-21 16:59:09 -07:00
Jiri Olsa
50b35c9e50 ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger
the slow path for all direct callers to be able to safely modify attached
addresses.

At the moment we use ops->func_hash for tmp_ops filter, which represents all
the systems attachments. It's faster to use just the passed hash filter, which
contains only the modified sites and is always a subset of the ops->func_hash.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Cc: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org
Fixes: e93672f770 ("ftrace: Add update_ftrace_direct_mod function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:51:04 -04:00
Masami Hiramatsu (Google)
f35dbac694 ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:47:28 -04:00
Steven Rostedt
07183aac4a tracing: Fix trace_marker copy link list updates
When the "copy_trace_marker" option is enabled for an instance, anything
written into /sys/kernel/tracing/trace_marker is also copied into that
instances buffer. When the option is set, that instance's trace_array
descriptor is added to the marker_copies link list. This list is protected
by RCU, as all iterations uses an RCU protected list traversal.

When the instance is deleted, all the flags that were enabled are cleared.
This also clears the copy_trace_marker flag and removes the trace_array
descriptor from the list.

The issue is after the flags are called, a direct call to
update_marker_trace() is performed to clear the flag. This function
returns true if the state of the flag changed and false otherwise. If it
returns true here, synchronize_rcu() is called to make sure all readers
see that its removed from the list.

But since the flag was already cleared, the state does not change and the
synchronization is never called, leaving a possible UAF bug.

Move the clearing of all flags below the updating of the copy_trace_marker
option which then makes sure the synchronization is performed.

Also use the flag for checking the state in update_marker_trace() instead
of looking at if the list is empty.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home
Fixes: 7b382efd5e ("tracing: Allow the top level trace_marker to write into another instances")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:43:53 -04:00
Steven Rostedt
edca33a562 tracing: Fix failure to read user space from system call trace events
The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.

There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:

 pi_mutex_hammer-1375    [006] d..21   138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95

What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.

Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.

To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:42:36 -04:00
Xuewen Yan
a6f22e50c7 tracing: Revert "tracing: Remove pid in task_rename tracing output"
This reverts commit e3f6a42272.

The commit says that the tracepoint only deals with the current task,
however the following case is not current task:

comm_write() {
    p = get_proc_task(inode);
    if (!p)
        return -ESRCH;

    if (same_thread_group(current, p))
        set_task_comm(p, buffer);
}
where set_task_comm() calls __set_task_comm() which records
the update of p and not current.

So revert the patch to show pid.

Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <elver@google.com>
Cc: <kees@kernel.org>
Link: https://patch.msgid.link/20260306075954.4533-1-xuewen.yan@unisoc.com
Fixes: e3f6a42272 ("tracing: Remove pid in task_rename tracing output")
Reported-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:41:18 -04:00
Daniel Borkmann
4a04d13576 selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation
Add multiple test cases for linked register tracking with alu32 ops:

  - Add a test that checks sync_linked_regs() regarding reg->id (the linked
    target register) for BPF_ADD_CONST32 rather than known_reg->id (the
    branch register).

  - Add a test case for linked register tracking that exposes the cross-type
    sync_linked_regs() bug. One register uses alu32 (w7 += 1, BPF_ADD_CONST32)
    and another uses alu64 (r8 += 2, BPF_ADD_CONST64), both linked to the
    same base register.

  - Add a test case that exercises regsafe() path pruning when two execution
    paths reach the same program point with linked registers carrying
    different ADD_CONST flags (BPF_ADD_CONST32 from alu32 vs BPF_ADD_CONST64
    from alu64). This particular test passes with and without the fix since
    the pruning will fail due to different ranges, but it would still be
    useful to carry this one as a regression test for the unreachable div
    by zero.

With the fix applied all the tests pass:

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
  [...]
  ./test_progs -t verifier_linked_scalars
  #602/1   verifier_linked_scalars/scalars: find linked scalars:OK
  #602/2   verifier_linked_scalars/sync_linked_regs_preserves_id:OK
  #602/3   verifier_linked_scalars/scalars_neg:OK
  #602/4   verifier_linked_scalars/scalars_neg_sub:OK
  #602/5   verifier_linked_scalars/scalars_neg_alu32_add:OK
  #602/6   verifier_linked_scalars/scalars_neg_alu32_sub:OK
  #602/7   verifier_linked_scalars/scalars_pos:OK
  #602/8   verifier_linked_scalars/scalars_sub_neg_imm:OK
  #602/9   verifier_linked_scalars/scalars_double_add:OK
  #602/10  verifier_linked_scalars/scalars_sync_delta_overflow:OK
  #602/11  verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
  #602/12  verifier_linked_scalars/scalars_alu32_big_offset:OK
  #602/13  verifier_linked_scalars/scalars_alu32_basic:OK
  #602/14  verifier_linked_scalars/scalars_alu32_wrap:OK
  #602/15  verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
  #602/16  verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
  #602/17  verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
  #602/18  verifier_linked_scalars/alu32_negative_offset:OK
  #602/19  verifier_linked_scalars/spurious_precision_marks:OK
  #602     verifier_linked_scalars:OK
  Summary: 1/19 PASSED, 0 SKIPPED, 0 FAILED

Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:19:40 -07:00
Daniel Borkmann
bc308be380 bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation
Jenny reported that in sync_linked_regs() the BPF_ADD_CONST32 flag is
checked on known_reg (the register narrowed by a conditional branch)
instead of reg (the linked target register created by an alu32 operation).

Example case with reg:

  1. r6 = bpf_get_prandom_u32()
  2. r7 = r6 (linked, same id)
  3. w7 += 5 (alu32 -- r7 gets BPF_ADD_CONST32, zero-extended by CPU)
  4. if w6 < 0xFFFFFFFC goto safe (narrows r6 to [0xFFFFFFFC, 0xFFFFFFFF])
  5. sync_linked_regs() propagates to r7 but does NOT call zext_32_to_64()
  6. Verifier thinks r7 is [0x100000001, 0x100000004] instead of [1, 4]

Since known_reg above does not have BPF_ADD_CONST32 set above, zext_32_to_64()
is never called on alu32-derived linked registers. This causes the verifier
to track incorrect 64-bit bounds, while the CPU correctly zero-extends the
32-bit result.

The code checking known_reg->id was correct however (see scalars_alu32_wrap
selftest case), but the real fix needs to handle both directions - zext
propagation should be done when either register has BPF_ADD_CONST32, since
the linked relationship involves a 32-bit operation regardless of which
side has the flag.

Example case with known_reg (exercised also by scalars_alu32_wrap):

  1. r1 = r0; w1 += 0x100 (alu32 -- r1 gets BPF_ADD_CONST32)
  2. if r1 > 0x80 - known_reg = r1 (has BPF_ADD_CONST32), reg = r0 (doesn't)

Hence, fix it by checking for (reg->id | known_reg->id) & BPF_ADD_CONST32.

Moreover, sync_linked_regs() also has a soundness issue when two linked
registers used different ALU widths: one with BPF_ADD_CONST32 and the
other with BPF_ADD_CONST64. The delta relationship between linked registers
assumes the same arithmetic width though. When one register went through
alu32 (CPU zero-extends the 32-bit result) and the other went through
alu64 (no zero-extension), the propagation produces incorrect bounds.

Example:

  r6 = bpf_get_prandom_u32()     // fully unknown
  if r6 >= 0x100000000 goto out  // constrain r6 to [0, U32_MAX]
  r7 = r6
  w7 += 1                        // alu32: r7.id = N | BPF_ADD_CONST32
  r8 = r6
  r8 += 2                        // alu64: r8.id = N | BPF_ADD_CONST64
  if r7 < 0xFFFFFFFF goto out    // narrows r7 to [0xFFFFFFFF, 0xFFFFFFFF]

At the branch on r7, sync_linked_regs() runs with known_reg=r7
(BPF_ADD_CONST32) and reg=r8 (BPF_ADD_CONST64). The delta path
computes:

  r8 = r7 + (delta_r8 - delta_r7) = 0xFFFFFFFF + (2 - 1) = 0x100000000

Then, because known_reg->id has BPF_ADD_CONST32, zext_32_to_64(r8) is
called, truncating r8 to [0, 0]. But r8 used a 64-bit ALU op -- the
CPU does NOT zero-extend it. The actual CPU value of r8 is
0xFFFFFFFE + 2 = 0x100000000, not 0. The verifier now underestimates
r8's 64-bit bounds, which is a soundness violation.

Fix sync_linked_regs() by skipping propagation when the two registers
have mixed ALU widths (one BPF_ADD_CONST32, the other BPF_ADD_CONST64).

Lastly, fix regsafe() used for path pruning: the existing checks used
"& BPF_ADD_CONST" to test for offset linkage, which treated
BPF_ADD_CONST32 and BPF_ADD_CONST64 as equivalent.

Fixes: 7a433e5193 ("bpf: Support negative offsets, BPF_SUB, and alu32 for linked register tracking")
Reported-by: Jenny Guanni Qu <qguanni@gmail.com>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:19:40 -07:00
Alexei Starovoitov
61bc846081 Merge branch 'libbpf-add-bpf_program__clone-for-individual-program-loading'
Mykyta Yatsenko says:

====================
libbpf: Add bpf_program__clone() for individual program loading

This series adds bpf_program__clone() to libbpf and converts veristat to
use it, replacing the costly per-program object re-opening pattern.

veristat needs to load each BPF program in isolation to collect
per-program verification statistics. Previously it achieved this by
opening a fresh bpf_object for every program, disabling autoload on all
but the target, and loading the whole object. For object files with many
programs this meant repeating ELF parsing and BTF processing N times.

Patch 1 introduces bpf_program__clone(), which loads a single program
from a prepared object into the kernel and returns an fd owned by the
caller. It populates load parameters from the prepared object and lets
callers override any field via bpf_prog_load_opts. Fields written by the
prog_prepare_load_fn callback (expected_attach_type, attach_btf_id,
attach_btf_obj_fd) are seeded from prog/obj defaults before the
callback, then overridden with caller opts after, so explicit values
always win.

Patch 2 converts veristat to prepare the object once and clone each
program individually, eliminating redundant work.

Patch 3 adds a selftest verifying that caller-provided attach_btf_id
overrides are respected by bpf_program__clone().

Performance
Tested on selftests: 918 objects, ~4270 programs:
 - Wall time:   36.88s -> 23.18s (37% faster)
 - User time:   20.80s -> 16.07s (23% faster)
 - Kernel time: 12.07s ->  6.06s (50% faster)

Per-program loading also improves coverage: 83 programs that previously
failed now succeed.

Known regression:
 - Program-containing maps (PROG_ARRAY, DEVMAP, CPUMAP) track owner
   program type. Programs with incompatible attributes loaded against
   a shared map will be rejected. This is expected kernel behavior.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v5:
- Fix overriding of the attach_btf_id, attach_btf_fd, etc: the override
provided by the caller is applied after prog_prepare_load_fn().
- Added selftest to verify attach_btf_id override works as expected.
- Link to v4: https://lore.kernel.org/all/20260316-veristat_prepare-v3-0-94e5691e0494@meta.com/

Changes in v4:
- Replace OPTS_SET() with direct struct assignment for local
  bpf_prog_load_opts in bpf_program__clone() (libbpf.c)
- Remove unnecessary pattr pointer indirection (libbpf.c)
- Separate input and output fields in bpf_program__clone(): input
  fields (prog_flags, fd_array, etc.) are merged from caller opts
  before the callback; output fields (expected_attach_type,
  attach_btf_id, attach_btf_obj_fd) are initialized from prog/obj
  defaults for the callback, then overridden with caller opts after,
  so explicit caller values always win (libbpf.c)
- Add selftest for attach_btf_id override
- Link to v3: https://lore.kernel.org/r/20260206-veristat_prepare-a4a041873c53-v3@meta.com

Changes in v3:
- Clone fd_array_cnt in bpf_object__clone()
- In veristat do not fail if bpf_object__prepare() fails,
continue per-program processing to produce per program output
- Link to v2: https://lore.kernel.org/r/20260220-veristat_prepare-v2-0-15bff49022a7@meta.com

Changes in v2:
 - Removed map cloning entirely (libbpf.c)
 - Renamed bpf_prog_clone() -> bpf_program__clone()
 - Removed unnecessary obj NULL check (libbpf.c)
 - Fixed opts handling — no longer mutates caller's opts (libbpf.c)
 - Link to v1: https://lore.kernel.org/all/20260212-veristat_prepare-v1-0-c351023fb0db@meta.com/

---
====================

Link: https://patch.msgid.link/20260317-veristat_prepare-v4-0-74193d4cc9d9@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:17:14 -07:00
Mykyta Yatsenko
ceebdeec6e selftests/bpf: Test bpf_program__clone() attach_btf_id override
Add a test that verifies bpf_program__clone() respects caller-provided
attach_btf_id in bpf_prog_load_opts.

The BPF program has SEC("fentry/bpf_fentry_test1"). It is cloned twice
from the same prepared object: first with no opts, verifying the
callback resolves attach_btf_id from sec_name to bpf_fentry_test1;
then with attach_btf_id overridden to bpf_fentry_test2, verifying the
loaded program is actually attached to bpf_fentry_test2. Both results
are checked via bpf_prog_get_info_by_fd().

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260317-veristat_prepare-v4-3-74193d4cc9d9@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:17:14 -07:00
Mykyta Yatsenko
3be706b937 selftests/bpf: Use bpf_program__clone() in veristat
Replace veristat's per-program object re-opening with
bpf_program__clone().

Previously, veristat opened a separate bpf_object for every program in a
multi-program object file, iterated all programs to enable only the
target one, and then loaded the entire object.

Use bpf_object__prepare() once, then call bpf_program__clone() for each
program individually. This lets veristat load programs one at a time
from a single prepared object.

The caller now owns the returned fd and closes it after collecting stats.
Remove the special single-program fast path and the per-file early exit
in handle_verif_mode() so all files are always processed.

Split fixup_obj() into fixup_obj_maps() for object-wide map fixups that
must run before bpf_object__prepare(), and fixup_obj() for per-program
fixups (struct_ops masking, freplace type guessing) that run before each
bpf_program__clone() call.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260317-veristat_prepare-v4-2-74193d4cc9d9@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:17:14 -07:00
Mykyta Yatsenko
970bd2dced libbpf: Introduce bpf_program__clone()
Add bpf_program__clone() API that loads a single BPF program from a
prepared BPF object into the kernel, returning a file descriptor owned
by the caller.

After bpf_object__prepare(), callers can use bpf_program__clone() to
load individual programs with custom bpf_prog_load_opts, instead of
loading all programs at once via bpf_object__load(). Non-zero fields in
opts override the defaults derived from the program and object
internals; passing NULL opts populates everything automatically.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260317-veristat_prepare-v4-1-74193d4cc9d9@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:17:13 -07:00
Alexei Starovoitov
06880982c6 Merge branch 'bpf-fix-unsound-scalar-forking-for-bpf_or'
Daniel Wade says:

====================
bpf: Fix unsound scalar forking for BPF_OR

maybe_fork_scalars() unconditionally sets the pushed path dst register
to 0 for both BPF_AND and BPF_OR.  For AND this is correct (0 & K == 0),
but for OR it is wrong (0 | K == K, not 0).  This causes the verifier to
track an incorrect value on the pushed path, leading to a verifier/runtime
divergence that allows out-of-bounds map value access.

v4: Use block comment style for multi-line comments in selftests (Amery Hung)
    Add Reviewed-by/Acked-by tags
v3: Use single-line comment style in selftests (Alexei Starovoitov)
v2: Use push_stack(env, env->insn_idx, ...) to re-execute the insn
    on the pushed path (Eduard Zingerman)
====================

Link: https://patch.msgid.link/20260314021521.128361-1-danjwade95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:14:29 -07:00
Daniel Wade
0ad1734cc5 selftests/bpf: Add tests for maybe_fork_scalars() OR vs AND handling
Add three test cases to verifier_bounds.c to verify that
maybe_fork_scalars() correctly tracks register values for BPF_OR
operations with constant source operands:

1. or_scalar_fork_rejects_oob: After ARSH 63 + OR 8, the pushed
   path should have dst = 8. With value_size = 8, accessing
   map_value + 8 is out of bounds and must be rejected.

2. and_scalar_fork_still_works: Regression test ensuring AND
   forking continues to work. ARSH 63 + AND 4 produces pushed
   dst = 0 and current dst = 4, both within value_size = 8.

3. or_scalar_fork_allows_inbounds: After ARSH 63 + OR 4, the
   pushed path has dst = 4, which is within value_size = 8
   and should be accepted.

These tests exercise the fix in the previous patch, which makes the
pushed path re-execute the ALU instruction so it computes the correct
result for BPF_OR.

Signed-off-by: Daniel Wade <danjwade95@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260314021521.128361-3-danjwade95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:14:28 -07:00
Daniel Wade
c845894ebd bpf: Fix unsound scalar forking in maybe_fork_scalars() for BPF_OR
maybe_fork_scalars() is called for both BPF_AND and BPF_OR when the
source operand is a constant.  When dst has signed range [-1, 0], it
forks the verifier state: the pushed path gets dst = 0, the current
path gets dst = -1.

For BPF_AND this is correct: 0 & K == 0.
For BPF_OR this is wrong:    0 | K == K, not 0.

The pushed path therefore tracks dst as 0 when the runtime value is K,
producing an exploitable verifier/runtime divergence that allows
out-of-bounds map access.

Fix this by passing env->insn_idx (instead of env->insn_idx + 1) to
push_stack(), so the pushed path re-executes the ALU instruction with
dst = 0 and naturally computes the correct result for any opcode.

Fixes: bffacdb80b ("bpf: Recognize special arithmetic shift in the verifier")
Signed-off-by: Daniel Wade <danjwade95@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260314021521.128361-2-danjwade95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:14:28 -07:00
Alexei Starovoitov
1abd3feb36 Merge branch 'bpf-fix-abs-int_min-undefined-behavior-in-interpreter-sdiv-smod'
Jenny Guanni Qu says:

====================
bpf: Fix abs(INT_MIN) undefined behavior in interpreter sdiv/smod

The BPF interpreter's signed 32-bit division and modulo handlers use
abs() on s32 operands, which is undefined for S32_MIN. This causes
the interpreter to compute wrong results, creating a mismatch with
the verifier's range tracking.

For example, INT_MIN / 2 returns 0x40000000 instead of the correct
0xC0000000. The verifier tracks the correct range, so a crafted BPF
program can exploit the mismatch for out-of-bounds map value access
(confirmed by KASAN).

Patch 1 introduces abs_s32() which handles S32_MIN correctly and
replaces all 8 abs((s32)...) call sites. s32 is the only affected
case -- the s64 handlers do not use abs().

Patch 2 adds selftests covering sdiv32 and smod32 with INT_MIN
dividend to prevent regression.

Changes since v4:
  - Renamed __safe_abs32() to abs_s32() and dropped inline keyword
    per Alexei Starovoitov's feedback

Changes since v3:
  - Fixed stray blank line deletion in the file header
  - Improved comment per Yonghong Song's suggestion
  - Added JIT vs interpreter context to selftest commit message

Changes since v2:
  - Simplified to use -(u32)x per Mykyta Yatsenko's suggestion

Changes since v1:
  - Moved helper above kerneldoc comment block to fix build warnings
====================

Link: https://patch.msgid.link/20260311011116.2108005-1-qguanni@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:12:17 -07:00
Jenny Guanni Qu
4ac95c65ef selftests/bpf: Add tests for sdiv32/smod32 with INT_MIN dividend
Add tests to verify that signed 32-bit division and modulo operations
produce correct results when the dividend is INT_MIN (0x80000000).

The bug fixed in the previous commit only affects the BPF interpreter
path. When JIT is enabled (the default on most architectures), the
native CPU division instruction produces the correct result and these
tests pass regardless. With bpf_jit_enable=0, the interpreter is used
and without the previous fix, INT_MIN / 2 incorrectly returns
0x40000000 instead of 0xC0000000 due to abs(S32_MIN) undefined
behavior, causing these tests to fail.

Test cases:
  - SDIV32 INT_MIN / 2 = -1073741824 (imm and reg divisor)
  - SMOD32 INT_MIN % 2 = 0 (positive and negative divisor)

Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Link: https://lore.kernel.org/r/20260311011116.2108005-3-qguanni@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:12:17 -07:00
Jenny Guanni Qu
c77b30bd1d bpf: Fix undefined behavior in interpreter sdiv/smod for INT_MIN
The BPF interpreter's signed 32-bit division and modulo handlers use
the kernel abs() macro on s32 operands. The abs() macro documentation
(include/linux/math.h) explicitly states the result is undefined when
the input is the type minimum. When DST contains S32_MIN (0x80000000),
abs((s32)DST) triggers undefined behavior and returns S32_MIN unchanged
on arm64/x86. This value is then sign-extended to u64 as
0xFFFFFFFF80000000, causing do_div() to compute the wrong result.

The verifier's abstract interpretation (scalar32_min_max_sdiv) computes
the mathematically correct result for range tracking, creating a
verifier/interpreter mismatch that can be exploited for out-of-bounds
map value access.

Introduce abs_s32() which handles S32_MIN correctly by casting to u32
before negating, avoiding signed overflow entirely. Replace all 8
abs((s32)...) call sites in the interpreter's sdiv32/smod32 handlers.

s32 is the only affected case -- the s64 division/modulo handlers do
not use abs().

Fixes: ec0e2da95f ("bpf: Support new signed div/mod instructions.")
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Link: https://lore.kernel.org/r/20260311011116.2108005-2-qguanni@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:12:16 -07:00
Tiezhu Yang
f4706504e2 selftests/bpf: Add alignment flag for test_verifier 190 testcase
There exists failure when executing the testcase "./test_verifier 190" if
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set on LoongArch.

  #190/p calls: two calls that return map_value with incorrect bool check FAIL
  ...
  misaligned access off (0x0; 0xffffffffffffffff)+0 size 8
  ...
  Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

It means that the program has unaligned accesses, but the kernel sets
CONFIG_ARCH_STRICT_ALIGN by default to enable -mstrict-align to prevent
unaligned accesses, so add a flag F_NEEDS_EFFICIENT_UNALIGNED_ACCESS
into the testcase to avoid the failure.

This is somehow similar with the commit ce1f289f54 ("selftests/bpf:
Add F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to some tests").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260310064507.4228-3-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:11:18 -07:00
Alexei Starovoitov
21337b58f5 Merge branch 'bpf-consolidate-sleepable-context-checks-in-verifier'
Puranjay Mohan says:

====================
bpf: Consolidate sleepable context checks in verifier

The BPF verifier has multiple call-checking functions that independently
validate whether sleepable operations are permitted in the current
context. Each function open-codes its own checks against active_rcu_locks,
active_preempt_locks, active_irq_id, and in_sleepable, duplicating the
logic already provided by in_sleepable_context().

This series consolidates these scattered checks into calls to
in_sleepable_context() across check_helper_call(), check_kfunc_call(),
and check_func_call(), reducing code duplication and making the error
reporting consistent. No functional change.
====================

Link: https://patch.msgid.link/20260318174327.3151925-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:09:35 -07:00
Puranjay Mohan
a2542a91aa bpf: Consolidate sleepable checks in check_func_call()
The sleepable context check for global function calls in
check_func_call() open-codes the same checks that in_sleepable_context()
already performs. Replace the open-coded check with a call to
in_sleepable_context() and use non_sleepable_context_description() for
the error message, consistent with check_helper_call() and
check_kfunc_call().

Note that in_sleepable_context() also checks active_locks, which
overlaps with the existing active_locks check above it. However, the two
checks serve different purposes: the active_locks check rejects all
global function calls while holding a lock (not just sleepable ones), so
it must remain as a separate guard.

Update the expected error messages in the irq and preempt_lock selftests
to match.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260318174327.3151925-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:09:35 -07:00
Puranjay Mohan
cd9840c413 bpf: Consolidate sleepable checks in check_kfunc_call()
check_kfunc_call() has multiple scattered checks that reject sleepable
kfuncs in various non-sleepable contexts (RCU, preempt-disabled, IRQ-
disabled). These are the same conditions already checked by
in_sleepable_context(), so replace them with a single consolidated
check.

This also simplifies the preempt lock tracking by flattening the nested
if/else structure into a linear chain: preempt_disable increments,
preempt_enable checks for underflow and decrements. The sleepable check
is kept as a separate block since it is logically distinct from the lock
accounting.

No functional change since in_sleepable_context() checks all the same
state (active_rcu_locks, active_preempt_locks, active_locks,
active_irq_id, in_sleepable).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260318174327.3151925-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:09:35 -07:00
Puranjay Mohan
a0d06cf102 bpf: Consolidate sleepable checks in check_helper_call()
check_helper_call() prints the error message for every
env->cur_state->active* element when calling a sleepable helper.
Consolidate all of them into a single print statement.

The check for env->cur_state->active_locks was not part of the removed
print statements and will not be triggered with the consolidated print
as well because it is checked in do_check() before check_helper_call()
is even reached.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260318174327.3151925-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 13:09:34 -07:00
Ihor Solodrai
a1e5c46eae selftests/bpf: Add tests for bpf_throw lock leak from subprogs
Add test cases to ensure the verifier correctly rejects bpf_throw from
subprogs when RCU, preempt, or IRQ locks are held:

  * reject_subprog_rcu_lock_throw: subprog acquires bpf_rcu_read_lock and
    then calls bpf_throw
  * reject_subprog_throw_preempt_lock: always-throwing subprog called while
    caller holds bpf_preempt_disable
  * reject_subprog_throw_irq_lock: always-throwing subprog called while
    caller holds bpf_local_irq_save

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260320000809.643798-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 12:51:44 -07:00
Ihor Solodrai
6c2128505f bpf: Fix exception exit lock checking for subprogs
process_bpf_exit_full() passes check_lock = !curframe to
check_resource_leak(), which is false in cases when bpf_throw() is
called from a static subprog. This makes check_resource_leak() to skip
validation of active_rcu_locks, active_preempt_locks, and
active_irq_id on exception exits from subprogs.

At runtime bpf_throw() unwinds the stack via ORC without releasing any
user-acquired locks, which may cause various issues as the result.

Fix by setting check_lock = true for exception exits regardless of
curframe, since exceptions bypass all intermediate frame
cleanup. Update the error message prefix to "bpf_throw" for exception
exits to distinguish them from normal BPF_EXIT.

Fix reject_subprog_with_rcu_read_lock test which was previously
passing for the wrong reason. Test program returned directly from the
subprog call without closing the RCU section, so the error was
triggered by the unclosed RCU lock on normal exit, not by
bpf_throw. Update __msg annotations for affected tests to match the
new "bpf_throw" error prefix.

The spin_lock case is not affected because they are already checked [1]
at the call site in do_check_insn() before bpf_throw can run.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v7.0-rc4#n21098

Assisted-by: Claude:claude-opus-4-6
Fixes: f18b03faba ("bpf: Implement BPF exceptions")
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260320000809.643798-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-21 12:51:44 -07:00
Wolfram Sang
2f42e85622 Merge tag 'i2c-host-fixes-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-fixes for v7.0-rc5

pxa: fix broken I2C communication on Armada 3700 with recovery
fsi: fix device_node reference leak in probe
cp2615: fix NULL-deref when serial string is missing
2026-03-21 19:52:12 +01:00
Linus Torvalds
113ae7b4de Merge tag 'hwmon-for-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:

 - max6639: Fix pulses-per-revolution implementation

 - Several PMBus drivers: Add missing error checks

* tag 'hwmon-for-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (max6639) Fix pulses-per-revolution implementation
  hwmon: (pmbus/isl68137) Fix unchecked return value and use sysfs_emit()
  hwmon: (pmbus/ina233) Add error check for pmbus_read_word_data() return value
  hwmon: (pmbus/mp2869) Check pmbus_read_byte_data() before using its return value
  hwmon: (pmbus/mp2975) Add error check for pmbus_read_word_data() return value
  hwmon: (pmbus/hac300s) Add error check for pmbus_read_word_data() return value
2026-03-21 09:09:51 -07:00