Alexei Starovoitov 2eecf81e14 Merge branch 'bpf: Fix to preserve reg parent/live fields when copying range info'
Eduard Zingerman says:

====================

Struct bpf_reg_state is copied directly in several places including:
- check_stack_write_fixed_off() (via save_register_state());
- check_stack_read_fixed_off();
- find_equal_scalars().

However, a literal copy of this struct also copies the following fields:

struct bpf_reg_state {
	...
	struct bpf_reg_state *parent;
	...
	enum bpf_reg_liveness live;
	...
};

This breaks register parentage chain and liveness marking logic.
The commit message for the first patch has a detailed example.
This patch-set replaces direct copies with a call to a function
copy_register_state(dst,src), which preserves 'parent' and 'live'
fields of the 'dst'.

The fix comes with a significant verifier runtime penalty for some
selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
and cilium BPF binaries (see [1]):

$ ./veristat -e file,prog,states -C -f 'states_diff>10' master-baseline.log current.log
File                        Program                           States (A)  States (B)  States   (DIFF)
--------------------------  --------------------------------  ----------  ----------  ---------------
bpf_host.o                  tail_handle_ipv4_from_host               231         299    +68 (+29.44%)
bpf_host.o                  tail_handle_nat_fwd_ipv4                1088        1320   +232 (+21.32%)
bpf_host.o                  tail_handle_nat_fwd_ipv6                 716         729     +13 (+1.82%)
bpf_host.o                  tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
bpf_host.o                  tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
bpf_lxc.o                   tail_handle_nat_fwd_ipv4                1088        1320   +232 (+21.32%)
bpf_lxc.o                   tail_handle_nat_fwd_ipv6                 716         729     +13 (+1.82%)
bpf_lxc.o                   tail_ipv4_ct_egress                      239         262     +23 (+9.62%)
bpf_lxc.o                   tail_ipv4_ct_ingress                     239         262     +23 (+9.62%)
bpf_lxc.o                   tail_ipv4_ct_ingress_policy_only         239         262     +23 (+9.62%)
bpf_lxc.o                   tail_ipv6_ct_egress                      181         195     +14 (+7.73%)
bpf_lxc.o                   tail_ipv6_ct_ingress                     181         195     +14 (+7.73%)
bpf_lxc.o                   tail_ipv6_ct_ingress_policy_only         181         195     +14 (+7.73%)
bpf_lxc.o                   tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
bpf_lxc.o                   tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
bpf_overlay.o               tail_handle_nat_fwd_ipv4                 799         829     +30 (+3.75%)
bpf_overlay.o               tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
bpf_overlay.o               tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
bpf_sock.o                  cil_sock4_connect                         47          70    +23 (+48.94%)
bpf_sock.o                  cil_sock4_sendmsg                         45          68    +23 (+51.11%)
bpf_sock.o                  cil_sock6_post_bind                       31          42    +11 (+35.48%)
bpf_xdp.o                   tail_lb_ipv4                            4413        6457  +2044 (+46.32%)
bpf_xdp.o                   tail_lb_ipv6                            6876        7249    +373 (+5.42%)
test_cls_redirect.bpf.o     cls_redirect                            4704        4799     +95 (+2.02%)
test_tcp_hdr_options.bpf.o  estab                                    180         206    +26 (+14.44%)
xdp_synproxy_kern.bpf.o     syncookie_tc                           21059       21485    +426 (+2.02%)
xdp_synproxy_kern.bpf.o     syncookie_xdp                          21857       23122   +1265 (+5.79%)
--------------------------  --------------------------------  ----------  ----------  ---------------

I looked through verification log for bpf_xdp.o tail_lb_ipv4 program in
order to identify the reason for ~50% visited states increase.
The slowdown is triggered by a difference in handling of three stack slots:
fp-56, fp-72 and fp-80, with the main difference coming from fp-72.
In fact the following change removes all the difference:

@@ -3256,7 +3256,10 @@ static void save_register_state(struct bpf_func_state *state,
 {
        int i;

-       copy_register_state(&state->stack[spi].spilled_ptr, reg);
+       if ((spi == 6 /*56*/ || spi == 8 /*72*/ || spi == 9 /*80*/) && size != BPF_REG_SIZE)
+               state->stack[spi].spilled_ptr = *reg;
+       else
+               copy_register_state(&state->stack[spi].spilled_ptr, reg);

For fp-56 I found the following pattern for divergences between
verification logs with and w/o this patch:

- At some point insn 1862 is reached and checkpoint is created;
- At some other point insn 1862 is reached again:
  - with this patch:
    - the current state is considered *not* equivalent to the old checkpoint;
    - the reason for mismatch is the state of fp-56:
      - current state: fp-56=????mmmm
      - checkpoint: fp-56_rD=mmmmmmmm
  - without this patch the current state is considered equivalent to the
    checkpoint, the fp-56 is not present in the checkpoint.

Here is a fragment of the verification log for when the checkpoint in
question created at insn 1862:

checkpoint 1862:  ... fp-56=mmmmmmmm ...
1862: ...
1863: ...
1864: (61) r1 = *(u32 *)(r0 +0)
1865: ...
1866: (63) *(u32 *)(r10 -56) = r1     ; R1_w=scalar(...) R10=fp0 fp-56=
1867: (bf) r2 = r10                   ; R2_w=fp0 R10=fp0
1868: (07) r2 += -56                  ; R2_w=fp-56
; return map_lookup_elem(&LB4_BACKEND_MAP_V2, &backend_id);
1869: (18) r1 = 0xffff888100286000    ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
1871: (85) call bpf_map_lookup_elem#1

- Without this patch:
  - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
  - at insn 1866 fp-56 liveness is set REG_LIVE_WRITTEN mark because
    of the direct r1 copy in save_register_state();
  - at insn 1871 REG_LIVE_READ is not propagated to fp-56 at
    checkpoint 1862 because of the REG_LIVE_WRITTEN mark;
  - eventually fp-56 is pruned from checkpoint at 1862 in
    clean_func_state().
- With this patch:
  - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
  - at insn 1866 fp-56 liveness is *not* set to REG_LIVE_WRITTEN mark
    because write size is not equal to BPF_REG_SIZE;
  - at insn 1871 REG_LIVE_READ is propagated to fp-56 at checkpoint 1862.

Hence more states have to be visited by verifier with this patch compared
to current master.

Similar patterns could be found for both fp-72 and fp-80, although these
are harder to track trough the log because of a big number of insns between
slot write and bpf_map_lookup_elem() call triggering read mark, boils down
to the following C code:

	struct ipv4_frag_id frag_id = {
		.daddr = ip4->daddr,
		.saddr = ip4->saddr,
		.id = ip4->id,
		.proto = ip4->protocol,
		.pad = 0,
	};
    ...
    map_lookup_elem(..., &frag_id);

Where:
- .id is mapped to fp-72, write of size u16;
- .saddr is mapped to fp-80, write of size u32.

This patch-set is a continuation of discussion from [2].

Changes v1 -> v2 (no changes in the code itself):
- added analysis for the tail_lb_ipv4 verification slowdown;
- rebase against fresh master branch.

[1] git@github.com:anakryiko/cilium.git
[2] https://lore.kernel.org/bpf/517af2c57ee4b9ce2d96a8cf33f7295f2d2dfe13.camel@gmail.com/
====================

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-19 15:19:40 -08:00
2022-12-04 01:59:16 +01:00
2022-12-30 17:22:14 +09:00
2022-09-28 09:02:20 +02:00
2023-01-08 11:49:43 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Description
No description provided
Readme 3.4 GiB
Languages
C 97%
Assembly 1%
Shell 0.6%
Rust 0.5%
Python 0.4%
Other 0.3%