When libbpf initializes the kernel's struct_ops in
"bpf_map__init_kern_struct_ops()", it enforces all
pointer types must be a function pointer and rejects
others. It turns out to be too strict. For example,
when directly using "struct tcp_congestion_ops" from vmlinux.h,
it has a "struct module *owner" member and it is set to NULL
in a bpf_tcp_cc.o.
Instead, it only needs to ensure the member is a function
pointer if it has been set (relocated) to a bpf-prog.
This patch moves the "btf_is_func_proto(kern_mtype)" check
after the existing "if (!prog) { continue; }". The original debug
message in "if (!prog) { continue; }" is also removed since it is
no longer valid. Beside, there is a later debug message to tell
which function pointer is set.
The "btf_is_func_proto(mtype)" has already been guaranteed
in "bpf_object__collect_st_ops_relos()" which has been run
before "bpf_map__init_kern_struct_ops()". Thus, this check
is removed.
v2:
- Remove outdated debug message (Andrii)
Remove because there is a later debug message to tell
which function pointer is set.
- Following mtype->type is no longer needed. Remove:
"skip_mods_and_typedefs(btf, mtype->type, &mtype_id)"
- Do "if (!prog)" test before skip_mods_and_typedefs.
Fixes: 590a008882 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
All 32-bit variants of BPF_FETCH (add, and, or, xor, xchg, cmpxchg)
define a 32-bit subreg and thus have zext_dst set. Their encoding,
however, uses dst_reg field as a base register, which causes
opt_subreg_zext_lo32_rnd_hi32() to zero-extend said base register
instead of the one the insn really defines (r0 or src_reg).
Fix by properly choosing a register being defined, similar to how
check_atomic() already does that.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210204502.83429-1-iii@linux.ibm.com
bpf_prog_realloc copies contents of struct bpf_prog.
The pointers have to be cleared before freeing old struct.
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 700d4796ef ("bpf: Optimize program stats")
Fixes: ca06f55b90 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This builds up on the existing socket cookie test which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.
Instead of having an update_cookie sockops program tag a socket local
storage with 0xFF, this uses both an update_cookie_sockops program and
an update_cookie_tracing program which succesively tag the socket with
0x0F and then 0xF0.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-5-revest@chromium.org
When migrating from the bpf.h's to the vmlinux.h's definition of struct
bps_sock, an interesting LLVM behavior happened. LLVM started producing
two fetches of ctx->sk in the sockops program this means that the
verifier could not keep track of the NULL-check on ctx->sk. Therefore,
we need to extract ctx->sk in a variable before checking and
dereferencing it.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-4-revest@chromium.org
Currently, the selftest for the BPF socket_cookie helpers is built and
run independently from test_progs. It's easy to forget and hard to
maintain.
This patch moves the socket cookies test into prog_tests/ and vastly
simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-3-revest@chromium.org
Nathan reported issue with cleaning empty build directory:
$ make -s O=build distclean
../../scripts/Makefile.include:4: *** \
O=/ho...build/tools/bpf/resolve_btfids does not exist. Stop.
The problem that tools scripts require existing output
directory, otherwise it fails.
Adding check around the resolve_btfids clean target to
ensure the output directory is in place.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/bpf/20210211124004.1144344-1-jolsa@kernel.org
Since sleepable programs are now executing under migrate_disable
the per-cpu maps are safe to use.
The map-in-map were ok to use in sleepable from the time sleepable
progs were introduced.
Note that non-preallocated maps are still not safe, since there is
no rcu_read_lock yet in sleepable programs and dynamically allocated
map elements are relying on rcu protection. The sleepable programs
have rcu_read_lock_trace instead. That limitation will be addresses
in the future.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-9-alexei.starovoitov@gmail.com
Since sleepable programs don't migrate from the cpu the excution stats can be
computed for them as well. Reuse the same infrastructure for both sleepable and
non-sleepable programs.
run_cnt -> the number of times the program was executed.
run_time_ns -> the program execution time in nanoseconds including the
off-cpu time when the program was sleeping.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-4-alexei.starovoitov@gmail.com
For double-checked locking in bpf_common_lru_push_free(), node->type is
read outside the critical section and then re-checked under the lock.
However, concurrent writes to node->type result in data races.
For example, the following concurrent access was observed by KCSAN:
write to 0xffff88801521bc22 of 1 bytes by task 10038 on cpu 1:
__bpf_lru_node_move_in kernel/bpf/bpf_lru_list.c:91
__local_list_flush kernel/bpf/bpf_lru_list.c:298
...
read to 0xffff88801521bc22 of 1 bytes by task 10043 on cpu 0:
bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:507
bpf_lru_push_free kernel/bpf/bpf_lru_list.c:555
...
Fix the data races where node->type is read outside the critical section
(for double-checked locking) by marking the access with READ_ONCE() as
well as ensuring the variable is only accessed once.
Fixes: 3a08c2fd76 ("bpf: LRU List")
Reported-by: syzbot+3536db46dfa58c573458@syzkaller.appspotmail.com
Reported-by: syzbot+516acdb03d3e27d91bcd@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210209112701.3341724-1-elver@google.com
Atomic tests store a DW, but then load it back as a W from the same
address. This doesn't work on big-endian systems, and since the point
of those tests is not testing narrow loads, fix simply by loading a
DW.
Fixes: 98d666d05a ("bpf: Add tests for new BPF atomic operations")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210020713.77911-1-iii@linux.ibm.com
Andrei Matei says:
====================
Before this patch, variable offset access to the stack was dissalowed
for regular instructions, but was allowed for "indirect" accesses (i.e.
helpers). This patch removes the restriction, allowing reading and
writing to the stack through stack pointers with variable offsets. This
makes stack-allocated buffers more usable in programs, and brings stack
pointers closer to other types of pointers.
The motivation is being able to use stack-allocated buffers for data
manipulation. When the stack size limit is sufficient, allocating
buffers on the stack is simpler than per-cpu arrays, or other
alternatives.
V2 -> V3
- var-offset writes mark all the stack slots in range as initialized, so
that future reads are not rejected.
- rewrote the C test to not use uprobes, as per Andrii's suggestion.
- addressed other review comments from Alexei.
V1 -> V2
- add support for var-offset stack writes, in addition to reads
- add a C test
- made variable offset direct reads no longer destroy spilled registers
in the access range
- address review nits
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Before this patch, variable offset access to the stack was dissalowed
for regular instructions, but was allowed for "indirect" accesses (i.e.
helpers). This patch removes the restriction, allowing reading and
writing to the stack through stack pointers with variable offsets. This
makes stack-allocated buffers more usable in programs, and brings stack
pointers closer to other types of pointers.
The motivation is being able to use stack-allocated buffers for data
manipulation. When the stack size limit is sufficient, allocating
buffers on the stack is simpler than per-cpu arrays, or other
alternatives.
In unpriviledged programs, variable-offset reads and writes are
disallowed (they were already disallowed for the indirect access case)
because the speculative execution checking code doesn't support them.
Additionally, when writing through a variable-offset stack pointer, if
any pointers are in the accessible range, there's possilibities of later
leaking pointers because the write cannot be tracked precisely.
Writes with variable offset mark the whole range as initialized, even
though we don't know which stack slots are actually written. This is in
order to not reject future reads to these slots. Note that this doesn't
affect writes done through helpers; like before, helpers need the whole
stack range to be initialized to begin with.
All the stack slots are in range are considered scalars after the write;
variable-offset register spills are not tracked.
For reads, all the stack slots in the variable range needs to be
initialized (but see above about what writes do), otherwise the read is
rejected. All register spilled in stack slots that might be read are
marked as having been read, however reads through such pointers don't do
register filling; the target register will always be either a scalar or
a constant zero.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
Jiri Olsa says:
====================
hi,
resolve_btfids tool is used during the kernel build,
so we should clean it on kernel's make clean.
v2 changes:
- add Song's acks on patches 1 and 4 (others changed) [Song]
- add missing / [Andrii]
- change srctree variable initialization [Andrii]
- shifted ifdef for clean target [Andrii]
thanks,
jirka
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
We want this clean to be called from tree's root Makefile,
which defines same srctree variable and that will screw
the make setup.
We actually do not use srctree being passed from outside,
so we can solve this by setting current srctree value
directly.
Also changing the way how srctree is initialized as suggested
by Andrri.
Also root Makefile does not define the implicit RM variable,
so adding RM initialization.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210205124020.683286-4-jolsa@kernel.org
Setting up separate build directories for libbpf and libpsubcmd,
so it's separated from other objects and we don't get them mixed
in the future.
It also simplifies cleaning, which is now simple rm -rf.
Also there's no need for FEATURE-DUMP.libbpf and bpf_helper_defs.h
files in .gitignore anymore.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210205124020.683286-2-jolsa@kernel.org
KP Singh says:
====================
- Use ring_buffer__consume without BPF_RB_FORCE_WAKEUP as suggested by
Andrii
- Use ASSERT_OK_PTR macro
Sleepable programs currently do not have access to any ringbuffer and
since the perf ring buffer is a per-cpu map, it would not be trivial to
enable for sleepable programs. Our specific use-case is to use the
bpf_ima_inode_hash helper and write the hash to a ring buffer from a
sleepable LSM hook.
This series allows the BPF ringbuffer to be used in sleepable programs
(tracing and lsm). Since the helper prototypes were already exposed
the only change required was have the verifier allow
BPF_MAP_TYPE_RINGBUF for sleepable programs. The ima test is also
modified to use the ringbuffer instead of global variables.
Based on dicussions we had over the BPF office hours and enabling all
the possible debug options, I could not find any issues or warnings when
using the ring buffer from sleepable programs.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The BPF ringbuffer map is pre-allocated and the implementation logic
does not rely on disabling preemption or per-cpu data structures. Using
the BPF ringbuffer sleepable LSM and tracing programs does not trigger
any warnings with DEBUG_ATOMIC_SLEEP, DEBUG_PREEMPT,
PROVE_RCU and PROVE_LOCKING and LOCKDEP enabled.
This allows helpers like bpf_copy_from_user and bpf_ima_inode_hash to
write to the BPF ring buffer from sleepable BPF programs.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210204193622.3367275-2-kpsingh@kernel.org
KP Singh says:
====================
# v4 -> v5
- Use %Y (modification time) instead of %W (creation time) of the local
copy of the kernel config to check for newer upstream config.
- Rename the script to vmtest.sh
# v3 -> v4
- Fix logic for updating kernel config to not download the file
if there are no upstream modifications and avoid extraneous
kernel compilation as suggested by Andrii.
- This also removes the need for the -k flag.
# v2 -> v3
- Fixes to silence verbose commands
- Fixed output buffering without being teed out
- Fixed the clobbered error code of the script
- Other fixes suggested by Andrii
# v1 -> v2
- The script now compiles the kernel by default, and the -k option
implies "keep the kernel"
- Pointer to the script in the docs.
- Some minor simplifications.
Allow developers and contributors to understand if their changes would
end up breaking the BPF CI and avoid the back and forth required for
fixing the test cases in the CI environment. The se
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add a short note to make contributors aware of the existence of the
script. The documentation does not intentionally document all the
options of the script to avoid mentioning it in two places (it's
available in the usage / help message of the script).
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210204194544.3383814-3-kpsingh@kernel.org
The script runs the BPF selftests locally on the same kernel image
as they would run post submit in the BPF continuous integration
framework.
The goal of the script is to allow contributors to run selftests locally
in the same environment to check if their changes would end up breaking
the BPF CI and reduce the back-and-forth between the maintainers and the
developers.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20210204194544.3383814-2-kpsingh@kernel.org
Libbpf's Makefile relies on Linux tools infrastructure's feature detection
framework, but libbpf's needs are very modest: it detects the presence of
libelf and libz, both of which are mandatory. So it doesn't benefit much from
the framework, but pays significant costs in terms of maintainability and
debugging experience, when something goes wrong. The other feature detector,
testing for the presernce of minimal BPF API in system headers is long
obsolete as well, providing no value.
So stop using feature detection and just assume the presence of libelf and
libz during build time. Worst case, user will get a clear and actionable
linker error, e.g.:
/usr/bin/ld: cannot find -lelf
On the other hand, we completely bypass recurring issues various users
reported over time with false negatives of feature detection (libelf or libz
not being detected, while they are actually present in the system).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210203203445.3356114-1-andrii@kernel.org
Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
in order to alloc skbs in bulk for XDP_PASS verdict.
Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
The proposed approach has been tested in the following scenario:
eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
XDP_REDIRECT: xdp_redirect_map bpf sample
XDP_PASS: xdp_rxq_info bpf sample
traffic generator: pkt_gen sending udp traffic on a remote device
bpf-next master: ~3.64Mpps
bpf-next + skb bulking allocation: ~3.79Mpps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/a14a30d3c06fff24e13f836c733d80efc0bd6eb5.1611957532.git.lorenzo@kernel.org
This patch adds support to verifier tests to check for a succession of
verifier log messages on program load failure. This makes the errstr
field work uniformly across REJECT and VERBOSE_ACCEPT checks.
This patch also increases the maximum size of a message in the series of
messages to test from 80 chars to 200 chars. This is in order to keep
existing tests working, which sometimes test for messages larger than 80
chars (which was accepted in the REJECT case, when testing for a single
message, but not in the VERBOSE_ACCEPT case, when testing for possibly
multiple messages).
And example of such a long, checked message is in bounds.c: "R1 has
unknown scalar with mixed signed bounds, pointer arithmetic with it
prohibited for !root"
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210130220150.59305-1-andreimatei1@gmail.com
Some compilers trigger a warning when tmp_dir_path is allocated
with a fixed size of 64-bytes and used in the following snprintf:
snprintf(tmp_exec_path, sizeof(tmp_exec_path), "%s/copy_of_rm",
tmp_dir_path);
warning: ‘/copy_of_rm’ directive output may be truncated writing 11
bytes into a region of size between 1 and 64 [-Wformat-truncation=]
This is because it assumes that tmp_dir_path can be a maximum of 64
bytes long and, therefore, the end-result can get truncated. Fix it by
not using a fixed size in the initialization of tmp_dir_path which
allows the compiler to track actual size of the array better.
Fixes: 2f94ac1918 ("bpf: Update local storage test to check handling of null ptrs")
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210202213730.1906931-1-kpsingh@kernel.org