mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-01-21 14:03:24 -05:00
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-12-18
This PR is larger than usual and contains changes in various parts
of the kernel.
The main changes are:
1) Fix kCFI bugs in BPF, from Peter Zijlstra.
End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.
2) Introduce BPF token object, from Andrii Nakryiko.
It adds the ability to delegate a subset of BPF features from a privileged
daemon (e.g., systemd) to a trusted unprivileged application, through
special mount options for a userns-bound BPF FS. The design accommodates
suggestions from Christian Brauner and Paul Moore.
Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
       -o delegate_cmds=prog_load:MAP_CREATE \
       -o delegate_progs=kprobe \
       -o delegate_attachs=xdp
3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.
- Complete precision tracking support for register spills
- Fix verification of possibly-zero-sized stack accesses
- Fix access to uninit stack slots
- Track aligned STACK_ZERO cases as imprecise spilled registers.
It improves the verifier "instructions processed" metric by anywhere
from single-digit percentages to 50-60% for some programs.
- Fix verifier retval logic
4) Support for VLAN tag in XDP hints, from Larysa Zaremba.
5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.
End result: better memory utilization and fewer instruction cache (I$)
misses for calls to BPF via BPF trampoline.
6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.
7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.
It allows BPF to interact with the IPsec infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.
8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.
9) BPF file verification via fsverity, from Song Liu.
It allows BPF progs to get the fsverity digest.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
bpf: Ensure precise is reset to false in __mark_reg_const_zero()
selftests/bpf: Add more uprobe multi fail tests
bpf: Fail uprobe multi link with negative offset
selftests/bpf: Test the release of map btf
s390/bpf: Fix indirect trampoline generation
selftests/bpf: Temporarily disable dummy_struct_ops test on s390
x86/cfi,bpf: Fix bpf_exception_cb() signature
bpf: Fix dtor CFI
cfi: Add CFI_NOSEAL()
x86/cfi,bpf: Fix bpf_struct_ops CFI
x86/cfi,bpf: Fix bpf_callback_t CFI
x86/cfi,bpf: Fix BPF JIT call
cfi: Flip headers
selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
bpf: Limit the number of kprobes when attaching program to multiple kprobes
bpf: Limit the number of uprobes when attaching program to multiple uprobes
bpf: xdp: Register generic_kfunc_set with XDP programs
selftests/bpf: utilize string values for delegate_xxx mount options
...
====================
Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
415 lines
10 KiB
YAML
# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)

name: netdev

doc:
  netdev configuration over generic netlink.

definitions:
  -
    type: flags
    name: xdp-act
    render-max: true
    entries:
      -
        name: basic
        doc:
          XDP features set supported by all drivers
          (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX)
      -
        name: redirect
        doc:
          The netdev supports XDP_REDIRECT
      -
        name: ndo-xmit
        doc:
          This feature informs if netdev implements ndo_xdp_xmit callback.
      -
        name: xsk-zerocopy
        doc:
          This feature informs if netdev supports AF_XDP in zero copy mode.
      -
        name: hw-offload
        doc:
          This feature informs if netdev supports XDP hw offloading.
      -
        name: rx-sg
        doc:
          This feature informs if netdev implements non-linear XDP buffer
          support in the driver napi callback.
      -
        name: ndo-xmit-sg
        doc:
          This feature informs if netdev implements non-linear XDP buffer
          support in ndo_xdp_xmit callback.
  -
    type: flags
    name: xdp-rx-metadata
    entries:
      -
        name: timestamp
        doc:
          Device is capable of exposing receive HW timestamp via bpf_xdp_metadata_rx_timestamp().
      -
        name: hash
        doc:
          Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
      -
        name: vlan-tag
        doc:
          Device is capable of exposing receive packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
  -
    type: flags
    name: xsk-flags
    entries:
      -
        name: tx-timestamp
        doc:
          HW timestamping egress packets is supported by the driver.
      -
        name: tx-checksum
        doc:
          L3 checksum HW offload is supported by the driver.
  -
    name: queue-type
    type: enum
    entries: [ rx, tx ]
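  #
  # Reading aid, not part of the upstream spec: a worked example of how the
  # definitions above turn into numbers. It assumes the usual YNL convention
  # that a "flags" definition assigns bits in entry order starting at bit 0,
  # and that an "enum" counts up from 0.
  #
  #   xdp-act:          basic=0x01 redirect=0x02 ndo-xmit=0x04
  #                     xsk-zerocopy=0x08 hw-offload=0x10 rx-sg=0x20
  #                     ndo-xmit-sg=0x40
  #   xdp-rx-metadata:  timestamp=0x1 hash=0x2 vlan-tag=0x4
  #   xsk-flags:        tx-timestamp=0x1 tx-checksum=0x2
  #   queue-type:       rx=0 tx=1
  #
  # Under that assumption, an xdp-act bitmask of 0x23 would decode to
  # basic, redirect and rx-sg.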

attribute-sets:
  -
    name: dev
    attributes:
      -
        name: ifindex
        doc: netdev ifindex
        type: u32
        checks:
          min: 1
      -
        name: pad
        type: pad
      -
        name: xdp-features
        doc: Bitmask of enabled xdp-features.
        type: u64
        enum: xdp-act
      -
        name: xdp-zc-max-segs
        doc: max fragment count supported by ZC driver
        type: u32
        checks:
          min: 1
      -
        name: xdp-rx-metadata-features
        doc: Bitmask of supported XDP receive metadata features.
             See Documentation/networking/xdp-rx-metadata.rst for more details.
        type: u64
        enum: xdp-rx-metadata
      -
        name: xsk-features
        doc: Bitmask of enabled AF_XDP features.
        type: u64
        enum: xsk-flags
  -
    name: page-pool
    attributes:
      -
        name: id
        doc: Unique ID of a Page Pool instance.
        type: uint
        checks:
          min: 1
          max: u32-max
      -
        name: ifindex
        doc: |
          ifindex of the netdev to which the pool belongs.
          May be reported as 0 if the page pool was allocated for a netdev
          which got destroyed already (page pools may outlast their netdevs
          because they wait for all memory to be returned).
        type: u32
        checks:
          min: 1
          max: s32-max
      -
        name: napi-id
        doc: Id of NAPI using this Page Pool instance.
        type: uint
        checks:
          min: 1
          max: u32-max
      -
        name: inflight
        type: uint
        doc: |
          Number of outstanding references to this page pool (allocated
          but yet to be freed pages). Allocated pages may be held in
          socket receive queues, driver receive ring, page pool recycling
          ring, the page pool cache, etc.
      -
        name: inflight-mem
        type: uint
        doc: |
          Amount of memory held by inflight pages.
      -
        name: detach-time
        type: uint
        doc: |
          Seconds in CLOCK_BOOTTIME of when Page Pool was detached by
          the driver. Once detached Page Pool can no longer be used to
          allocate memory.
          Page Pools wait for all the memory allocated from them to be freed
          before truly disappearing. "Detached" Page Pools cannot be
          "re-attached", they are just waiting to disappear.
          Attribute is absent if Page Pool has not been detached, and
          can still be used to allocate new memory.
  -
    name: page-pool-info
    subset-of: page-pool
    attributes:
      -
        name: id
      -
        name: ifindex
  -
    name: page-pool-stats
    doc: |
      Page pool statistics, see docs for struct page_pool_stats
      for information about individual statistics.
    attributes:
      -
        name: info
        doc: Page pool identifying information.
        type: nest
        nested-attributes: page-pool-info
      -
        name: alloc-fast
        type: uint
        value: 8 # reserve some attr ids in case we need more metadata later
      -
        name: alloc-slow
        type: uint
      -
        name: alloc-slow-high-order
        type: uint
      -
        name: alloc-empty
        type: uint
      -
        name: alloc-refill
        type: uint
      -
        name: alloc-waive
        type: uint
      -
        name: recycle-cached
        type: uint
      -
        name: recycle-cache-full
        type: uint
      -
        name: recycle-ring
        type: uint
      -
        name: recycle-ring-full
        type: uint
      -
        name: recycle-released-refcnt
        type: uint
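  #
  # Reading aid, not part of the upstream spec: a sketch of the attribute IDs
  # the page-pool-stats set above works out to. It assumes the usual YNL rule
  # that attribute IDs count up from 1 in declaration order and that an
  # explicit "value:" restarts the count from that number.
  #
  #   info                     -> 1
  #   (2..7 left unused by "value: 8" on alloc-fast)
  #   alloc-fast               -> 8
  #   alloc-slow               -> 9
  #   alloc-slow-high-order    -> 10
  #   alloc-empty              -> 11
  #   alloc-refill             -> 12
  #   alloc-waive              -> 13
  #   recycle-cached           -> 14
  #   recycle-cache-full       -> 15
  #   recycle-ring             -> 16
  #   recycle-ring-full        -> 17
  #   recycle-released-refcnt  -> 18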

  -
    name: napi
    attributes:
      -
        name: ifindex
        doc: ifindex of the netdevice to which NAPI instance belongs.
        type: u32
        checks:
          min: 1
      -
        name: id
        doc: ID of the NAPI instance.
        type: u32
      -
        name: irq
        doc: The associated interrupt vector number for the napi
        type: u32
      -
        name: pid
        doc: PID of the napi thread, if NAPI is configured to operate in
             threaded mode. If NAPI is not in threaded mode (i.e. uses normal
             softirq context), the attribute will be absent.
        type: u32
  -
    name: queue
    attributes:
      -
        name: id
        doc: Queue index; most queue types are indexed like a C array, with
             indexes starting at 0 and ending at queue count - 1. Queue indexes
             are scoped to an interface and queue type.
        type: u32
      -
        name: ifindex
        doc: ifindex of the netdevice to which the queue belongs.
        type: u32
        checks:
          min: 1
      -
        name: type
        doc: Queue type as rx, tx. Each queue type defines a separate ID space.
        type: u32
        enum: queue-type
      -
        name: napi-id
        doc: ID of the NAPI instance which services this queue.
        type: u32

operations:
  list:
    -
      name: dev-get
      doc: Get / dump information about a netdev.
      attribute-set: dev
      do:
        request:
          attributes:
            - ifindex
        reply: &dev-all
          attributes:
            - ifindex
            - xdp-features
            - xdp-zc-max-segs
            - xdp-rx-metadata-features
            - xsk-features
      dump:
        reply: *dev-all
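      #
      # Reading aid, not part of the upstream spec: "&dev-all" is a plain
      # YAML anchor and "*dev-all" an alias to it, so once the file is loaded
      # the dump reply carries the same attribute list as the do reply, as if
      # it had been written out in full:
      #
      #   dump:
      #     reply:
      #       attributes:
      #         - ifindex
      #         - xdp-features
      #         - xdp-zc-max-segs
      #         - xdp-rx-metadata-features
      #         - xsk-features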
    -
      name: dev-add-ntf
      doc: Notification about device appearing.
      notify: dev-get
      mcgrp: mgmt
    -
      name: dev-del-ntf
      doc: Notification about device disappearing.
      notify: dev-get
      mcgrp: mgmt
    -
      name: dev-change-ntf
      doc: Notification about device configuration being changed.
      notify: dev-get
      mcgrp: mgmt
    -
      name: page-pool-get
      doc: |
        Get / dump information about Page Pools.
        (Only Page Pools associated with a net_device can be listed.)
      attribute-set: page-pool
      do:
        request:
          attributes:
            - id
        reply: &pp-reply
          attributes:
            - id
            - ifindex
            - napi-id
            - inflight
            - inflight-mem
            - detach-time
      dump:
        reply: *pp-reply
      config-cond: page-pool
    -
      name: page-pool-add-ntf
      doc: Notification about page pool appearing.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-del-ntf
      doc: Notification about page pool disappearing.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-change-ntf
      doc: Notification about page pool configuration being changed.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-stats-get
      doc: Get page pool statistics.
      attribute-set: page-pool-stats
      do:
        request:
          attributes:
            - info
        reply: &pp-stats-reply
          attributes:
            - info
            - alloc-fast
            - alloc-slow
            - alloc-slow-high-order
            - alloc-empty
            - alloc-refill
            - alloc-waive
            - recycle-cached
            - recycle-cache-full
            - recycle-ring
            - recycle-ring-full
            - recycle-released-refcnt
      dump:
        reply: *pp-stats-reply
      config-cond: page-pool-stats
    -
      name: queue-get
      doc: Get queue information from the kernel.
           Only configured queues will be reported (as opposed to all available
           hardware queues).
      attribute-set: queue
      do:
        request:
          attributes:
            - ifindex
            - type
            - id
        reply: &queue-get-op
          attributes:
            - id
            - type
            - napi-id
            - ifindex
      dump:
        request:
          attributes:
            - ifindex
        reply: *queue-get-op
    -
      name: napi-get
      doc: Get information about NAPI instances configured on the system.
      attribute-set: napi
      do:
        request:
          attributes:
            - id
        reply: &napi-get-op
          attributes:
            - id
            - ifindex
            - irq
            - pid
      dump:
        request:
          attributes:
            - ifindex
        reply: *napi-get-op
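    #
    # Reading aid, not part of the upstream spec: a sketch of the message
    # shapes that the queue-get operation above implies, written as YAML flow
    # mappings. All values (ifindex 2, queue id 0, napi-id 513) are
    # hypothetical and only illustrate which attributes go where.
    #
    #   do request:    { ifindex: 2, type: rx, id: 0 }
    #   do reply:      { id: 0, type: rx, napi-id: 513, ifindex: 2 }
    #   dump request:  { ifindex: 2 }
    #   dump reply:    one message per configured queue, each shaped like
    #                  the do reply
    #
    # Because each queue type has its own ID space, { type: rx, id: 0 } and
    # { type: tx, id: 0 } on the same ifindex name two different queues.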

mcast-groups:
  list:
    -
      name: mgmt
    -
      name: page-pool