linux/Documentation/netlink/specs/netdev.yaml
Jakub Kicinski c49b292d03 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds the ability to delegate a subset of BPF features from a privileged
daemon (e.g., systemd) to a trusted unprivileged application, through
special mount options for a userns-bound BPF FS. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric by anywhere
   from single digits to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and fewer I$ misses for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF to interact with the IPsec infrastructure. The intent is to
support software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single-tunnel pcpu ipsec reaching
line rate on 100G ENA NICs.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF programs to get the fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00

415 lines
10 KiB
YAML

# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
name: netdev
doc:
  netdev configuration over generic netlink.
definitions:
  -
    type: flags
    name: xdp-act
    render-max: true
    entries:
      -
        name: basic
        doc:
          XDP features set supported by all drivers
          (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX)
      -
        name: redirect
        doc:
          The netdev supports XDP_REDIRECT
      -
        name: ndo-xmit
        doc:
          This feature informs if netdev implements ndo_xdp_xmit callback.
      -
        name: xsk-zerocopy
        doc:
          This feature informs if netdev supports AF_XDP in zero copy mode.
      -
        name: hw-offload
        doc:
          This feature informs if netdev supports XDP hw offloading.
      -
        name: rx-sg
        doc:
          This feature informs if netdev implements non-linear XDP buffer
          support in the driver napi callback.
      -
        name: ndo-xmit-sg
        doc:
          This feature informs if netdev implements non-linear XDP buffer
          support in ndo_xdp_xmit callback.
  -
    type: flags
    name: xdp-rx-metadata
    entries:
      -
        name: timestamp
        doc:
          Device is capable of exposing receive HW timestamp via bpf_xdp_metadata_rx_timestamp().
      -
        name: hash
        doc:
          Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
      -
        name: vlan-tag
        doc:
          Device is capable of exposing receive packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
  -
    type: flags
    name: xsk-flags
    entries:
      -
        name: tx-timestamp
        doc:
          HW timestamping egress packets is supported by the driver.
      -
        name: tx-checksum
        doc:
          L3 checksum HW offload is supported by the driver.
  -
    name: queue-type
    type: enum
    entries: [ rx, tx ]
attribute-sets:
  -
    name: dev
    attributes:
      -
        name: ifindex
        doc: netdev ifindex
        type: u32
        checks:
          min: 1
      -
        name: pad
        type: pad
      -
        name: xdp-features
        doc: Bitmask of enabled xdp-features.
        type: u64
        enum: xdp-act
      -
        name: xdp-zc-max-segs
        doc: max fragment count supported by ZC driver
        type: u32
        checks:
          min: 1
      -
        name: xdp-rx-metadata-features
        doc: Bitmask of supported XDP receive metadata features.
             See Documentation/networking/xdp-rx-metadata.rst for more details.
        type: u64
        enum: xdp-rx-metadata
      -
        name: xsk-features
        doc: Bitmask of enabled AF_XDP features.
        type: u64
        enum: xsk-flags
  -
    name: page-pool
    attributes:
      -
        name: id
        doc: Unique ID of a Page Pool instance.
        type: uint
        checks:
          min: 1
          max: u32-max
      -
        name: ifindex
        doc: |
          ifindex of the netdev to which the pool belongs.
          May be reported as 0 if the page pool was allocated for a netdev
          which got destroyed already (page pools may outlast their netdevs
          because they wait for all memory to be returned).
        type: u32
        checks:
          min: 1
          max: s32-max
      -
        name: napi-id
        doc: Id of NAPI using this Page Pool instance.
        type: uint
        checks:
          min: 1
          max: u32-max
      -
        name: inflight
        type: uint
        doc: |
          Number of outstanding references to this page pool (allocated
          but yet to be freed pages). Allocated pages may be held in
          socket receive queues, driver receive ring, page pool recycling
          ring, the page pool cache, etc.
      -
        name: inflight-mem
        type: uint
        doc: |
          Amount of memory held by inflight pages.
      -
        name: detach-time
        type: uint
        doc: |
          Seconds in CLOCK_BOOTTIME of when Page Pool was detached by
          the driver. Once detached Page Pool can no longer be used to
          allocate memory.
          Page Pools wait for all the memory allocated from them to be freed
          before truly disappearing. "Detached" Page Pools cannot be
          "re-attached", they are just waiting to disappear.
          Attribute is absent if Page Pool has not been detached, and
          can still be used to allocate new memory.
  -
    name: page-pool-info
    subset-of: page-pool
    attributes:
      -
        name: id
      -
        name: ifindex
  -
    name: page-pool-stats
    doc: |
      Page pool statistics, see docs for struct page_pool_stats
      for information about individual statistics.
    attributes:
      -
        name: info
        doc: Page pool identifying information.
        type: nest
        nested-attributes: page-pool-info
      -
        name: alloc-fast
        type: uint
        value: 8 # reserve some attr ids in case we need more metadata later
      -
        name: alloc-slow
        type: uint
      -
        name: alloc-slow-high-order
        type: uint
      -
        name: alloc-empty
        type: uint
      -
        name: alloc-refill
        type: uint
      -
        name: alloc-waive
        type: uint
      -
        name: recycle-cached
        type: uint
      -
        name: recycle-cache-full
        type: uint
      -
        name: recycle-ring
        type: uint
      -
        name: recycle-ring-full
        type: uint
      -
        name: recycle-released-refcnt
        type: uint
  -
    name: napi
    attributes:
      -
        name: ifindex
        doc: ifindex of the netdevice to which NAPI instance belongs.
        type: u32
        checks:
          min: 1
      -
        name: id
        doc: ID of the NAPI instance.
        type: u32
      -
        name: irq
        doc: The associated interrupt vector number for the napi
        type: u32
      -
        name: pid
        doc: PID of the napi thread, if NAPI is configured to operate in
             threaded mode. If NAPI is not in threaded mode (i.e. uses normal
             softirq context), the attribute will be absent.
        type: u32
  -
    name: queue
    attributes:
      -
        name: id
        doc: Queue index; most queue types are indexed like a C array, with
             indexes starting at 0 and ending at queue count - 1. Queue indexes
             are scoped to an interface and queue type.
        type: u32
      -
        name: ifindex
        doc: ifindex of the netdevice to which the queue belongs.
        type: u32
        checks:
          min: 1
      -
        name: type
        doc: Queue type as rx, tx. Each queue type defines a separate ID space.
        type: u32
        enum: queue-type
      -
        name: napi-id
        doc: ID of the NAPI instance which services this queue.
        type: u32
operations:
  list:
    -
      name: dev-get
      doc: Get / dump information about a netdev.
      attribute-set: dev
      do:
        request:
          attributes:
            - ifindex
        reply: &dev-all
          attributes:
            - ifindex
            - xdp-features
            - xdp-zc-max-segs
            - xdp-rx-metadata-features
            - xsk-features
      dump:
        reply: *dev-all
    -
      name: dev-add-ntf
      doc: Notification about device appearing.
      notify: dev-get
      mcgrp: mgmt
    -
      name: dev-del-ntf
      doc: Notification about device disappearing.
      notify: dev-get
      mcgrp: mgmt
    -
      name: dev-change-ntf
      doc: Notification about device configuration being changed.
      notify: dev-get
      mcgrp: mgmt
    -
      name: page-pool-get
      doc: |
        Get / dump information about Page Pools.
        (Only Page Pools associated with a net_device can be listed.)
      attribute-set: page-pool
      do:
        request:
          attributes:
            - id
        reply: &pp-reply
          attributes:
            - id
            - ifindex
            - napi-id
            - inflight
            - inflight-mem
            - detach-time
      dump:
        reply: *pp-reply
      config-cond: page-pool
    -
      name: page-pool-add-ntf
      doc: Notification about page pool appearing.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-del-ntf
      doc: Notification about page pool disappearing.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-change-ntf
      doc: Notification about page pool configuration being changed.
      notify: page-pool-get
      mcgrp: page-pool
      config-cond: page-pool
    -
      name: page-pool-stats-get
      doc: Get page pool statistics.
      attribute-set: page-pool-stats
      do:
        request:
          attributes:
            - info
        reply: &pp-stats-reply
          attributes:
            - info
            - alloc-fast
            - alloc-slow
            - alloc-slow-high-order
            - alloc-empty
            - alloc-refill
            - alloc-waive
            - recycle-cached
            - recycle-cache-full
            - recycle-ring
            - recycle-ring-full
            - recycle-released-refcnt
      dump:
        reply: *pp-stats-reply
      config-cond: page-pool-stats
    -
      name: queue-get
      doc: Get queue information from the kernel.
           Only configured queues will be reported (as opposed to all available
           hardware queues).
      attribute-set: queue
      do:
        request:
          attributes:
            - ifindex
            - type
            - id
        reply: &queue-get-op
          attributes:
            - id
            - type
            - napi-id
            - ifindex
      dump:
        request:
          attributes:
            - ifindex
        reply: *queue-get-op
    -
      name: napi-get
      doc: Get information about NAPI instances configured on the system.
      attribute-set: napi
      do:
        request:
          attributes:
            - id
        reply: &napi-get-op
          attributes:
            - id
            - ifindex
            - irq
            - pid
      dump:
        request:
          attributes:
            - ifindex
        reply: *napi-get-op
mcast-groups:
  list:
    -
      name: mgmt
    -
      name: page-pool
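
The spec above reports xdp-features as a u64 bitmask whose bits are named by the xdp-act flags definition. The sketch below is not part of the kernel tree; it shows one way a reader might turn such a mask into flag names. It assumes that entries of a "type: flags" definition occupy consecutive bits starting at bit 0, in the order they are listed (so basic is 1, redirect is 2, and so on). The helper name decode_flags and the list XDP_ACT_FLAGS are hypothetical, introduced only for illustration.

```python
# Flag names copied from the "xdp-act" definition in the spec, in listed order.
# Assumption: entry i of a "type: flags" definition corresponds to bit (1 << i).
XDP_ACT_FLAGS = [
    "basic",
    "redirect",
    "ndo-xmit",
    "xsk-zerocopy",
    "hw-offload",
    "rx-sg",
    "ndo-xmit-sg",
]


def decode_flags(mask, names):
    """Return the names of the bits set in mask (hypothetical helper).

    Bits beyond the known names are reported as a single
    "unknown:0x..." entry instead of being silently dropped.
    """
    decoded = [name for i, name in enumerate(names) if mask & (1 << i)]
    leftover = (mask >> len(names)) << len(names)
    if leftover:
        decoded.append("unknown:0x%x" % leftover)
    return decoded


if __name__ == "__main__":
    # 0x23 has bits 0, 1 and 5 set.
    print(decode_flags(0x23, XDP_ACT_FLAGS))
```

The same helper could be applied to xdp-rx-metadata-features or xsk-features by passing the entry names of the matching flags definition, under the same bit-ordering assumption.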