linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-02 08:39:08 -04:00

Author	SHA1	Message	Date
Yujie Liu	f690ff9122	bpf/tests: Remove duplicate JSGT tests It seems unnecessary that JSGT is tested twice (one before JSGE and one after JSGE) since others are tested only once. Remove the duplicate JSGT tests. Fixes: `0bbaa02b48` ("bpf/tests: Add tests to check source register zero-extension") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Link: https://lore.kernel.org/bpf/20231130034018.2144963-1-yujie.liu@intel.com	2023-11-30 12:17:33 +01:00
Alexei Starovoitov	b5145153a7	Merge branch 'xsk-tx-metadata' Stanislav Fomichev says: ==================== xsk: TX metadata This series implements initial TX metadata (offloads) for AF_XDP. See patch #2 for the main implementation and mlx5/stmmac ones for the example on how to consume the metadata on the device side. Starting with two types of offloads: - request TX timestamp (and write it back into the metadata area) - request TX checksum offload Changes since v5: - preserve xsk_tx_metadata flags across tx and completion by moving them out of 'request' union (Jesper) - fix xdp_metadata checksum failure in big endian (Alexei) - add SPDX to xdp-rx-metadata.rst (Simon) v5: https://lore.kernel.org/bpf/20231102225837.1141915-1-sdf@google.com/ Performance (mlx5): I've implemented a small xskgen tool to try to saturate single tx queue: https://github.com/fomichev/xskgen/tree/master Here are the performance numbers with some analysis. 1. Baseline. Running with commit `eb62e6aef9` ("Merge branch 'bpf: Support bpf_get_func_ip helper in uprobes'"), nothing from this series: - with 1400 bytes of payload: 98 gbps, 8 mpps ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189130 sec, 98.357623 gbps 8.409509 mpps - with 200 bytes of payload: 49 gbps, 23 mpps ./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000064 packets 20960134144 bits, took 0.422235 sec, 49.640921 gbps 23.683645 mpps 2. Adding single commit that supports reserving tx_metadata_len changes nothing numbers-wise. - baseline for 1400 ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189247 sec, 98.347946 gbps 8.408682 mpps - baseline for 200 ./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.421248 sec, 49.756913 gbps 23.738985 mpps 3. Adding -M flag causes xskgen to reserve the metadata and fill it, but doesn't set XDP_TX_METADATA descriptor option. - new baseline for 1400 (with only filling the metadata) ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188767 sec, 98.387657 gbps 8.412077 mpps - new baseline for 200 (with only filling the metadata) ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.410213 sec, 51.095407 gbps 24.377579 mpps (the numbers go sligtly up here, not really sure why, maybe some cache-related side-effects? 4. Next, I'm running the same test but with the commit that adds actual general infra to parse XDP_TX_METADATA (but no driver support). Essentially applying "xsk: add TX timestamp and TX checksum offload support" from this series. Numbers are the same. - fill metadata for 1400 ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188430 sec, 98.415557 gbps 8.414463 mpps - fill metadata for 200 ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.411559 sec, 50.928299 gbps 24.297853 mpps - request metadata for 1400 ./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188723 sec, 98.391299 gbps 8.412389 mpps - request metadata for 200 ./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000064 packets 20960134144 bits, took 0.411240 sec, 50.968131 gbps 24.316856 mpps 5. Now, for the most interesting part, I'm adding mlx5 driver support. The mpps for 200 bytes case goes down from 23 mpps to 19 mpps, but _only_ when I enable the metadata. This looks like a side effect of me pushing extra metadata pointer via mlx5e_xdpi_fifo_push. Hence, this part is wrapped into 'if (xp_tx_metadata_enabled)' to not affect the existing non-metadata use-cases. Since this is not regressing existing workloads, I'm not spending any time trying to optimize it more (and leaving it up to mlx owners to purse if they see any good way to do it). - same baseline ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189434 sec, 98.332484 gbps 8.407360 mpps ./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.425254 sec, 49.288821 gbps 23.515659 mpps - fill metadata for 1400 ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189528 sec, 98.324714 gbps 8.406696 mpps - fill metadata for 200 ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.519085 sec, 40.379260 gbps 19.264914 mpps - request metadata for 1400 ./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189329 sec, 98.341165 gbps 8.408102 mpps - request metadata for 200 ./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.519929 sec, 40.313713 gbps 19.233642 mpps Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> ==================== Link: https://lore.kernel.org/r/20231127190319.1190813-1-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	60523115c1	selftests/bpf: Add TX side to xdp_hw_metadata When we get a packet on port 9091, we swap src/dst and send it out. At this point we also request the timestamp and checksum offloads. Checksum offload is verified by looking at the tcpdump on the other side. The tool prints pseudo-header csum and the final one it expects. The final checksum actually matches the incoming packets checksum because we only flip the src/dst and don't change the payload. Some other related changes: - switched to zerocopy mode by default; new flag can be used to force old behavior - request fixed tx_metadata_len headroom - some other small fixes (umem size, fill idx+i, etc) mvbz3:~# ./xdp_hw_metadata eth3 ... xsk_ring_cons__peek: 1 0x19546f8: rx_desc[0]->addr=80100 addr=80100 comp_addr=80100 rx_hash: 0x80B7EA8B with RSS type:0x2A rx_timestamp: 1697580171852147395 (sec:1697580171.8521) HW RX-time: 1697580171852147395 (sec:1697580171.8521), delta to User RX-time sec:0.2797 (279673.082 usec) XDP RX-time: 1697580172131699047 (sec:1697580172.1317), delta to User RX-time sec:0.0001 (121.430 usec) 0x19546f8: ping-pong with csum=3b8e (want d862) csum_start=54 csum_offset=6 0x19546f8: complete tx idx=0 addr=8 tx_timestamp: 1697580172056756493 (sec:1697580172.0568) HW TX-complete-time: 1697580172056756493 (sec:1697580172.0568), delta to User TX-complete-time sec:0.0852 (85175.537 usec) XDP RX-time: 1697580172131699047 (sec:1697580172.1317), delta to User TX-complete-time sec:0.0102 (10232.983 usec) HW RX-time: 1697580171852147395 (sec:1697580171.8521), delta to HW TX-complete-time sec:0.2046 (204609.098 usec) 0x19546f8: complete rx idx=128 addr=80100 mvbz4:~# nc -Nu -q1 ${MVBZ3_LINK_LOCAL_IP}%eth3 9091 mvbz4:~# tcpdump -vvx -i eth3 udp tcpdump: listening on eth3, link-type EN10MB (Ethernet), snapshot length 262144 bytes 12:26:09.301074 IP6 (flowlabel 0x35fa5, hlim 127, next-header UDP (17) payload length: 11) fe80::1270:fdff:fe48:1087.55807 > fe80::1270:fdff:fe48:1077.9091: [bad udp cksum 0x3b8e -> 0xde7e!] UDP, length 3 0x0000: 6003 5fa5 000b 117f fe80 0000 0000 0000 0x0010: 1270 fdff fe48 1087 fe80 0000 0000 0000 0x0020: 1270 fdff fe48 1077 d9ff 2383 000b 3b8e 0x0030: 7864 70 12:26:09.301976 IP6 (flowlabel 0x35fa5, hlim 127, next-header UDP (17) payload length: 11) fe80::1270:fdff:fe48:1077.9091 > fe80::1270:fdff:fe48:1087.55807: [udp sum ok] UDP, length 3 0x0000: 6003 5fa5 000b 117f fe80 0000 0000 0000 0x0010: 1270 fdff fe48 1077 fe80 0000 0000 0000 0x0020: 1270 fdff fe48 1087 2383 d9ff 000b de7e 0x0030: 7864 70 Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-14-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	12b4b7963d	selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP This is the recommended way to run AF_XDP, so let's use it in the test. Also, some unrelated changes to now blow up the log too much: - change default mode to zerocopy and add -c to use copy mode - small fixes for the flags/sizes/prints - add print_tstamp_delta to print timestamp + reference Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-13-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	40808a237d	selftests/bpf: Add TX side to xdp_metadata Request TX timestamp and make sure it's not empty. Request TX checksum offload (SW-only) and make sure it's resolved to the correct one. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-12-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	f6642de0c3	selftests/bpf: Add csum helpers Checksum helpers will be used to calculate pseudo-header checksum in AF_XDP metadata selftests. The helpers are mirroring existing kernel ones: - csum_tcpudp_magic : IPv4 pseudo header csum - csum_ipv6_magic : IPv6 pseudo header csum - csum_fold : fold csum and do one's complement Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-11-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	df3ed0003e	selftests/xsk: Support tx_metadata_len Add new config field and propagate to UMEM registration setsockopt. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-10-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:41 -08:00
Stanislav Fomichev	11614723af	xsk: Add option to calculate TX checksum in SW For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM to call skb_checksum_help in transmit path. Might be useful to debugging issues with real hardware. I also use this mode in the selftests. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	ce59f9686e	xsk: Validate xsk_tx_metadata flags Accept only the flags that the kernel knows about to make sure we can extend this field in the future. Note that only in XDP_COPY mode we propagate the error signal back to the user (via sendmsg). For zerocopy mode we silently skip the metadata for the descriptors that have wrong flags (since we process the descriptors deep in the driver). Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-8-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	9620e956d5	xsk: Document tx_metadata_len layout - how to use - how to query features - pointers to the examples Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-7-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Song Yoong Siang	1347b41931	net: stmmac: Add Tx HWTS support to XDP ZC This patch enables transmit hardware timestamp support to XDP zero copy via XDP Tx metadata framework. This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel Tiger Lake platform. Below are the test steps and results. Command on DUT: sudo ./xdp_hw_metadata <interface name> sudo hwstamp_ctl -i <interface name> -t 1 -r 1 Command on Link Partner: echo -n xdp \| nc -u -q1 <destination IPv4 addr> 9091 Result: xsk_ring_cons__peek: 1 0x55bbbf08b6d0: rx_desc[2]->addr=8c100 addr=8c100 comp_addr=8c100 EoP No rx_hash err=-95 rx_timestamp: 1677762688429141540 (sec:1677762688.4291) HW RX-time: 1677762688429141540 (sec:1677762688.4291) delta to User RX-time sec:0.0003 (250.665 usec) XDP RX-time: 1677762688429375597 (sec:1677762688.4294) delta to User RX-time sec:0.0000 (16.608 usec) 0x55bbbf08b6d0: ping-pong with csum=561c (want f488) csum_start=34 csum_offset=6 0x55bbbf08b6d0: complete tx idx=2 addr=2008 tx_timestamp: 1677762688431127273 (sec:1677762688.4311) HW TX-complete-time: 1677762688431127273 (sec:1677762688.4311) delta to User TX-complete-time sec:0.0083 (8331.655 usec) XDP RX-time: 1677762688429375597 (sec:1677762688.4294) delta to User TX-complete-time sec:0.0101 (10083.331 usec) HW RX-time: 1677762688429141540 (sec:1677762688.4291) delta to HW TX-complete-time sec:0.0020 (1985.733 usec) 0x55bbbf08b6d0: complete rx idx=130 addr=8c100 Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-6-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	ec706a860e	net/mlx5e: Implement AF_XDP TX timestamp and checksum offload TX timestamp: - requires passing clock, not sure I'm passing the correct one (from cq->mdev), but the timestamp value looks convincing TX checksum: - looks like device does packet parsing (and doesn't accept custom start/offset), so I'm ignoring user offsets Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	9276009d35	tools: ynl: Print xsk-features from the sample In a similar fashion we do for the other bit masks. Fix mask parsing (>= vs >) while we are it. Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	48eb03dd26	xsk: Add TX timestamp and TX checksum offload support This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags which requests appropriate offloads, followed by the offload-specific fields. The supported per-device offloads are exported via netlink (new xsk-flags). The offloads themselves are still implemented in a bit of a framework-y fashion that's left from my initial kfunc attempt. I'm introducing new xsk_tx_metadata_ops which drivers are supposed to implement. The drivers are also supposed to call xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places. Since xsk_tx_metadata_{request,_complete} are static inline, we don't incur any extra overhead doing indirect calls. The benefit of this scheme is as follows: - keeps all metadata layout parsing away from driver code - makes it easy to grep and see which drivers implement what - don't need any extra flags to maintain to keep track of what offloads are implemented; if the callback is implemented - the offload is supported (used by netlink reporting code) Two offloads are defined right now: 1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset 2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata area upon completion (tx_timestamp field) XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass metadata pointer). The struct is forward-compatible and can be extended in the future by appending more fields. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	341ac980ea	xsk: Support tx_metadata_len For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb metadata. Introduce new tx_metadata_len umem config option that indicates how many bytes to treat as metadata. Metadata bytes come prior to tx_desc address (same as in RX case). The size of the metadata has mostly the same constraints as XDP: - less than 256 bytes - 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte timestamp in the completion) - non-zero This data is not interpreted in any way right now. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Andrii Nakryiko	40d0eb0259	Merge branch 'selftests-bpf-use-pkg-config-to-determine-ld-flags' Akihiko Odaki says: ==================== selftests/bpf: Use pkg-config to determine ld flags When linking statically, libraries may require other dependencies to be included to ld flags. In particular, libelf may require libzstd. Use pkg-config to determine such dependencies. V4 -> V5: Introduced variables LIBELF_CFLAGS and LIBELF_LIBS. (Daniel Borkmann) Added patch "selftests/bpf: Choose pkg-config for the target". V3 -> V4: Added "2> /dev/null". V2 -> V3: Added missing "echo". V1 -> V2: Implemented fallback, referring to HOSTPKG_CONFIG. ==================== Link: https://lore.kernel.org/r/20231125084253.85025-1-akihiko.odaki@daynix.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-11-28 22:15:30 -08:00
Akihiko Odaki	8998a479fd	selftests/bpf: Use pkg-config for libelf When linking statically, libraries may require other dependencies to be included to ld flags. In particular, libelf may require libzstd. Use pkg-config to determine such dependencies. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-4-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Akihiko Odaki	18f6f9de98	selftests/bpf: Override PKG_CONFIG for static builds A library may need to depend on additional archive files for static builds so pkg-config should be instructed to list them. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-3-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Akihiko Odaki	2ce344b689	selftests/bpf: Choose pkg-config for the target pkg-config is used to build sign-file executable. It should use the library for the target instead of the host as it is called during tests. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-2-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Andrii Nakryiko	d4e7dd4842	Merge branch 'bpf-add-link_info-support-for-uprobe-multi-link' Jiri Olsa says: ==================== bpf: Add link_info support for uprobe multi link hi, this patchset adds support to get bpf_link_info details for uprobe_multi links and adding support for bpftool link to display them. v4 changes: - move flags field up in bpf_uprobe_multi_link [Andrii] - include zero terminating byte in path_size [Andrii] - return d_path error directly [Yonghong] - use SEC(".probes") for semaphores [Yonghong] - fix ref_ctr_offsets leak in test [Yonghong] - other smaller fixes [Yonghong] thanks, jirka --- ==================== Link: https://lore.kernel.org/r/20231125193130.834322-1-jolsa@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-11-28 21:50:10 -08:00
Jiri Olsa	a7795698f8	bpftool: Add support to display uprobe_multi links Adding support to display details for uprobe_multi links, both plain: # bpftool link -p ... 24: uprobe_multi prog 126 uprobe.multi path /home/jolsa/bpf/test_progs func_cnt 3 pid 4143 offset ref_ctr_offset cookies 0xd1f88 0xf5d5a8 0xdead 0xd1f8f 0xf5d5aa 0xbeef 0xd1f96 0xf5d5ac 0xcafe and json: # bpftool link -p [{ ... },{ "id": 24, "type": "uprobe_multi", "prog_id": 126, "retprobe": false, "path": "/home/jolsa/bpf/test_progs", "func_cnt": 3, "pid": 4143, "funcs": [{ "offset": 860040, "ref_ctr_offset": 16111016, "cookie": 57005 },{ "offset": 860047, "ref_ctr_offset": 16111018, "cookie": 48879 },{ "offset": 860054, "ref_ctr_offset": 16111020, "cookie": 51966 } ] } ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-7-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	147c69307b	selftests/bpf: Add link_info test for uprobe_multi link Adding fill_link_info test for uprobe_multi link. Setting up uprobes with bogus ref_ctr_offsets and cookie values to test all the bpf_link_info::uprobe_multi fields. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231125193130.834322-6-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	1703612885	selftests/bpf: Use bpf_link__destroy in fill_link_info tests The fill_link_info test keeps skeleton open and just creates various links. We are wrongly calling bpf_link__detach after each test to close them, we need to call bpf_link__destroy. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20231125193130.834322-5-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	e56fdbfb06	bpf: Add link_info support for uprobe multi link Adding support to get uprobe_link details through bpf_link_info interface. Adding new struct uprobe_multi to struct bpf_link_info to carry the uprobe_multi link details. The uprobe_multi.count is passed from user space to denote size of array fields (offsets/ref_ctr_offsets/cookies). The actual array size is stored back to uprobe_multi.count (allowing user to find out the actual array size) and array fields are populated up to the user passed size. All the non-array fields (path/count/flags/pid) are always set. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	4930b7f53a	bpf: Store ref_ctr_offsets values in bpf_uprobe array We will need to return ref_ctr_offsets values through link_info interface in following change, so we need to keep them around. Storing ref_ctr_offsets values directly into bpf_uprobe array. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-3-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	48f0dfd8d3	libbpf: Add st_type argument to elf_resolve_syms_offsets function We need to get offsets for static variables in following changes, so making elf_resolve_syms_offsets to take st_type value as argument and passing it to elf_sym_iter_new. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-2-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Stanislav Fomichev	cf97916310	selftests/bpf: update test_offload to use new orphaned property - filter orphaned programs by default - when trying to query orphaned program, don't expect bpftool failure Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127182057.1081138-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-11-27 16:23:44 -08:00
Stanislav Fomichev	876843ce1e	bpftool: mark orphaned programs during prog show Commit `ef01f4e25c` ("bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD") stopped removing program's id from idr when the offloaded/bound netdev goes away. I was supposed to take a look and check in [0], but apparently I did not. Martin points out it might be useful to keep it that way for observability sake, but we at least need to mark those programs as unusable. Mark those programs as 'orphaned' and keep printing the list when we encounter ENODEV. 0: unspec tag 0000000000000000 xlated 0B not jited memlock 4096B orphaned [0]: https://lore.kernel.org/all/CAKH8qBtyR20ZWAc11z1-6pGb3Hd47AQUTbE_cfoktG59TqaJ7Q@mail.gmail.com/ v3: * use two spaces for " orphaned" (Quentin) Cc: netdev@vger.kernel.org Fixes: `ef01f4e25c` ("bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD") Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20231127182057.1081138-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-11-27 16:23:38 -08:00
Yonghong Song	b16904fd9f	bpf: Fix a few selftest failures due to llvm18 change With latest upstream llvm18, the following test cases failed: $ ./test_progs -j #13/2 bpf_cookie/multi_kprobe_link_api:FAIL #13/3 bpf_cookie/multi_kprobe_attach_api:FAIL #13 bpf_cookie:FAIL #77 fentry_fexit:FAIL #78/1 fentry_test/fentry:FAIL #78 fentry_test:FAIL #82/1 fexit_test/fexit:FAIL #82 fexit_test:FAIL #112/1 kprobe_multi_test/skel_api:FAIL #112/2 kprobe_multi_test/link_api_addrs:FAIL [...] #112 kprobe_multi_test:FAIL #356/17 test_global_funcs/global_func17:FAIL #356 test_global_funcs:FAIL Further analysis shows llvm upstream patch [1] is responsible for the above failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c, without [1], the asm code is: 0000000000000400 <bpf_fentry_test7>: 400: f3 0f 1e fa endbr64 404: e8 00 00 00 00 callq 0x409 <bpf_fentry_test7+0x9> 409: 48 89 f8 movq %rdi, %rax 40c: c3 retq 40d: 0f 1f 00 nopl (%rax) ... and with [1], the asm code is: 0000000000005d20 <bpf_fentry_test7.specialized.1>: 5d20: e8 00 00 00 00 callq 0x5d25 <bpf_fentry_test7.specialized.1+0x5> 5d25: c3 retq ... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7> and this caused test failures for #13/#77 etc. except #356. For test case #356/17, with [1] (progs/test_global_func17.c)), the main prog looks like: 0000000000000000 <global_func17>: 0: b4 00 00 00 2a 00 00 00 w0 = 0x2a 1: 95 00 00 00 00 00 00 00 exit ... which passed verification while the test itself expects a verification failure. Let us add 'barrier_var' style asm code in both places to prevent function specialization which caused selftests failure. [1] https://github.com/llvm/llvm-project/pull/72903 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev	2023-11-27 14:53:39 +01:00
Andrii Nakryiko	e8a339b523	selftests/bpf: Add lazy global subprog validation tests Add a few test that validate BPF verifier's lazy approach to validating global subprogs. We check that global subprogs that are called transitively through another global subprog is validated. We also check that invalid global subprog is not validated, if it's not called from the main program. And we also check that main program is always validated first, before any of the subprogs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-4-andrii@kernel.org	2023-11-24 10:40:06 +01:00
Andrii Nakryiko	2afae08c9d	bpf: Validate global subprogs lazily Slightly change BPF verifier logic around eagerness and order of global subprog validation. Instead of going over every global subprog eagerly and validating it before main (entry) BPF program is verified, turn it around. Validate main program first, mark subprogs that were called from main program for later verification, but otherwise assume it is valid. Afterwards, go over marked global subprogs and validate those, potentially marking some more global functions as being called. Continue this process until all (transitively) callable global subprogs are validated. It's a BFS traversal at its heart and will always converge. This is an important change because it allows to feature-gate some subprograms that might not be verifiable on some older kernel, depending on supported set of features. E.g., at some point, global functions were allowed to accept a pointer to memory, which size is identified by user-provided type. Unfortunately, older kernels don't support this feature. With BPF CO-RE approach, the natural way would be to still compile BPF object file once and guard calls to this global subprog with some CO-RE check or using .rodata variables. That's what people do to guard usage of new helpers or kfuncs, and any other new BPF-side feature that might be missing on old kernels. That's currently impossible to do with global subprogs, unfortunately, because they are eagerly and unconditionally validated. This patch set aims to change this, so that in the future when global funcs gain new features, those can be guarded using BPF CO-RE techniques in the same fashion as any other new kernel feature. Two selftests had to be adjusted in sync with these changes. test_global_func12 relied on eager global subprog validation failing before main program failure is detected (unknown return value). Fix by making sure that main program is always valid. verifier_subprog_precision's parent_stack_slot_precise subtest relied on verifier checkpointing heuristic to do a checkpoint at instruction #5, but that's no longer true because we don't have enough jumps validated before reaching insn #5 due to global subprogs being validated later. Other than that, no changes, as one would expect. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-3-andrii@kernel.org	2023-11-24 10:40:06 +01:00
Andrii Nakryiko	491dd8edec	bpf: Emit global subprog name in verifier logs We have the name, instead of emitting just func#N to identify global subprog, augment verifier log messages with actual function name to make it more user-friendly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-2-andrii@kernel.org	2023-11-24 10:40:06 +01:00
Eduard Zingerman	b8d78cb2e2	libbpf: Start v1.4 development cycle Bump libbpf.map to v1.4.0 to start a new libbpf version cycle. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231123000439.12025-1-eddyz87@gmail.com	2023-11-23 22:49:41 +01:00
Jakub Kicinski	45c226dde7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/ice/ice_main.c `c9663f79cd` ("ice: adjust switchdev rebuild path") `7758017911` ("ice: restore timestamp configuration after device reset") https://lore.kernel.org/all/20231121211259.3348630-1-anthony.l.nguyen@intel.com/ Adjacent changes: kernel/bpf/verifier.c `bb124da69c` ("bpf: keep track of max number of bpf_loop callback iterations") `5f99f312bd` ("bpf: add register bounds sanity checks and sanitization") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 12:20:58 -08:00
Linus Torvalds	d3fa86b1a7	Merge tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E" - kselftest: rtnetlink: fix ip route command typo Current release - new code bugs: - s390/ism: make sure ism driver implies smc protocol in kconfig - two build fixes for tools/net Previous releases - regressions: - rxrpc: couple of ACK/PING/RTT handling fixes Previous releases - always broken: - bpf: verify bpf_loop() callbacks as if they are called unknown number of times - improve stability of auto-bonding with Hyper-V - account BPF-neigh-redirected traffic in interface statistics Misc: - net: fill in some more MODULE_DESCRIPTION()s" * tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) tools: ynl: fix duplicate op name in devlink tools: ynl: fix header path for nfsd net: ipa: fix one GSI register field width tls: fix NULL deref on tls_sw_splice_eof() with empty record net: axienet: Fix check for partial TX checksum vsock/test: fix SEQPACKET message bounds test i40e: Fix adding unsupported cloud filters ice: restore timestamp configuration after device reset ice: unify logic for programming PFINT_TSYN_MSK ice: remove ptp_tx ring parameter flag amd-xgbe: propagate the correct speed and duplex status amd-xgbe: handle the corner-case during tx completion amd-xgbe: handle corner-case during sfp hotplug net: veth: fix ethtool stats reporting octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF net: usb: qmi_wwan: claim interface 4 for ZTE MF290 Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E" net/smc: avoid data corruption caused by decline nfc: virtual_ncidev: Add variable to check if ndev is running dpll: Fix potential msg memleak when genlmsg_put_reply failed ...	2023-11-23 10:40:13 -08:00
Jakub Kicinski	39f04b1406	tools: ynl: fix duplicate op name in devlink We don't support CRUD-inspired message types in YNL too well. One aspect that currently trips us up is the fact that single message ID can be used in multiple commands (as the response). This leads to duplicate entries in the id-to-string tables: devlink-user.c:19:34: warning: initialized field overwritten [-Woverride-init] 19 \| [DEVLINK_CMD_PORT_NEW] = "port-new", \| ^~~~~~~~~~ devlink-user.c:19:34: note: (near initialization for ‘devlink_op_strmap[7]’) Fixes tag points at where the code was generated, the "real" problem is that the code generator does not support CRUD. Fixes: `f2f9dd164d` ("netlink: specs: devlink: add the remaining command to generate complete split_ops") Link: https://lore.kernel.org/r/20231123030558.1611831-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:52:23 -08:00
Jakub Kicinski	2be35a6194	tools: ynl: fix header path for nfsd The makefile dependency is trying to include the wrong header: <command-line>: fatal error: ../../../../include/uapi//linux/nfsd.h: No such file or directory The guard also looks wrong. Fixes: `f14122b2c2` ("tools: ynl: Add source files for nfsd netlink protocol") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20231123030624.1611925-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:52:12 -08:00
Alex Elder	37f0205538	net: ipa: fix one GSI register field width The width of the R_LENGTH field of the EV_CH_E_CNTXT_1 GSI register is 24 bits (not 20 bits) starting with IPA v5.0. Fix this. Fixes: `faf0678ec8` ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20231122231708.896632-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:52:00 -08:00
Jann Horn	53f2cb491b	tls: fix NULL deref on tls_sw_splice_eof() with empty record syzkaller discovered that if tls_sw_splice_eof() is executed as part of sendfile() when the plaintext/ciphertext sk_msg are empty, the send path gets confused because the empty ciphertext buffer does not have enough space for the encryption overhead. This causes tls_push_record() to go on the `split = true` path (which is only supposed to be used when interacting with an attached BPF program), and then get further confused and hit the tls_merge_open_record() path, which then assumes that there must be at least one populated buffer element, leading to a NULL deref. It is possible to have empty plaintext/ciphertext buffers if we previously bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path. tls_sw_push_pending_record() already handles this case correctly; let's do the same check in tls_sw_splice_eof(). Fixes: `df720d288d` ("tls/sw: Use splice_eof() to flush") Cc: stable@vger.kernel.org Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:51:45 -08:00
Samuel Holland	fd0413bbf8	net: axienet: Fix check for partial TX checksum Due to a typo, the code checked the RX checksum feature in the TX path. Fixes: `8a3b7a252d` ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://lore.kernel.org/r/20231122004219.3504219-1-samuel.holland@sifive.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:51:11 -08:00
Arseniy Krasnov	f0863888f6	vsock/test: fix SEQPACKET message bounds test Tune message length calculation to make this test work on machines where 'getpagesize()' returns >32KB. Now maximum message length is not hardcoded (on machines above it was smaller than 'getpagesize()' return value, thus we get negative value and test fails), but calculated at runtime and always bigger than 'getpagesize()' result. Reproduced on aarch64 with 64KB page size. Fixes: `5c338112e4` ("test/vsock: rework message bounds test") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reported-by: Bogdan Marcynkov <bmarcynk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231121211642.163474-1-avkrasnov@salutedevices.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:49:16 -08:00
Ivan Vecera	4e20655e50	i40e: Fix adding unsupported cloud filters If a VF tries to add unsupported cloud filter through virtchnl then i40e_add_del_cloud_filter(_big_buf) returns -ENOTSUPP but this error code is stored in 'ret' instead of 'aq_ret' that is used as error code sent back to VF. In this scenario where one of the mentioned functions fails the value of 'aq_ret' is zero so the VF will incorrectly receive a 'success'. Use 'aq_ret' to store return value and remove 'ret' local variable. Additionally fix the issue when filter allocation fails, in this case no notification is sent back to the VF. Fixes: `e284fc2804` ("i40e: Add and delete cloud filter") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231121211338.3348677-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-23 08:46:58 -08:00
Paolo Abeni	e50a8061fe	Merge branch 'ice-restore-timestamp-config-after-reset' Tony Nguyen says: ==================== ice: restore timestamp config after reset Jake Keller says: We recently discovered during internal validation that the ice driver has not been properly restoring Tx timestamp configuration after a device reset, which resulted in application failures after a device reset. After some digging, it turned out this problem is two-fold. Since the introduction of the PTP support the driver has been clobbering the storage of the current timestamp configuration during reset. Thus after a reset, the driver will no longer perform Tx or Rx timestamps, and will report timestamp configuration as disabled if SIOCGHWTSTAMP ioctl is issued. In addition, the recently merged auxiliary bus support code missed that PFINT_TSYN_MSK must be reprogrammed on the clock owner for E822 devices. Failure to restore this register configuration results in the driver no longer responding to interrupts from other ports. Depending on the traffic pattern, this can either result in increased latency responding to timestamps on the non-owner ports, or it can result in the driver never reporting any timestamps. The configuration of PFINT_TSYN_MSK was only done during initialization. Due to this, the Tx timestamp issue persists even if userspace reconfigures timestamping. This series fixes both issues, as well as removes a redundant Tx ring field since we can rely on the skb flag as the primary detector for a Tx timestamp request. Note that I don't think this series will directly apply to older stable releases (even v6.6) as we recently refactored a lot of the PTP code to support auxiliary bus. Patch 2/3 only matters for the post-auxiliary bus implementation. The principle of patch 1/3 and 3/3 could apply as far back as the initial PTP support, but I don't think it will apply cleanly as-is. ==================== Link: https://lore.kernel.org/r/20231121211259.3348630-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 15:27:35 +01:00
Jacob Keller	7758017911	ice: restore timestamp configuration after device reset The driver calls ice_ptp_cfg_timestamp() during ice_ptp_prepare_for_reset() to disable timestamping while the device is resetting. This operation destroys the user requested configuration. While the driver does call ice_ptp_cfg_timestamp in ice_rebuild() to restore some hardware settings after a reset, it unconditionally passes true or false, resulting in failure to restore previous user space configuration. This results in a device reset forcibly disabling timestamp configuration regardless of current user settings. This was not detected previously due to a quirk of the LinuxPTP ptp4l application. If ptp4l detects a missing timestamp, it enters a fault state and performs recovery logic which includes executing SIOCSHWTSTAMP again, restoring the now accidentally cleared configuration. Not every application does this, and for these applications, timestamps will mysteriously stop after a PF reset, without being restored until an application restart. Fix this by replacing ice_ptp_cfg_timestamp() with two new functions: 1) ice_ptp_disable_timestamp_mode() which unconditionally disables the timestamping logic in ice_ptp_prepare_for_reset() and ice_ptp_release() 2) ice_ptp_restore_timestamp_mode() which calls ice_ptp_restore_tx_interrupt() to restore Tx timestamping configuration, calls ice_set_rx_tstamp() to restore Rx timestamping configuration, and issues an immediate TSYN_TX interrupt to ensure that timestamps which may have occurred during the device reset get processed. Modify the ice_ptp_set_timestamp_mode to directly save the user configuration and then call ice_ptp_restore_timestamp_mode. This way, reset no longer destroys the saved user configuration. This obsoletes the ice_set_tx_tstamp() function which can now be safely removed. With this change, all devices should now restore Tx and Rx timestamping functionality correctly after a PF reset without application intervention. Fixes: `77a781155a` ("ice: enable receive hardware timestamping") Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 15:27:33 +01:00
Jacob Keller	7d606a1e2d	ice: unify logic for programming PFINT_TSYN_MSK Commit `d938a8cca8` ("ice: Auxbus devices & driver for E822 TS") modified how Tx timestamps are handled for E822 devices. On these devices, only the clock owner handles reading the Tx timestamp data from firmware. To do this, the PFINT_TSYN_MSK register is modified from the default value to one which enables reacting to a Tx timestamp on all PHY ports. The driver currently programs PFINT_TSYN_MSK in different places depending on whether the port is the clock owner or not. For the clock owner, the PFINT_TSYN_MSK value is programmed during ice_ptp_init_owner just before calling ice_ptp_tx_ena_intr to program the PHY ports. For the non-clock owner ports, the PFINT_TSYN_MSK is programmed during ice_ptp_init_port. If a large enough device reset occurs, the PFINT_TSYN_MSK register will be reset to the default value in which only the PHY associated directly with the PF will cause the Tx timestamp interrupt to trigger. The driver lacks logic to reprogram the PFINT_TSYN_MSK register after a device reset. For the E822 device, this results in the PF no longer responding to interrupts for other ports. This results in failure to deliver Tx timestamps to user space applications. Rename ice_ptp_configure_tx_tstamp to ice_ptp_cfg_tx_interrupt, and unify the logic for programming PFINT_TSYN_MSK and PFINT_OICR_ENA into one place. This function will program both registers according to the combination of user configuration and device requirements. This ensures that PFINT_TSYN_MSK is always restored when we configure the Tx timestamp interrupt. Fixes: `d938a8cca8` ("ice: Auxbus devices & driver for E822 TS") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 15:27:33 +01:00
Jacob Keller	0ffb08b1a4	ice: remove ptp_tx ring parameter flag Before performing a Tx timestamp in ice_stamp(), the driver checks a ptp_tx ring variable to see if timestamping is enabled on that ring. This value is set for all rings whenever userspace configures Tx timestamping. Ostensibly this was done to avoid wasting cycles checking other fields when timestamping has not been enabled. However, for Tx timestamps we already get an individual per-SKB flag indicating whether userspace wants to request a timestamp on that packet. We do not gain much by also having a separate flag to check for whether timestamping was enabled. In fact, the driver currently fails to restore the field after a PF reset. Because of this, if a PF reset occurs, timestamps will be disabled. Since this flag doesn't add value in the hotpath, remove it and always provide a timestamp if the SKB flag has been set. A following change will fix the reset path to properly restore user timestamping configuration completely. This went unnoticed for some time because one of the most common applications using Tx timestamps, ptp4l, will reconfigure the socket as part of its fault recovery logic. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 15:27:32 +01:00
Paolo Abeni	d9775fb6d0	Merge branch 'amd-xgbe-fixes-to-handle-corner-cases' Raju Rangoju says: ==================== amd-xgbe: fixes to handle corner-cases This series include bug fixes to amd-xgbe driver. ==================== Link: https://lore.kernel.org/r/20231121191435.4049995-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 13:47:25 +01:00
Raju Rangoju	7a2323ac24	amd-xgbe: propagate the correct speed and duplex status xgbe_get_link_ksettings() does not propagate correct speed and duplex information to ethtool during cable unplug. Due to which ethtool reports incorrect values for speed and duplex. Address this by propagating correct information. Fixes: `7c12aa0877` ("amd-xgbe: Move the PHY support into amd-xgbe") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 13:47:23 +01:00
Raju Rangoju	7121205d53	amd-xgbe: handle the corner-case during tx completion The existing implementation uses software logic to accumulate tx completions until the specified time (1ms) is met and then poll them. However, there exists a tiny gap which leads to a race between resetting and checking the tx_activate flag. Due to this the tx completions are not reported to upper layer and tx queue timeout kicks-in restarting the device. To address this, introduce a tx cleanup mechanism as part of the periodic maintenance process. Fixes: `c5aa9e3b81` ("amd-xgbe: Initial AMD 10GbE platform driver") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 13:47:23 +01:00
Raju Rangoju	676ec53844	amd-xgbe: handle corner-case during sfp hotplug Force the mode change for SFI in Fixed PHY configurations. Fixed PHY configurations needs PLL to be enabled while doing mode set. When the SFP module isn't connected during boot, driver assumes AN is ON and attempts auto-negotiation. However, if the connected SFP comes up in Fixed PHY configuration the link will not come up as PLL isn't enabled while the initial mode set command is issued. So, force the mode change for SFI in Fixed PHY configuration to fix link issues. Fixes: `e57f7a3fea` ("amd-xgbe: Prepare for working with more than one type of phy") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-23 13:47:23 +01:00

1 2 3 4 5 ...

1234254 Commits