Conversion of the driver to gpiod API done in 468ba54bd6 ("fec:
convert to gpio descriptor") incorrectly made reset line mandatory and
resulted in aborting driver probe in cases where reset line was not
specified (note: this way of specifying PHY reset line is actually
deprecated).
Switch to using devm_gpiod_get_optional() and skip manipulating reset
line if it can not be located.
Fixes: 468ba54bd6 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230201215320.528319-1-dmitry.torokhov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- phy: fix null-deref in phy_attach_direct
- mac802154: fix possible double free upon parsing error
Previous releases - regressions:
- bpf: preserve reg parent/live fields when copying range info,
prevent mis-verification of programs as safe
- ip6: fix GRE tunnels not generating IPv6 link local addresses
- phy: dp83822: fix null-deref on DP83825/DP83826 devices
- sctp: do not check hb_timer.expires when resetting hb_timer
- eth: mtk_sock: fix SGMII configuration after phylink conversion
Previous releases - always broken:
- eth: xdp: execute xdp_do_flush() before napi_complete_done()
- skb: do not mix page pool and page referenced frags in GRO
- bpf:
- fix a possible task gone issue with bpf_send_signal[_thread]()
- fix an off-by-one bug in bpf_mem_cache_idx() to select the right
cache
- add missing btf_put to register_btf_id_dtor_kfuncs
- sockmap: fon't let sock_map_{close,destroy,unhash} call itself
- gso: fix null-deref in skb_segment_list()
- mctp: purge receive queues on sk destruction
- fix UaF caused by accept on already connected socket in exotic
socket families
- tls: don't treat list head as an entry in tls_is_tx_ready()
- netfilter: br_netfilter: disable sabotage_in hook after first
suppression
- wwan: t7xx: fix runtime PM implementation
Misc:
- MAINTAINERS: spring cleanup of networking maintainers"
* tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
mtk_sgmii: enable PCS polling to allow SFP work
net: mediatek: sgmii: fix duplex configuration
net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration
MAINTAINERS: update SCTP maintainers
MAINTAINERS: ipv6: retire Hideaki Yoshifuji
mailmap: add John Crispin's entry
MAINTAINERS: bonding: move Veaceslav Falico to CREDITS
net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC
virtio-net: Keep stop() to follow mirror sequence of open()
selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
can: isotp: split tx timer into transmission and timeout
can: isotp: handle wait_event_interruptible() return values
can: raw: fix CAN FD frame transmissions over CAN XL devices
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
...
Pull KUnit fixes from Shuah Khan:
"Three fixes to bugs that cause kernel crash, link error during build,
and a third to fix kunit_test_init_section_suites() extra indirection
issue"
* tag 'linux-kselftest-kunit-fixes-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: fix kunit_test_init_section_suites(...)
kunit: fix bug in KUNIT_EXPECT_MEMEQ
kunit: Export kunit_running()
Pull ARM SoC fixes from Arnd Bergmann:
"The majority of bugfixes is once more for the NXP i.MX platform,
addressing issue with i.MX8M (UART, watchdog and ethernet) as well as
imx8dxl power button and the USB modem on an imx7 board.
The reason that i.MX always shows up here is obviously not that they
are more buggy than the others, but they have the most boards and are
good about getting fixes in quickly.
The other DT fixes are for the Nuvoton wpcm450 flash controller and
the i2c mux on an ASpeed board.
Lastly, there are updates to the MAINTAINERS entries for Mediatek,
AMD/Seattle and NXP SoCs, as well as a lone code fix for error
handling in the allwinner 'rsb' bus driver"
* tag 'soc-fixes-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: wpcm450: Add nuvoton,shm = <&shm> to FIU node
MAINTAINERS: Update entry for MediaTek SoC support
MAINTAINERS: amd: drop inactive Brijesh Singh
ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
arm64: dts: imx8mm-verdin: Do not power down eth-phy
MAINTAINERS: match freescale ARM64 DT directory in i.MX entry
arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
ARM: dts: aspeed: Fix pca9849 compatible
arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
arm64: dts: imx8mm: Reinstate GPIO watchdog always-running property on eDM SBC
bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
Pull s390 fixes from Heiko Carstens:
- With CONFIG_VMAP_STACK enabled it is not possible to load the s390
specific diag288_wdt watchdog module. The reason is that a pointer to
a string is passed to an inline assembly; this string however is
located on the stack, while the instruction within the inline
assembly expects a physicial address. Fix this by copying the string
to a kmalloc'ed buffer.
- The diag288_wdt watchdog module does not indicate that it accesses
memory from an inline assembly, which it does. Add "memory" to the
clobber list to prevent the compiler from optimizing code incorrectly
away.
- Pass size of the uncompressed kernel image to __decompress() call.
Otherwise the kernel image decompressor may corrupt/overwrite an
initrd. This was reported to happen on s390 after commit 2aa14b1ab2
("zstd: import usptream v1.5.2").
* tag 's390-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/decompressor: specify __decompress() buf len to avoid overflow
watchdog: diag288_wdt: fix __diag288() inline assembly
watchdog: diag288_wdt: do not use stack buffers for hardware data
Pull x86 platform driver fixes from Hans de Goede:
"A set of AMD PMF fixes + a few other small fixes"
* tag 'platform-drivers-x86-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
platform/x86: thinkpad_acpi: Fix thinklight LED brightness returning 255
platform/x86/amd: pmc: add CONFIG_SERIO dependency
platform/x86/amd/pmf: Ensure mutexes are initialized before use
platform/x86/amd/pmf: Fix to update SPS thermals when power supply change
platform/x86/amd/pmf: Fix to update SPS default pprof thermals
platform/x86/amd/pmf: update to auto-mode limits only after AMT event
platform/x86/amd/pmf: Add helper routine to check pprof is balanced
platform/x86/amd/pmf: Add helper routine to update SPS thermals
Bjørn Mork says:
====================
Fix mtk_eth_soc sgmii configuration.
This has been tested on a MT7986 with a Maxlinear GPY211C phy
permanently attached to the second SoC mac.
====================
Link: https://lore.kernel.org/r/20230201182331.943411-1-bjorn@mork.no
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently there is no IRQ handling (even the SGMII supports it).
Enable polling to support SFP ports.
Fixes: 14a44ab033 ("net: mtk_eth_soc: partially convert to phylink_pcs")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: changed "1" => "true" ]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The logic of the duplex bit is inverted. Setting it means half
duplex, not full duplex.
Fix and rename macro to avoid confusion.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code expect the PHY to be in power down which is only true after reset.
Allow changes of the SGMII parameters more than once.
Only power down when reconfiguring to avoid bouncing the link when there's
no reason to - based on code from Russell King.
There are cases when the SGMII_PHYA_PWD register contains 0x9 which
prevents SGMII from working. The SGMII still shows link but no traffic
can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
taken from a good working state of the SGMII interface.
Fixes: 42c03844e9 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: rebased and squashed into one patch ]
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
can 2023-02-02
The first patch is by Ziyang Xuan and removes a errant WARN_ON_ONCE()
in the CAN J1939 protocol.
The next 3 patches are by Oliver Hartkopp. The first 2 target the CAN
ISO-TP protocol and fix the state machine with respect to signals and
a regression found by the syzbot.
The last patch is by me an missing assignment during the ethtool ring
configuration callback.
* tag 'linux-can-fixes-for-6.2-20230202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
can: isotp: split tx timer into transmission and timeout
can: isotp: handle wait_event_interruptible() return values
can: raw: fix CAN FD frame transmissions over CAN XL devices
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
====================
Link: https://lore.kernel.org/r/20230202094135.2293939-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
MAINTAINERS: spring refresh of networking maintainers
Use Jon Corbet's script for generating statistics about maintainer
coverage to identify inactive maintainers of relatively active code.
Move them to CREDITS.
====================
Link: https://lore.kernel.org/r/20230201182014.2362044-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We very rarely hear from Hideaki Yoshifuji and the IPv4/IPv6
entry covers a lot of code. Asking people to CC someone who
rarely responds feels wrong.
Note that Hideaki Yoshifuji already has an entry in CREDITS
for IPv6 so not adding another one.
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is
still enabled and packet processing may be ongoing.
Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raju Rangoju says:
====================
amd-xgbe: add support for 2.5GbE and rx-adaptation
This patch series adds support for 2.GbE in 10GBaseT mode and
rx-adaptation support for Yellow Carp devices.
1) Support for 2.5GbE:
Add the necessary changes to the driver to fully recognize and enable
2.5GbE speed in 10GBaseT mode.
2) Support for rx-adaptation:
In order to support the 10G backplane mode without Auto-negotiation
and to support the longer-length DAC cables, it requires PHY to
perform RX Adaptation sequence as mentioned in the Synopsys databook.
Add the necessary changes to Yellow Carp devices to ensure seamless
RX Adaptation for 10G-SFI (LONG DAC), and 10G-KR modes without
Auto-Negotiation (CL72 not present)
====================
Link: https://lore.kernel.org/r/20230201054932.212700-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The existing implementation for non-Autonegotiation 10G speed modes does
not enable RX adaptation in the Driver and FW. The RX Equalization
settings (AFE settings alone) are manually configured and the existing
link-up sequence in the driver does not perform rx adaptation process as
mentioned in the Synopsys databook. There's a customer request for 10G
backplane mode without Auto-negotiation and for the DAC cables of more
significant length that follow the non-Autonegotiation mode. These modes
require PHY to perform RX Adaptation.
The proposed logic adds the necessary changes to Yellow Carp devices to
ensure seamless RX Adaptation for 10G-SFI (LONG DAC) and 10G-KR without
AN (CL72 not present). The RX adaptation core algorithm is executed by
firmware, however, to achieve that a new mailbox sub-command is required
to be sent by the driver.
Co-developed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test tool can check that the zerocopy number of completions value is
valid taking into consideration the number of datagram send calls. This can
catch the system into a state where the datagrams are still in the system
(for example in a qdisk, waiting for the network interface to return a
completion notification, etc).
This change adds a retry logic of computing the number of completions up to
a configurable (via CLI) timeout (default: 2 seconds).
Fixes: 79ebc3c260 ("net/udpgso_bench_tx: options to exercise TX CMSG")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"udpgro_bench.sh" invokes udpgso_bench_rx/udpgso_bench_tx programs
subsequently and while doing so, there is a chance that the rx one is not
ready to accept socket connections. This racing bug could fail the test
with at least one of the following:
./udpgso_bench_tx: connect: Connection refused
./udpgso_bench_tx: sendmsg: Connection refused
./udpgso_bench_tx: write: Connection refused
This change addresses this by making udpgro_bench.sh wait for the rx
program to be ready before firing off the tx one - up to a 10s timeout.
Fixes: 3a687bef14 ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This change fixes the following compiler warning:
/usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may
be used uninitialized [-Wmaybe-uninitialized]
40 | __error_noreturn (__status, __errnum, __format,
__va_arg_pack ());
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
udpgso_bench_rx.c: In function ‘main’:
udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here
253 | int ret, len, gso_size, budget = 256;
Fixes: 3327a9c463 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela says:
====================
net/sched: transition act_pedit to rcu and percpu stats
The software pedit action didn't get the same love as some of the
other actions and it's still using spinlocks and shared stats.
Therefore, transition the action to rcu and percpu stats which
improves the action's performance.
We test this change with a very simple packet forwarding setup:
tc filter add dev ens2f0 ingress protocol ip matchall \
action pedit ex munge eth src set b8:ce:f6:4b:68:35 pipe \
action pedit ex munge eth dst set ac:1f:6b:e4:ff:93 pipe \
action mirred egress redirect dev ens2f1
tc filter add dev ens2f1 ingress protocol ip matchall \
action pedit ex munge eth src set b8:ce:f6:4b:68:34 pipe \
action pedit ex munge eth dst set ac:1f:6b:e4:ff:92 pipe \
action mirred egress redirect dev ens2f0
Using TRex with a http-like profile, in our setup with a 25G NIC
and a 26 cores Intel CPU, we observe the following in perf:
before:
11.59% 2.30% [kernel] [k] tcf_pedit_act
2.55% tcf_pedit_act
8.38% _raw_spin_lock
6.43% native_queued_spin_lock_slowpath
after:
1.46% 1.46% [kernel] [k] tcf_pedit_act
tdc results for pedit after the patch:
1..69
ok 1 319a - Add pedit action that mangles IP TTL
ok 2 7e67 - Replace pedit action with invalid goto chain
ok 3 377e - Add pedit action with RAW_OP offset u32
ok 4 a0ca - Add pedit action with RAW_OP offset u32 (INVALID)
ok 5 dd8a - Add pedit action with RAW_OP offset u16 u16
ok 6 53db - Add pedit action with RAW_OP offset u16 (INVALID)
ok 7 5c7e - Add pedit action with RAW_OP offset u8 add value
ok 8 2893 - Add pedit action with RAW_OP offset u8 quad
ok 9 3a07 - Add pedit action with RAW_OP offset u8-u16-u8
ok 10 ab0f - Add pedit action with RAW_OP offset u16-u8-u8
ok 11 9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert
ok 12 ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID)
ok 13 f512 - Add pedit action with RAW_OP offset u16 at offmask shift set
ok 14 c2cb - Add pedit action with RAW_OP offset u32 retain value
ok 15 1762 - Add pedit action with RAW_OP offset u8 clear value
ok 16 bcee - Add pedit action with RAW_OP offset u8 retain value
ok 17 e89f - Add pedit action with RAW_OP offset u16 retain value
ok 18 c282 - Add pedit action with RAW_OP offset u32 clear value
ok 19 c422 - Add pedit action with RAW_OP offset u16 invert value
ok 20 d3d3 - Add pedit action with RAW_OP offset u32 invert value
ok 21 57e5 - Add pedit action with RAW_OP offset u8 preserve value
ok 22 99e0 - Add pedit action with RAW_OP offset u16 preserve value
ok 23 1892 - Add pedit action with RAW_OP offset u32 preserve value
ok 24 4b60 - Add pedit action with RAW_OP negative offset u16/u32 set value
ok 25 a5a7 - Add pedit action with LAYERED_OP eth set src
ok 26 86d4 - Add pedit action with LAYERED_OP eth set src & dst
ok 27 f8a9 - Add pedit action with LAYERED_OP eth set dst
ok 28 c715 - Add pedit action with LAYERED_OP eth set src (INVALID)
ok 29 8131 - Add pedit action with LAYERED_OP eth set dst (INVALID)
ok 30 ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence
ok 31 dec4 - Add pedit action with LAYERED_OP eth set type (INVALID)
ok 32 ab06 - Add pedit action with LAYERED_OP eth add type
ok 33 918d - Add pedit action with LAYERED_OP eth invert src
ok 34 a8d4 - Add pedit action with LAYERED_OP eth invert dst
ok 35 ee13 - Add pedit action with LAYERED_OP eth invert type
ok 36 7588 - Add pedit action with LAYERED_OP ip set src
ok 37 0fa7 - Add pedit action with LAYERED_OP ip set dst
ok 38 5810 - Add pedit action with LAYERED_OP ip set src & dst
ok 39 1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield
ok 40 02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol
ok 41 3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID)
ok 42 31ae - Add pedit action with LAYERED_OP ip ttl clear/set
ok 43 486f - Add pedit action with LAYERED_OP ip set duplicate fields
ok 44 e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields
ok 45 cc8a - Add pedit action with LAYERED_OP ip set tos
ok 46 7a17 - Add pedit action with LAYERED_OP ip set precedence
ok 47 c3b6 - Add pedit action with LAYERED_OP ip add tos
ok 48 43d3 - Add pedit action with LAYERED_OP ip add precedence
ok 49 438e - Add pedit action with LAYERED_OP ip clear tos
ok 50 6b1b - Add pedit action with LAYERED_OP ip clear precedence
ok 51 824a - Add pedit action with LAYERED_OP ip invert tos
ok 52 106f - Add pedit action with LAYERED_OP ip invert precedence
ok 53 6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport
ok 54 afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code
ok 55 3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID)
ok 56 815c - Add pedit action with LAYERED_OP ip6 set src
ok 57 4dae - Add pedit action with LAYERED_OP ip6 set dst
ok 58 fc1f - Add pedit action with LAYERED_OP ip6 set src & dst
ok 59 6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID)
ok 60 94bb - Add pedit action with LAYERED_OP ip6 traffic_class
ok 61 6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl
ok 62 6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit
ok 63 1442 - Add pedit action with LAYERED_OP tcp set dport & sport
ok 64 b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID)
ok 65 cfcc - Add pedit action with LAYERED_OP tcp flags set
ok 66 3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags fields
ok 67 f1c8 - Add pedit action with LAYERED_OP udp set dport & sport
ok 68 d784 - Add pedit action with mixed RAW/LAYERED_OP #1
ok 69 70ca - Add pedit action with mixed RAW/LAYERED_OP #2
====================
Link: https://lore.kernel.org/r/20230131190512.3805897-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The software pedit action didn't get the same love as some of the
other actions and it's still using spinlocks and shared stats in the
datapath.
Transition the action to rcu and percpu stats as this improves the
action's performance dramatically on multiple cpu deployments.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
David Howells says:
====================
Here's the fifth part of patches in the process of moving rxrpc from doing
a lot of its stuff in softirq context to doing it in an I/O thread in
process context and thereby making it easier to support a larger SACK
table.
The full description is in the description for the first part[1] which is
now upstream. The second and third parts are also upstream[2]. A subset
of the original fourth part[3] got applied as a fix for a race[4].
The fifth part includes some cleanups:
(1) Miscellaneous trace header cleanups: fix a trace string, display the
security index in rx_packet rather than displaying the type twice,
remove some whitespace to make checkpatch happier and remove some
excess tabulation.
(2) Convert ->recvmsg_lock to a spinlock as it's only ever locked
exclusively.
(3) Make ->ackr_window and ->ackr_nr_unacked non-atomic as they're only
used in the I/O thread.
(4) Don't use call->tx_lock to access ->tx_buffer as that is only accessed
inside the I/O thread. sendmsg() loads onto ->tx_sendmsg and the I/O
thread decants from that to the buffer.
(5) Remove local->defrag_sem as DATA packets are transmitted serially by
the I/O thread.
(6) Remove the service connection bundle is it was only used for its
channel_lock - which has now gone.
And some more significant changes:
(7) Add a debugging option to allow a delay to be injected into packet
reception to help investigate the behaviour over longer links than
just a few cm.
(8) Generate occasional PING ACKs to probe for RTT information during a
receive heavy call.
(9) Simplify the SACK table maintenance and ACK generation. Now that both
parts are done in the same thread, there's no possibility of a race
and no need to try and be cunning to avoid taking a BH spinlock whilst
copying the SACK table (which in the future will be up to 2K) and no
need to rotate the copy to fit the ACK packet table.
(10) Use SKB_CONSUMED when freeing received DATA packets (stop dropwatch
complaining).
* tag 'rxrpc-next-20230131' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
rxrpc: Kill service bundle
rxrpc: Change rx_packet tracepoint to display securityIndex not type twice
rxrpc: Show consumed and freed packets as non-dropped in dropwatch
rxrpc: Remove local->defrag_sem
rxrpc: Don't lock call->tx_lock to access call->tx_buffer
rxrpc: Simplify ACK handling
rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked
rxrpc: Generate extra pings for RTT during heavy-receive call
rxrpc: Allow a delay to be injected into packet reception
rxrpc: Convert call->recvmsg_lock to a spinlock
rxrpc: Shrink the tabulation in the rxrpc trace header a bit
rxrpc: Remove whitespace before ')' in trace header
rxrpc: Fix trace string
====================
Link: https://lore.kernel.org/all/20230131171227.3912130-1-dhowells@redhat.com/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The timer for the transmission of isotp PDUs formerly had two functions:
1. send two consecutive frames with a given time gap
2. monitor the timeouts for flow control frames and the echo frames
This led to larger txstate checks and potentially to a problem discovered
by syzbot which enabled the panic_on_warn feature while testing.
The former 'txtimer' function is split into 'txfrtimer' and 'txtimer'
to handle the two above functionalities with separate timer callbacks.
The two simplified timers now run in one-shot mode and make the state
transitions (especially with isotp_rcv_echo) better understandable.
Fixes: 866337865f ("can: isotp: fix tx state handling for echo tx processing")
Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # >= v6.0
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The conclusion "j1939_session_deactivate() should be called with a
session ref-count of at least 2" is incorrect. In some concurrent
scenarios, j1939_session_deactivate can be called with the session
ref-count less than 2. But there is not any problem because it
will check the session active state before session putting in
j1939_session_deactivate_locked().
Here is the concurrent scenario of the problem reported by syzbot
and my reproduction log.
cpu0 cpu1
j1939_xtp_rx_eoma
j1939_xtp_rx_abort_one
j1939_session_get_by_addr [kref == 2]
j1939_session_get_by_addr [kref == 3]
j1939_session_deactivate [kref == 2]
j1939_session_put [kref == 1]
j1939_session_completed
j1939_session_deactivate
WARN_ON_ONCE(kref < 2)
=====================================================
WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:j1939_session_deactivate+0x5f/0x70
Call Trace:
j1939_session_deactivate_activate_next+0x11/0x28
j1939_xtp_rx_eoma+0x12a/0x180
j1939_tp_recv+0x4a2/0x510
j1939_can_recv+0x226/0x380
can_rcv_filter+0xf8/0x220
can_receive+0x102/0x220
? process_backlog+0xf0/0x2c0
can_rcv+0x53/0xf0
__netif_receive_skb_one_core+0x67/0x90
? process_backlog+0x97/0x2c0
__netif_receive_skb+0x22/0x80
Fixes: 0c71437dd5 ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
netvsc_dma_map() and netvsc_dma_unmap() currently check the cp_partial
flag and adjust the page_count so that pagebuf entries for the RNDIS
portion of the message are skipped when it has already been copied into
a send buffer. But this adjustment has already been made by code in
netvsc_send(). The duplicate adjustment causes some pagebuf entries to
not be mapped. In a normal VM, this doesn't break anything because the
mapping doesn’t change the PFN. But in a Confidential VM,
dma_map_single() does bounce buffering and provides a different PFN.
Failing to do the mapping causes the wrong PFN to be passed to Hyper-V,
and various errors ensue.
Fix this by removing the duplicate adjustment in netvsc_dma_map() and
netvsc_dma_unmap().
Fixes: 846da38de0 ("net: netvsc: Add Isolation VM support for netvsc driver")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1675135986-254490-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When set to zero, the neighbor sysctl proxy_delay value
does not cause an immediate reply for ARP/ND requests
as expected, it instead causes a random delay between
[0, U32_MAX). Looking at this comment from
__get_random_u32_below() explains the reason:
/*
* This function is technically undefined for ceil == 0, and in fact
* for the non-underscored constant version in the header, we build bug
* on that. But for the non-constant case, it's convenient to have that
* evaluate to being a straight call to get_random_u32(), so that
* get_random_u32_inclusive() can work over its whole range without
* undefined behavior.
*/
Added helper function that does not call get_random_u32_below()
if proxy_delay is zero and just uses the current value of
jiffies instead, causing pneigh_enqueue() to respond
immediately.
Also added definition of proxy_delay to ip-sysctl.txt since
it was missing.
Signed-off-by: Brian Haley <haleyb.dev@gmail.com>
Link: https://lore.kernel.org/r/20230130171428.367111-1-haleyb.dev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long says:
====================
net: support ipv4 big tcp
This is similar to the BIG TCP patchset added by Eric for IPv6:
https://lwn.net/Articles/895398/
Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header
doesn't have exthdrs(options) for the BIG TCP packets' length. To make
it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to
indicate this might be a BIG TCP packet and use skb->len as the real
IPv4 total length.
This will work safely, as all BIG TCP packets are GSO/GRO packets and
processed on the same host as they were created; There is no padding
in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4
packet total length; Also, before implementing the feature, all those
places that may get iph tot_len from BIG TCP packets are taken care
with some new APIs:
Patch 1 adds some APIs for iph tot_len setting and getting, which are
used in all these places where IPv4 BIG TCP packets may reach in Patch
2-7, Patch 8 adds a GSO_TCP tp_status for af_packet users, and Patch 9
add new netlink attributes to make IPv4 BIG TCP independent from IPv6
BIG TCP on configuration, and Patch 10 implements this feature.
Note that the similar change as in Patch 2-6 are also needed for IPv6
BIG TCP packets, and will be addressed in another patchset.
The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC
and 1.5K MTU:
No BIG TCP:
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
168 322 337 3776.49
143 236 277 4654.67
128 258 288 4772.83
171 229 278 4645.77
175 228 243 4678.93
149 239 279 4599.86
164 234 268 4606.94
155 276 289 4235.82
180 255 268 4418.95
168 241 249 4417.82
Enable BIG TCP:
ip link set dev ens1f0np0 gro_ipv4_max_size 128000 gso_ipv4_max_size 128000
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
161 241 252 4821.73
174 205 217 5098.28
167 208 220 5001.43
164 228 249 4883.98
150 233 249 4914.90
180 233 244 4819.66
154 208 219 5004.92
157 209 247 4999.78
160 218 246 4842.31
174 206 217 5080.99
Thanks for the feedback from Eric and David Ahern.
====================
Link: https://lore.kernel.org/r/cover.1674921359.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to Eric's IPv6 BIG TCP, this patch is to enable IPv4 BIG TCP.
Firstly, allow sk->sk_gso_max_size to be set to a value greater than
GSO_LEGACY_MAX_SIZE by not trimming gso_max_size in sk_trim_gso_size()
for IPv4 TCP sockets.
Then on TX path, set IP header tot_len to 0 when skb->len > IP_MAX_MTU
in __ip_local_out() to allow to send BIG TCP packets, and this implies
that skb->len is the length of a IPv4 packet; On RX path, use skb->len
as the length of the IPv4 packet when the IP header tot_len is 0 and
skb->len > IP_MAX_MTU in ip_rcv_core(). As the API iph_set_totlen() and
skb_ip_totlen() are used in __ip_local_out() and ip_rcv_core(), we only
need to update these APIs.
Also in GRO receive, add the check for ETH_P_IP/IPPROTO_TCP, and allows
the merged packet size >= GRO_LEGACY_MAX_SIZE in skb_gro_receive(). In
GRO complete, set IP header tot_len to 0 when the merged packet size
greater than IP_MAX_MTU in iph_set_totlen() so that it can be processed
on RX path.
Note that by checking skb_is_gso_tcp() in API iph_totlen(), it makes
this implementation safe to use iph->len == 0 indicates IPv4 BIG TCP
packets.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch introduces gso_ipv4_max_size and gro_ipv4_max_size
per device and adds netlink attributes for them, so that IPV4
BIG TCP can be guarded by a separate tunable in the next patch.
To not break the old application using "gso/gro_max_size" for
IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size"
in netif_set_gso/gro_max_size() if the new size isn't greater
than GSO_LEGACY_MAX_SIZE, so that nothing will change even if
userspace doesn't realize the new netlink attributes.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user
that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in
tcpdump/libpcap, it can use tp_len as the IPv4 packet len when this
flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipvlan devices calls netif_inherit_tso_max() to get the tso_max_size/segs
from the lower device, so when lower device supports BIG TCP, the ipvlan
devices support it too. We also should consider its iph tot_len accessing.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>