Commit Graph

1428018 Commits

Author SHA1 Message Date
Jakub Kicinski
8bbcfce5db tools: ynl: add short doc to class YnlFamily
The class is quite long. It's getting hard to find the user-facing
methods. Add a short doc at the class level explaining the main API.

Link: https://patch.msgid.link/20260310005337.3594225-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Jakub Kicinski
c26fda6212 tools: ynl: move policy decoding out of NlMsg
We'll soon need to decode policies from dump so move _decode_policy()
out of class NlMsg.

Link: https://patch.msgid.link/20260310005337.3594225-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Jakub Kicinski
7b1309c339 tools: ynl: handle pad type during decode
Apparently Python code only handled the 'pad' type in structs
until now. Add it to attr decoding. nlctrl policy dumps need it.

Link: https://patch.msgid.link/20260310005337.3594225-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Rosen Penev
73a8643525 net: mvneta: support EPROBE_DEFER when reading MAC address
If nvmem loads after the ethernet driver, mac address assignments will
not take effect. of_get_ethdev_address returns EPROBE_DEFER in such a
case so we need to handle that to avoid eth_hw_addr_random.

Add extra goto section to just free stats as they are allocated right
above.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260307031709.640141-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:02:25 -07:00
Dimitri Daskalakis
690043b95c selftests: drv-net: rss: Add retries to test_rss_key_indir to reduce flakes
The test generates 16 flows, and verifies that traffic is distributed
across two queues via the NICs RSS indirection table. The likelihood of the
flows skewing to a single queue is high, so we retry sending traffic up to
3 times.

Alternatively, we could increase the number of generated flows. But
debug kernels may struggle to ramp this many flows.

During manual testing, the test passed for 10,000 consecutive runs.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260309204215.2110486-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:01:51 -07:00
Fernando Fernandez Mancera
7da62262ec inet: add ip_local_port_step_width sysctl to improve port usage distribution
With the current port selection algorithm, ports after a reserved port
range or long time used port are used more often than others [1]. This
causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.

The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.

The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call ensuring it
is a coprime with respect to the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".

In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]

[1] https://0xffsoftware.com/port_graph_current_alg.html

[2] https://0xffsoftware.com/port_graph_random_step_alg.html

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:59:39 -07:00
Jakub Kicinski
ae95cbaedb Merge branch 'selftests-rds-ksft-cleanups'
Allison Henderson says:

====================
selftests: rds: ksft cleanups

This set addresses a few rds selftests clean ups and bugs encountered
when running in the ksft framework.  The first patch is a clean up
patch that addresses pylint warnings, but otherwise no functional
changes.  The next patch moves the test time out to a ksft settings
file so that the time out is set appropriately.  And lastly we fix a
tcpdump segfault caused by deprecated a os.fork() call.
====================

Link: https://patch.msgid.link/20260308055835.1338257-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:54:25 -07:00
Allison Henderson
87fdf57ded selftests: rds: Fix tcpdump segfault in rds selftests
net/rds/test.py sees a segfault in tcpdump when executed through the
ksft runner.

[   21.903713] tcpdump[1469]: segfault at 0 ip 000072100e99126d
sp 00007ffccf740fd0 error 4
[   21.903721]  in libc.so.6[16a26d,7798b149a000+188000]
[   21.905074]  in libc.so.6[16a26d,72100e84f000+188000] likely on
CPU 5 (core 5, socket 0)
[   21.905084] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c
41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac
64 89 08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89
ff e8
[   21.906760]  likely on CPU 9 (core 9, socket 0)
[   21.913469] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c 41
5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac 64 89
08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89 ff e8

The os.fork() call creates extra complexity because it forks the entire
process including the python interpreter.  ip() then calls cmd() which
creates a subprocess.Popen.  We can avoid the extra layering by simply
calling subprocess.Popen directly. Track the process handles directly
and terminate them at cleanup rather than relying on killall. Further
tcpdump's -Z flag attempts to change savefile ownership, which is not
supported by the 9p protocol.  Fix this by writing pcap captures to
"/tmp" during the test and move them to the log directory after tcpdump
exits.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260308055835.1338257-4-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:54:24 -07:00
Allison Henderson
b873b4e160 selftests: rds: Add ksft timeout
rds/run.sh sets a timer of 400s when calling test.py.  However when
tests are run through ksft, a default 45s timer is applied.  Fix this
by adding a ksft timeout in tools/testing/selftests/net/rds/settings

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260308055835.1338257-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:54:15 -07:00
Allison Henderson
5a0c5702bd selftests: rds: Fix pylint warnings
Tidy up all exiting pylint errors in test.py.  No functional
changes are introduced in this patch

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260308055835.1338257-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:53:42 -07:00
Jakub Kicinski
b8a0e5eb6a tools: ynl: cli: order set->list conversion in JSON output
NIPA tries to make sure that HW tests don't modify system state.
It dumps some well known configs before and after the test and
compares the outputs.

Make sure that YNL json output is stable. Converting sets to lists
with a naive list(o) results in a random order.

Link: https://patch.msgid.link/20260307175916.1652518-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:54:02 -07:00
Jakub Kicinski
16767c72a4 Merge branch 'smc-sysctl-formatting-and-missing-entries'
Kyoji Ogasawara says:

====================
smc-sysctl formatting and missing entries

update SMC sysctl documentation in two small steps.

- patch 1 fixes indentation in the smcr_buf_type section
- patch 2 documents missing sysctl parameters limit_smc_hs and hs_ctrl,
  including values/defaults and hs_ctrl usage notes
====================

Link: https://patch.msgid.link/20260309124541.22723-1-sawara04.o@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:53:10 -07:00
Kyoji Ogasawara
aa5ec9d03b net/smc: Add documentation for limit_smc_hs and hs_ctrl
Document missing SMC sysctl parameters limit_smc_hs and hs_ctrl

Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: D. Wythe<alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20260309124541.22723-3-sawara04.o@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:53:03 -07:00
Kyoji Ogasawara
4a51ac9056 net/smc: fix indentation in smcr_buf_type section
smcr_buf_type section used inconsistent indentation compared
with the rest of this document.

Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: D. Wythe<alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20260309124541.22723-2-sawara04.o@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 17:52:36 -07:00
Paolo Abeni
05e059510e Merge branch 'eth-fbnic-add-fbnic-self-tests'
Mike Marciniszyn says:

====================
eth fbnic: Add fbnic self tests

From: "Mike Marciniszyn (Meta)" <mike.marciniszyn@gmail.com>

This series adds self tests to test the registers, the
msix interrupts, the tlv, and the firmware mailbox.

This series assumes that the
[PATCH net-next 0/2] Add debugfs hooks [1]
is present.

When the self tests are run the with ethtool -t:

        ethtool -t eth0
        The test result is PASS
        The test extra info:
        Register test (offline)  0
        MSI-X Interrupt test (offline)   0
        FW mailbox test (on/offline)     0
====================

Link: https://patch.msgid.link/20260307105847.1438-1-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:55 +01:00
Mike Marciniszyn (Meta)
8e5218199d eth fbnic: Add mailbox self test
The mailbox self test ensures the interface to and from
the firmware is healthy by sending a test message and
fielding the response from the firmware.

This patch uses the new completion API [1][2] that allocates a
completion structure, binds the completion to the TEST
message, and uses a new FW parsing routine that wraps the
completion processing around the TLV parser.

Link: https://patch.msgid.link/20250516164804.741348-1-lee@trager.us [1]
Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com [2]

Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-6-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)
d522b1b004 eth fbnic: TLV support for use by MBX self test
The TLV (Type-Value-Length) self uses a known set of data to create a
TLV message.  These routines support the MBX self test by creating
the test messages and parsing the response message coming back
from the firmware.

Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-5-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)
99fc8d3d00 eth fbnic: Add msix self test
This function is meant to test the global interrupt registers and the
PCIe IP MSI-X functionality. It essentially goes through and tests
various combinations of the set, clear, and mask bits in order to
verify the behavior is as we expect it to be from the driver.

Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-4-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)
b43498b7e9 eth fbnic: Add register self test
The register test will be used to verify hardware is behaving as expected.

The test itself will have us writing to registers that should have no
side effects due to us resetting after the test has been completed.

While the test is being run the interface should be offline.

This patch counts on the first patch of this series to export netif_open()
and also ensures that the half close calls netif_close() to
avoid deadlock.

Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-3-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)
3fdd33697c net: export netif_open for self_test usage
dev_open() already is exported, but drivers which use the netdev
instance lock need to use netif_open() instead. netif_close() is
also already exported [1] so this completes the pairing.

This export is required for the following fbnic self tests to
avoid calling ndo_stop() and ndo_open() in favor of the
more appropriate netif_open() and netif_close() that notifies
any listeners that the interface went down to test and is now
coming back up.

Link: https://patch.msgid.link/20250309215851.2003708-1-sdf@fomichev.me [1]
Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-2-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:53:52 +01:00
Erni Sri Satya Vennela
89fe91c659 net: mana: hardening: Validate doorbell ID from GDMA_REGISTER_DEVICE response
As a part of MANA hardening for CVM, add validation for the doorbell
ID (db_id) received from hardware in the GDMA_REGISTER_DEVICE response
to prevent out-of-bounds memory access when calculating the doorbell
page address.

In mana_gd_ring_doorbell(), the doorbell page address is calculated as:
  addr = db_page_base + db_page_size * db_index
       = (bar0_va + db_page_off) + db_page_size * db_index

A hardware could return values that cause this address to fall outside
the BAR0 MMIO region. In Confidential VM environments, hardware responses
cannot be fully trusted.

Add the following validations:
- Store the BAR0 size (bar0_size) in gdma_context during probe.
- Validate the doorbell page offset (db_page_off) read from device
  registers does not exceed bar0_size during initialization, converting
  mana_gd_init_registers() to return an error code.
- Validate db_id from GDMA_REGISTER_DEVICE response against the
  maximum number of doorbell pages that fit within BAR0.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260306211212.543376-1-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:39:51 +01:00
Lorenzo Bianconi
46097d011f net: airoha: Move GDM forward port configuration in ndo_open/ndo_stop callbacks
This change allows to set GDM forward port configuration to
FE_PSE_PORT_DROP stopping the network device. Hw design requires to stop
packet forwarding putting the interface down.  Moreover, PPE firmware
requires to use PPE1 for GDM3 or GDM4.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260306-airoha-gdm-forward-ndo-open-stop-v1-1-7b7a20dd9ef0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10 13:23:07 +01:00
Jakub Kicinski
52ede1bce5 Merge branch 'net-stmmac-further-ptp-cleanups'
Russell King says:

====================
net: stmmac: further ptp cleanups

The first uses a local variable when setting n_ext_ts which is a minor
simplification of the code. The second removes the now unnecessary
"available" flag for the PPS outputs.
====================

Link: https://patch.msgid.link/aawDiK7DjcSXSs1X@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:45:31 -07:00
Russell King (Oracle)
687e7863f0 net: stmmac: ptp: remove redundant priv->pps[].available
priv->pps[].available is set in stmmac_ptp_register() for all PPS
outputs reported by hardware up to STMMAC_PPS_MAX.

Since we now set priv->ptp_clock_ops.n_per_out to the number of PPS
outputs that both the hardware and driver can support to prevent
array overflow in stmmac_enable(), this makes priv->pps[].available
redundant. Remove this struct member.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vypHc-0000000CSbl-1X6v@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:45:27 -07:00
Russell King (Oracle)
b560d4434f net: stmmac: ptp: rearrange n_ext_ts initialisation
Use local variables for n_ext_ts rather than referencing the DMA
capability several times.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vypHX-0000000CSbc-123K@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:45:26 -07:00
Russell King (Oracle)
c127d40879 net: stmmac: remove stmmac_dwmac4_get_mac_addr()
stmmac_dwmac4_get_mac_addr() is identical to stmmac_get_mac_addr().
Remove stmmac_dwmac4_get_mac_addr() to avoid this code duplication.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vypJM-0000000CSiJ-48yO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:44:11 -07:00
Victor Nogueira
56acc7f519 selftests/tc-testing: Adapt test's output to HFSC's iproute2 printing changes
To make the printing of HFSC's defcls consistent with HTB's,
iproute2 is now printing defcls prepended with "0x".

This commit adapts test a4c3 to this change.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307220724.2501212-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:42:19 -07:00
Eric Dumazet
d6d4ff335d tcp: inline tcp_chrono_start()
tcp_chrono_start() is small enough, and used in TCP sendmsg()
fast path (from tcp_skb_entail()).

Note clang is already inlining it from functions in tcp_output.c.

Inlining it improves performance and reduces bloat :

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/0 up/down: 1/-84 (-83)
Function                                     old     new   delta
tcp_skb_entail                               280     281      +1
__pfx_tcp_chrono_start                        16       -     -16
tcp_chrono_start                              68       -     -68
Total: Before=25192434, After=25192351, chg -0.00%

Note that tcp_chrono_stop() is too big.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260308123549.2924460-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:34:00 -07:00
Eric Dumazet
4b78c9cbd8 tcp: move tp->chrono_type next tp->chrono_stat[]
chrono_type is currently in tcp_sock_read_txrx group, which
is supposed to hold read-mostly fields.

But chrono_type is mostly written in tx path, it should
be moved to tcp_sock_write_tx group, close to other
chrono fields (chrono_stat[], chrono_start).

Note this adds holes, but data locality is far more important.

Use a full u8 for the time being, compiler can generate
more efficient code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260308122302.2895067-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:33:43 -07:00
Eric Dumazet
f2db7b80b0 net/sched: refine indirect call mitigation in tc_wrapper.h
Some modern cpus disable X86_FEATURE_RETPOLINE feature,
even if a direct call can still be beneficial.

Even when IBRS is present, an indirect call is more expensive
than a direct one:

Direct Calls:
  Compilers can perform powerful optimizations like inlining,
  where the function body is directly inserted at the call site,
  eliminating call overhead entirely.

Indirect Calls:
  Inlining is much harder, if not impossible, because the compiler
  doesn't know the target function at compile time.
  Techniques like Indirect Call Promotion can help by using
  profile-guided optimization to turn frequently taken indirect calls
  into conditional direct calls, but they still add complexity
  and potential overhead compared to a truly direct call.

In this patch, I split tc_skip_wrapper in two different
static keys, one for tc_act() (tc_skip_wrapper_act)
and one for tc_classify() (tc_skip_wrapper_cls).

Then I enable the tc_skip_wrapper_cls only if the count
of builtin classifiers is above one.

I enable tc_skip_wrapper_act only it the count of builtin
actions is above one.

In our production kernels, we only have CONFIG_NET_CLS_BPF=y
and CONFIG_NET_ACT_BPF=y. Other are modules or are not compiled.

Tested on AMD Turin cpus, cls_bpf_classify() cost went
from 1% down to 0.18 %, and FDO will be able to inline
it in tcf_classify() for further gains.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307133601.3863071-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:31:41 -07:00
Eric Dumazet
e8eb33d650 tcp: move sysctl_tcp_shrink_window to netns_ipv4_read_txrx group
Commit 18fd64d254 ("netns-ipv4: reorganize netns_ipv4 fast path
variables") missed that __tcp_select_window() is reading
net->ipv4.sysctl_tcp_shrink_window.

Move this field to netns_ipv4_read_txrx group, as __tcp_select_window()
is used both in tx and rx paths.

Saves a potential cache line miss.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260307092214.2433548-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:31:01 -07:00
Lorenzo Bianconi
bf3471e6e6 net: airoha: Make flow control source port mapping dependent on nbq parameter
Flow control source port mapping for USB serdes needs to be configured
according to the GDM port nbq parameter. This is a preliminary patch
since nbq parameter is specific for the given port serdes and needs to
be read from the DTS (in the current codebase is assigned statically).

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260306-airoha-fix-loopback-for-usb-serdes-v2-1-319de9c96826@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:18:49 -07:00
Ankit Garg
014c607f86 gve: add support for UDP GSO for DQO format
Enable support for UDP GSO when using DQO format. Advertise the feature
flag during device initialization and enable offload by default.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20260306224816.3391551-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:17:52 -07:00
Alok Tiwari
dc9c9193c7 selftests: fib_tests: fix link-local retrieval in fib6_nexthop()
fib6_nexthop() retrieves the link-local address for two interfaces used
in the test. However, both lldummy and llv1 are obtained from dummy0.

llv1 is expected to be retrieved from veth1, which is the interface used
later in the test. The subsequent check and error message also expect
the address to be retrieved from veth1.

Fix this by retrieving llv1 from veth1.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260306180830.2329477-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:17:48 -07:00
Jakub Kicinski
51aaf65bbd Merge tag 'ib-gpio-remove-of-gpio-h-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into mbox
Bartosz Golaszewski says:

====================
Immutable branch between GPIO and net

Convert remaining users of of_gpio.h to using GPIO descriptors and
remove the header.

* tag 'ib-gpio-remove-of-gpio-h-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: remove of_get_named_gpio() and <linux/of_gpio.h>
  nfc: nfcmrvl: convert to gpio descriptors
  nfc: s3fwrn5: convert to gpio descriptors
====================

Link: https://patch.msgid.link/20260309093153.10446-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:11:21 -07:00
Qingfang Deng
abb0eb0b03 ppp: simplify input error handling
Currently, ppp_input_error() indicates an error by allocating a 0-length
skb and calling ppp_do_recv(). It takes an error code argument, which is
stored in skb->cb, but not used by ppp_receive_frame().

Simplify the error handling by removing the unused parameter and the
unnecessary skb allocation. Instead, call ppp_receive_error() directly
from ppp_input_error() under the recv lock, and the length check in
ppp_receive_frame() can be removed.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:07:38 -07:00
Eric Dumazet
58e4d35ae7 net/sched: use rtnl_kfree_skbs() in pfifo_fast_reset()
rtnl_kfree_skbs() reduces RTNL and qdisc spinlock hold time.

skbs are freed later after RTNL has been released.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260306133154.678730-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:01:53 -07:00
Sebastian Andrzej Siewior
a23c657e33 net: ethernet: ti: am65-cpsw: Use also port number to identify timestamps
The driver uses packet-type (RX/TX) PTP-message type and PTP-sequence
number to identify a matching timestamp packet for a skb. If the same
PTP packet arrives on both ports (as in a PRP environment) then it is
not obvious which event belongs to which skb.

The event contains also the port number on which it was received.
Instead of masking it out, use it for matching.

Tested-by: Chintan Vankar <c-vankar@ti.com>
Reviewed-by: Martin Kaistra <martin.kaistra@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260306144439.cVwaaopR@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 19:01:08 -07:00
Eric Dumazet
47e8dbb6e7 net/sched: do not reset queues in graft operations
Following typical script is extremely disruptive,
because each graft operation calls dev_deactivate()
which resets all the queues of the device.

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
done

One can add "ip link set dev $ETH down/up" to reduce the disruption time:

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 ip link set dev $ETH down
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
 ip link set dev $ETH up
done

Or we can add a @reset_needed flag to dev_deactivate() and
dev_deactivate_many().

This flag is set to true at device dismantle or linkwatch_do_dev(),
and to false for graft operations.

In the future, we might only stop one queue instead of the whole
device, ie call dev_deactivate_queue() instead of dev_deactivate().

I think the problem (quadratic behavior) was added in commit
2fb541c862 ("net: sch_generic: aviod concurrent reset and enqueue op
for lockless qdisc") but this does not look serious enough to deserve
risky backports.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307163430.470644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:55:55 -07:00
Eric Dumazet
82f36517a1 tcp: avoid dst->ops->check() call in tcp_v{4,6}_do_rcv()
If incoming skb dst matches the socket cached one,
there is no need to call again dst->ops->check().

Network layer already validated the skb dst for us,
usually from tcp_v{4,6}_early_demux().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260306154322.1086539-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:52:56 -07:00
Rosen Penev
6927430735 net: rocker: kzalloc + kcalloc to kzalloc_flex
Combining the allocations simplifies things, especially the free path.

Remove ofdpa_group_tbl_entry_free as a result. kfree is shorter.

Add __counted_by for extra runtime analysis.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260306025449.12333-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:51:07 -07:00
Eric Dumazet
50636e5ff8 tcp: move tcp_v4_early_demux() to net/ipv4/ip_input.c
tcp_v4_early_demux() has a single caller : ip_rcv_finish_core().

Move it to net/ipv4/ip_input.c and mark it static, for possible
compiler/linker optimizations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260306131130.654991-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:50:24 -07:00
Keita Morisaki
8ca8eb0576 tcp: remove unused hash_size from struct tcp_out_options
hash_size is declared but never read. The MD5 path always uses a
fixed size of 16, and the TCP-AO path uses tcp_ao_maclen().

This closes a 7-byte hole and reduces the struct size from 96 to
88 bytes.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260307051619.51685-1-kmta1236@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:49:38 -07:00
Tim Bird
2ed4b46b4f net: Add SPDX ids to some source files
Add SPDX-License-Identifier lines to several source
files under the network sub-directory.  Work on files
in the core, dns_resolver, ipv4, ipv6 and
netfilter sub-dirs.  Remove boilerplate
and license reference text to avoid ambiguity.

Rusty Russell has expressed that his contributions
were intended to be GPL-2.0-or-later.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://patch.msgid.link/20260305004724.87469-1-tim.bird@sony.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 18:32:45 -07:00
Jakub Kicinski
86020d7db0 Merge branch 'tools-ynl-convert-samples-into-selftests'
Jakub Kicinski says:

====================
tools: ynl: convert samples into selftests

The "samples" were always poor man's tests, used to manually
confirm that C YNL works as expected. Since a proper tests/
directory now exists move the samples and use the kselftest
harness to turn them into selftests outputting KTAP.
====================

Link: https://patch.msgid.link/20260307033630.1396085-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:30 -07:00
Jakub Kicinski
aa234faa5a tools: ynl: convert rt-route sample to selftest
Convert rt-route.c to use kselftest_harness.h with FIXTURE/TEST_F.
This is the last test to convert so clean up the Makefile.

Validate that the connected routes for 192.168.1.0/24 and
2001:db8::/64 appear in the dump.

Output:

  TAP version 13
  1..1
  # Starting 1 tests from 1 test cases.
  #  RUN           rt_route.dump ...
  # oif: nsim0            dst: 192.168.1.0/24
  # oif: lo               dst: ::1/128
  # oif: nsim0            dst: 2001:db8::1/128
  # oif: nsim0            dst: 2001:db8::/64
  # oif: nsim0            dst: fe80::/64
  # oif: nsim0            dst: ff00::/8
  #            OK  rt_route.dump
  ok 1 rt_route.dump
  # PASSED: 1 / 1 tests passed.
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260307033630.1396085-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:27 -07:00
Jakub Kicinski
e7a39b8f5f tools: ynl: convert rt-addr sample to selftest
Convert rt-addr.c to use kselftest_harness.h with FIXTURE/TEST_F.

Validate that the addresses configured by the wrapper (192.168.1.1
and 2001:db8::1) appear in the dump.

Output:

  TAP version 13
  1..1
  # Starting 1 tests from 1 test cases.
  #  RUN           rt_addr.dump ...
  #               lo: 127.0.0.1
  #            nsim0: 192.168.1.1
  #               lo: ::1
  #            nsim0: 2001:db8::1
  #            nsim0: fe80::7c66:c9ff:fe5f:bf01
  #            OK  rt_addr.dump
  ok 1 rt_addr.dump
  # PASSED: 1 / 1 tests passed.
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260307033630.1396085-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:27 -07:00
Jakub Kicinski
1419fbf5a8 tools: ynl: convert ethtool sample to selftest
Convert ethtool.c to use kselftest_harness.h with FIXTURE/TEST_F.
Move ethtool from BINS to TEST_GEN_FILES and add ethtool.sh wrapper
which sets up a netdevsim device before running the test binary.

Output:

  TAP version 13
  1..2
  # Starting 2 tests from 1 test cases.
  #  RUN           ethtool.channels ...
  #    nsim0: combined 1
  #            OK  ethtool.channels
  ok 1 ethtool.channels
  #  RUN           ethtool.rings ...
  #    nsim0: rx 512 tx 512
  #            OK  ethtool.rings
  ok 2 ethtool.rings
  # PASSED: 2 / 2 tests passed.
  # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260307033630.1396085-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:26 -07:00
Jakub Kicinski
db20b374e7 tools: ynl: convert devlink sample to selftest
Convert devlink.c to use kselftest_harness.h with FIXTURE/TEST_F.
Move devlink from BINS to TEST_GEN_FILES in the Makefile since
it's invoked via the devlink.sh wrapper which sets up netdevsim.

Output:

  TAP version 13
  1..2
  # Starting 2 tests from 1 test cases.
  #  RUN           devlink.dump ...
  # netdevsim/netdevsim1337
  #            OK  devlink.dump
  ok 1 devlink.dump
  #  RUN           devlink.info ...
  # netdevsim/netdevsim1337:
  #   driver: netdevsim
  #   running fw:
  #     fw.mgmt: 10.20.30
  #            OK  devlink.info
  ok 2 devlink.info
  # PASSED: 2 / 2 tests passed.
  # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260307033630.1396085-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:26 -07:00
Jakub Kicinski
7a95e52562 tools: ynl: add netdevsim wrapper library for YNL tests
Some tests need netdevsim setup which is painful to do from C.

Add ynl_nsim_lib.sh, a shared library providing nsim_setup and
nsim_cleanup functions for tests that need a netdevsim device.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260307033630.1396085-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09 17:02:26 -07:00