Commit Graph

1266704 Commits

Author SHA1 Message Date
Serge Semin
f951a64922 net: stmmac: Move MAC caps init to phylink MAC caps getter
After a set of recent fixes the stmmac_phy_setup() and
stmmac_reinit_queues() methods have turned to having some duplicated code.
Let's get rid from the duplication by moving the MAC-capabilities
initialization to the PHYLINK MAC-capabilities getter. The getter is
called during each network device interface open/close cycle. So the
MAC-capabilities will be initialized in generic device open procedure and
in case of the Tx/Rx queues re-initialization as the original code
semantics implies.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 12:25:35 +02:00
Serge Semin
dc144baeb4 net: stmmac: Rename phylink_get_caps() callback to update_caps()
Since recent commits the stmmac_ops::phylink_get_caps() callback has no
longer been responsible for the phylink MAC capabilities getting, but
merely updates the MAC capabilities in the mac_device_info:🔗:caps
field. Rename the callback to comply with the what the method does now.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 12:25:35 +02:00
Paolo Abeni
30b3fe0672 Merge branch 'enable-rx-hw-timestamp-for-ptp-packets-using-cpts-fifo'
Chintan Vankar says:

====================
Enable RX HW timestamp for PTP packets using CPTS FIFO

The CPSW offers two mechanisms for communicating packet ingress timestamp
information to the host.

The first mechanism is via the CPTS Event FIFO which records timestamp
when triggered by certain events. One such event is the reception of an
Ethernet packet with a specified EtherType field. This is used to capture
ingress timestamps for PTP packets. With this mechanism the host must
read the timestamp (from the CPTS FIFO) separately from the packet payload
which is delivered via DMA.

In the second mechanism of timestamping, CPSW driver enables hardware
timestamping for all received packets by setting the TSTAMP_EN bit in
CPTS_CONTROL register, which directs the CPTS module to timestamp all
received packets, followed by passing timestamp via DMA descriptors.
This mechanism is responsible for triggering errata i2401:
"CPSW: Host Timestamps Cause CPSW Port to Lock up."

The errata affects all K3 SoCs. Link to errata for AM64x:
https://www.ti.com/lit/er/sprz457h/sprz457h.pdf

As a workaround we can use first mechanism to timestamp received
packets.
====================

Link: https://lore.kernel.org/r/20240419082626.57225-1-c-vankar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 12:07:25 +02:00
Chintan Vankar
c03a6fd398 net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP packets
In the current mechanism of timestamping, am65-cpsw-nuss driver
enables hardware timestamping for all received packets by setting
the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS
module to timestamp all received packets, followed by passing
timestamp via DMA descriptors. This mechanism causes CPSW Port to
Lock up.

To prevent port lock up, don't enable rx packet timestamping by
setting TSTAMP_EN bit in CPTS_CONTROL register. The workaround for
timestamping received packets is to utilize the CPTS Event FIFO
that records timestamps corresponding to certain events. The CPTS
module is configured to generate timestamps for Multicast Ethernet,
UDP/IPv4 and UDP/IPv6 PTP packets.

Update supported hwtstamp_rx_filters values for CPSW's timestamping
capability.

Fixes: b1f66a5bee ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 12:07:24 +02:00
Chintan Vankar
c459f606f6 net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using CPTS FIFO
Add a new function "am65_cpts_rx_timestamp()" which checks for PTP
packets from header and timestamps them.

Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO
Event to get the timestamp of received PTP packet.

Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 12:07:23 +02:00
Paolo Abeni
9b9fd023e9 Merge branch 'read-phy-address-of-switch-from-device-tree-on-mt7530-dsa-subdriver'
Arınç ÜNAL says:

====================
Read PHY address of switch from device tree on MT7530 DSA subdriver

This patch series makes the driver read the PHY address the switch listens
on from the device tree which, in result, brings support for MT7530
switches listening on a different PHY address than 31. And the patch series
simplifies the core operations.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
====================

Link: https://lore.kernel.org/r/20240418-b4-for-netnext-mt7530-phy-addr-from-dt-and-simplify-core-ops-v3-0-3b5fb249b004@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 10:32:43 +02:00
Arınç ÜNAL
7c5e37d7ee net: dsa: mt7530: simplify core operations
The core_rmw() function calls core_read_mmd_indirect() to read the
requested register, and then calls core_write_mmd_indirect() to write the
requested value to the register. Because Clause 22 is used to access Clause
45 registers, some operations on core_write_mmd_indirect() are
unnecessarily run. Get rid of core_read_mmd_indirect() and
core_write_mmd_indirect(), and run only the necessary operations on
core_write() and core_rmw().

Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 10:32:40 +02:00
Arınç ÜNAL
868ff5f494 net: dsa: mt7530-mdio: read PHY address of switch from device tree
Read the PHY address the switch listens on from the reg property of the
switch node on the device tree. This change brings support for MT7530
switches on boards with such bootstrapping configuration where the switch
listens on a different PHY address than the hardcoded PHY address on the
driver, 31.

As described on the "MT7621 Programming Guide v0.4" document, the MT7530
switch and its PHYs can be configured to listen on the range of 7-12,
15-20, 23-28, and 31 and 0-4 PHY addresses.

There are operations where the switch PHY registers are used. For the PHY
address of the control PHY, transform the MT753X_CTRL_PHY_ADDR constant
into a macro and use it. The PHY address for the control PHY is 0 when the
switch listens on 31. In any other case, it is one greater than the PHY
address the switch listens on.

Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23 10:32:40 +02:00
Asbjørn Sloth Tønnesen
077633afe0 net: ethernet: mtk_eth_soc: flower: validate control flags
This driver currently doesn't support any control flags.

Use flow_rule_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240418161821.189263-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:42:34 -07:00
Asbjørn Sloth Tønnesen
af7dfa94c2 dpaa2-switch: flower: validate control flags
This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20240418161802.189247-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:42:33 -07:00
Asbjørn Sloth Tønnesen
93a8540aac cxgb4: flower: validate control flags
This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Only compile tested, no hardware available.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240418161751.189226-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:42:31 -07:00
Jun Gu
2540088b83 net: openvswitch: Check vport netdev name
Ensure that the provided netdev name is not one of its aliases to
prevent unnecessary creation and destruction of the vport by
ovs-vswitchd.

Signed-off-by: Jun Gu <jun.gu@easystack.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:34:02 -07:00
Jakub Kicinski
2557e2ec94 Merge branch 'netlink-add-nftables-spec-w-multi-messages'
Donald Hunter says:

====================
netlink: Add nftables spec w/ multi messages

This series adds a ynl spec for nftables and extends ynl with a --multi
command line option that makes it possible to send transactional batches
for nftables.

This series includes a patch for nfnetlink which adds ACK processing for
batch begin/end messages. If you'd prefer that to be sent separately to
nf-next then I can do so, but I included it here so that it gets seen in
context.

An example of usage is:

./tools/net/ynl/cli.py \
 --spec Documentation/netlink/specs/nftables.yaml \
 --multi batch-begin '{"res-id": 10}' \
 --multi newtable '{"name": "test", "nfgen-family": 1}' \
 --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
 --multi batch-end '{"res-id": 10}'
[None, None, None, None]

It can also be used for bundling get requests:

./tools/net/ynl/cli.py \
 --spec Documentation/netlink/specs/nftables.yaml \
 --multi gettable '{"name": "test", "nfgen-family": 1}' \
 --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
 --output-json
[{"name": "test", "use": 1, "handle": 1, "flags": [],
 "nfgen-family": 1, "version": 0, "res-id": 2},
 {"table": "test", "name": "chain", "handle": 1, "use": 0,
 "nfgen-family": 1, "version": 0, "res-id": 2}]

There are 2 issues that may be worth resolving:

 - ynl reports errors by raising an NlError exception so only the first
   error gets reported. This could be changed to add errors to the list
   of responses so that multiple errors could be reported.

 - If any message does not get a response (e.g. batch-begin w/o patch 2)
   then ynl waits indefinitely. A recv timeout could be added which
   would allow ynl to terminate.
====================

Link: https://lore.kernel.org/r/20240418104737.77914-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:20:46 -07:00
Donald Hunter
bf2ac490d2 netfilter: nfnetlink: Handle ACK flags for batch messages
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
messages. This is a problem for ynl which wants to receive an ack for
every message it sends, not just the commands in between the begin/end
messages.

Add processing for ACKs for begin/end messages and provide responses
when requested.

I have checked that iproute2, pyroute2 and systemd are unaffected by
this change since none of them use NLM_F_ACK for batch begin/end.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:20:42 -07:00
Donald Hunter
ba8be00f68 tools/net/ynl: Add multi message support to ynl
Add a "--multi <do-op> <json>" command line to ynl that makes it
possible to add several operations to a single netlink request payload.
The --multi command line option is repeated for each operation.

This is used by the nftables family for transaction batches. For
example:

./tools/net/ynl/cli.py \
 --spec Documentation/netlink/specs/nftables.yaml \
 --multi batch-begin '{"res-id": 10}' \
 --multi newtable '{"name": "test", "nfgen-family": 1}' \
 --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
 --multi batch-end '{"res-id": 10}'
[None, None, None, None]

It can also be used for bundling get requests:

./tools/net/ynl/cli.py \
 --spec Documentation/netlink/specs/nftables.yaml \
 --multi gettable '{"name": "test", "nfgen-family": 1}' \
 --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
 --output-json
[{"name": "test", "use": 1, "handle": 1, "flags": [],
 "nfgen-family": 1, "version": 0, "res-id": 2},
 {"table": "test", "name": "chain", "handle": 1, "use": 0,
 "nfgen-family": 1, "version": 0, "res-id": 2}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:20:42 -07:00
Donald Hunter
0a966d606c tools/net/ynl: Fix extack decoding for directional ops
NetlinkProtocol.decode() was looking up ops by response value which breaks
when it is used for extack decoding of directional ops. Instead, pass
the op to decode().

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:20:42 -07:00
Donald Hunter
1ee7316871 doc/netlink/specs: Add draft nftables spec
Add a spec for nftables that has nearly complete coverage of the ops,
but limited coverage of rule types and subexpressions.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:20:42 -07:00
Jakub Kicinski
af046fd169 Merge branch 'for-uring-ubufops' into HEAD
Pavel Begunkov says:

====================
implement io_uring notification (ubuf_info) stacking (net part)

To have per request buffer notifications each zerocopy io_uring send
request allocates a new ubuf_info. However, as an skb can carry only
one uarg, it may force the stack to create many small skbs hurting
performance in many ways.

The patchset implements notification, i.e. an io_uring's ubuf_info
extension, stacking. It attempts to link ubuf_info's into a list,
allowing to have multiple of them per skb.

liburing/examples/send-zerocopy shows up 6 times performance improvement
for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without
the patchset it requires much larger sends to utilise all potential.

bytes  | before | after (Kqps)
1200   | 195    | 1023
4000   | 193    | 1386
8000   | 154    | 1058
====================

Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 17:15:39 -07:00
Pavel Begunkov
65bada80de net: add callback for setting a ubuf_info to skb
At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning ubuf_info to skb, this
way we will implement smarter assignment later like linking ubuf_info
together.

Note, it's an optional callback, which should be compatible with
skb_zcopy_set(), that's because the net stack might potentially decide
to clone an skb and take another reference to ubuf_info whenever it
wishes. Also, a correct implementation should always be able to bind to
an skb without prior ubuf_info, otherwise we could end up in a situation
when the send would not be able to progress.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 16:21:59 -07:00
Pavel Begunkov
7ab4f16f9e net: extend ubuf_info callback to ops structure
We'll need to associate additional callbacks with ubuf_info, introduce
a structure holding ubuf_info callbacks. Apart from a more smarter
io_uring notification management introduced in next patches, it can be
used to generalise msg_zerocopy_put_abort() and also store
->sg_from_iter, which is currently passed in struct msghdr.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 16:21:35 -07:00
Jakub Kicinski
65f1df1140 Merge branch 'tcp-avoid-sending-too-small-packets'
Eric Dumazet says:

====================
tcp: avoid sending too small packets

tcp_sendmsg() cooks 'large' skbs, that are later split
if needed from tcp_write_xmit().

After a split, the leftover skb size is smaller than the optimal
size, and this causes a performance drop.

In this series, tcp_grow_skb() helper is added to shift
payload from the second skb in the write queue to the first
skb to always send optimal sized skbs.

This increases TSO efficiency, and decreases number of ACK
packets.
====================

Link: https://lore.kernel.org/r/20240418214600.1291486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:25:32 -07:00
Eric Dumazet
8ee602c635 tcp: try to send bigger TSO packets
While investigating TCP performance, I found that TCP would
sometimes send big skbs followed by a single MSS skb,
in a 'locked' pattern.

For instance, BIG TCP is enabled, MSS is set to have 4096 bytes
of payload per segment. gso_max_size is set to 181000.

This means that an optimal TCP packet size should contain
44 * 4096 = 180224 bytes of payload,

However, I was seeing packets sizes interleaved in this pattern:

172032, 8192, 172032, 8192, 172032, 8192, <repeat>

tcp_tso_should_defer() heuristic is defeated, because after a split of
a packet in write queue for whatever reason (this might be a too small
CWND or a small enough pacing_rate),
the leftover packet in the queue is smaller than the optimal size.

It is time to try to make 'leftover packets' bigger so that
tcp_tso_should_defer() can give its full potential.

After this patch, we can see the following output:

14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980
14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980
14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980
14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980
14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980
14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980
14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980
14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980
14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980
14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980
14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980

With 200 concurrent flows on a 100Gbit NIC, we can see a reduction
of TSO packets (and ACK packets) of about 30 %.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:25:28 -07:00
Eric Dumazet
d5b38a71d3 tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()
tcp_write_xmit() calls tcp_init_tso_segs()
to set gso_size and gso_segs on the packet.

tcp_init_tso_segs() requires the stack to maintain
an up to date tcp_skb_pcount(), and this makes sense
for packets in rtx queue. Not so much for packets
still in the write queue.

In the following patch, we don't want to deal with
tcp_skb_pcount() when moving payload from 2nd
skb to 1st skb in the write queue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:25:27 -07:00
Eric Dumazet
22555032c5 tcp: remove dubious FIN exception from tcp_cwnd_test()
tcp_cwnd_test() has a special handing for the last packet in
the write queue if it is smaller than one MSS and has the FIN flag.

This is in violation of TCP RFC, and seems quite dubious.

This packet can be sent only if the current CWND is bigger
than the number of packets in flight.

Making tcp_cwnd_test() result independent of the first skb
in the write queue is needed for the last patch of the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:25:27 -07:00
Jakub Kicinski
f62a5e7127 Merge branch 'mlx5e-per-queue-coalescing'
Tariq Toukan says:

====================
mlx5e per-queue coalescing

This patchset adds ethtool per-queue coalescing support for the mlx5e
driver.

The series introduce some changes needed as preparations for the final
patch which adds the support and implements the callbacks.  Main
changes:
- DIM code movements into its own header file.
- Switch to dynamic allocation of the DIM struct in the RQs/SQs.
- Allow coalescing config change without channels reset when possible.
====================

Link: https://lore.kernel.org/r/20240419080445.417574-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:22 -07:00
Rahul Rameshbabu
651ebaad6e net/mlx5e: Implement ethtool callbacks for supporting per-queue coalescing
Use mlx5 on-the-fly coalescing configuration support to enable individual
channel configuration.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
445a25f6e1 net/mlx5e: Support updating coalescing configuration without resetting channels
When CQE mode or DIM state is changed, gracefully reconfigure channels to
handle new configuration. Previously, would create new channels that would
reflect the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
a5e89a3f35 net/mlx5e: Dynamically allocate DIM structure for SQs/RQs
Make it possible for the DIM structure to be torn down while an SQ or RQ is
still active. Changing the CQ period mode is an example where the previous
sampling done with the DIM structure would need to be invalidated.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
eca1e8a628 net/mlx5e: Use DIM constants for CQ period mode parameter
Use core DIM CQ period mode enum values for the CQ parameter for the period
mode. Translate the value to the specific mlx5 device constant for the
selected period mode when creating a CQ. Avoid needing to translate mlx5
device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
7ec56914d3 net/mlx5e: Move DIM function declarations to en/dim.h
Create a header specifically for DIM-related declarations. Move existing
DIM-specific functionality from en.h. Future DIM-related functionality will
be declared in en/dim.h in subsequent patches.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Jakub Kicinski
b240fc56b8 Merge branch 'net-dsa-vsc73xx-convert-to-phylink-and-do-some-cleanup'
Pawel Dembicki says:

====================
net: dsa: vsc73xx: convert to PHYLINK and do some cleanup

This patch series is a result of splitting a larger patch series [0],
where some parts needed to be refactored.

The first patch switches from a poll loop to read_poll_timeout.

The second patch is a simple conversion to phylink because adjust_link
won't work anymore.

The third patch is preparation for future use. Using the
"phy_interface_mode_is_rgmii" macro allows for the proper recognition
of all RGMII modes.

Patches 4-5 involve some cleanup: The fourth patch introduces
a definition with the maximum number of ports to avoid using
magic numbers. The next one fills in documentation.

[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=841034&state=%2A&archive=both
====================

Link: https://lore.kernel.org/r/20240417205048.3542839-1-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:21:32 -07:00
Pawel Dembicki
96944aafaa net: dsa: vsc73xx: add structure descriptions
This commit adds updates to the documentation describing the structures
used in vsc73xx. This will help prevent kdoc-related issues in the future.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-6-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:21:32 -07:00
Pawel Dembicki
6cc5280a08 net: dsa: vsc73xx: Add define for max num of ports
This patch introduces a new define: VSC73XX_MAX_NUM_PORTS, which can be
used in the future instead of a hardcoded value.

Currently, the only hardcoded value is vsc->ds->num_ports. It is being
replaced with the new define.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-5-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:21:32 -07:00
Pawel Dembicki
12af94b295 net: dsa: vsc73xx: use macros for rgmii recognition
It's preparation for future use. At this moment, the RGMII port is used
only for a connection to the MAC interface, but in the future, someone
could connect a PHY to it. Using the "phy_interface_mode_is_rgmii" macro
allows for the proper recognition of all RGMII modes.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240417205048.3542839-4-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:21:32 -07:00
Pawel Dembicki
21fc3416ea net: dsa: vsc73xx: convert to PHYLINK
This patch replaces the adjust_link api with the phylink apis that provide
equivalent functionality.

The remaining functionality from the adjust_link is now covered in the
mac_link_* and mac_config from phylink_mac_ops structure.

Removes:
.adjust_link
Adds phylink_mac_ops structure:
.mac_config
.mac_link_up
.mac_link_down

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-3-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:21:30 -07:00
Pawel Dembicki
eb7e33d01d net: dsa: vsc73xx: use read_poll_timeout instead delay loop
Switch the delay loop during the Arbiter empty check from
vsc73xx_adjust_link() to use read_poll_timeout(). Functionally,
one msleep() call is eliminated at the end of the loop in the timeout
case.

As Russell King suggested:

"This [change] avoids the issue that on the last iteration, the code reads
the register, tests it, finds the condition that's being waiting for is
false, _then_ waits and end up printing the error message - that last
wait is rather useless, and as the arbiter state isn't checked after
waiting, it could be that we had success during the last wait."

Suggested-by: Russell King <linux@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:20:13 -07:00
Eric Dumazet
c51db4ac10 tcp: do not export tcp_twsk_purge()
After commit 1eeb504357 ("tcp/dccp: do not care about
families in inet_twsk_purge()") tcp_twsk_purge() is
no longer potentially called from a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 11:53:51 +01:00
Geetha sowjanya
6a57f09162 octeontx2-pf: Add support for offload tc with skbedit mark action
Support offloading of skbedit mark action.

For example, to mark with 0x0008, with dest ip 60.60.60.2 on eth2
interface:

 # tc qdisc add dev eth2 ingress
 # tc filter add dev eth2 ingress protocol ip flower \
      dst_ip 60.60.60.2 action skbedit mark 0x0008 skip_sw

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 10:14:14 +01:00
Julien Panis
80b7aae9e3 net: ethernet: ti: am65-cpsw: Fix xdp_rxq error for disabled port
When an ethX port is disabled in the device tree, an error is returned
by xdp_rxq_info_reg() function while transitioning the CPSW device to
the up state. The message 'Missing net_device from driver' is output.

This patch fixes the issue by registering xdp_rxq info only if ethX
port is enabled (i.e. ndev pointer is not NULL).

Fixes: 8acacc40f7 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Link: https://lore.kernel.org/all/260d258f-87a1-4aac-8883-aab4746b32d8@ti.com/
Reported-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Closes: https://gist.github.com/Siddharth-Vadapalli-at-TI/5ed0e436606001c247a7da664f75edee
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 09:13:59 +01:00
Thomas Weißschuh
bfa858f220 sysctl: treewide: constify ctl_table_header::ctl_table_arg
To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.

Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 08:56:31 +01:00
Jakub Kicinski
8442f8ba26 Merge branch 'testing-make-netfilter-selftests-functional-in-vng-environment'
Florian Westphal says:

====================
testing: make netfilter selftests functional in vng environment

This is the second batch of the netfilter selftest move.

Changes since v1:
- makefile and kernel config are updated to have all required features
- fix makefile with missing bits to make kselftest-install work
- test it via vng as per
   https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
   (Thanks Jakub!)
- squash a few fixes, e.g. nft_queue.sh v1 had a race w. NFNETLINK_QUEUE=m
- add a settings file with 8m timeout, for nft_concat_range.sh sake.
  That script can be sped up a bit, I think, but its not contained in
  this batch yet.
- toss the first two bogus rebase artifacts (Matthieu Baerts)

scripts are moved to lib.sh infra. This allows to use busywait helper
and ditch various 'sleep 2' all over the place.

Tested on Fedora 39:

vng --build  --config tools/testing/selftests/net/netfilter/config
make -C tools/testing/selftests/ TARGETS=net/netfilter
vng -v --run . --user root --cpus 2 -- \
        make -C tools/testing/selftests TARGETS=net/netfilter run_tests

... all tests pass except nft_audit.sh which SKIPs due to nft version mismatch
(Fedora is on nft 1.0.7 which lacks reset keyword support).

Missing/WIP bits:
- speed up nf_concat_range.sh test
- extend flowtable selftest
- shellcheck fixups for remaining scripts
====================

Link: https://lore.kernel.org/r/20240418152744.15105-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:53 -07:00
Florian Westphal
0b2e1db97b selftests: netfilter: update makefiles and kernel config
Jakub reports the Makefile missed a few updates to make kselftest-install
work for the netfilter tests and points out that config file lacks many
dependencies such as VETH support.

The settings file (timeout 8m) is added for nft_concat_range.sh script
which can take several minutes to complete.

Fixes: 3f189349e5 ("selftests: netfilter: move to net subdir")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/all/20240412175413.04e5e616@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-13-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:51 -07:00
Florian Westphal
1f50b0fef9 selftests: netfilter: nft_audit.sh: add more skip checks
This testcase doesn't work if auditd is running, audit_logread will not
receive any data in that case.

Add a nftables feature test for the reset keyword and skip this test
if that fails.

While at it, do a few minor shellcheck cleanups.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-12-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:51 -07:00
Florian Westphal
4d7730154e selftests: netfilter: nft_meta.sh: small shellcheck cleanup
shellcheck complains about missing "", so add those.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-11-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
9b443c769b selftests: netfilter: nft_fib.sh: shellcheck cleanups
no functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-10-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
05af10a88e selftests: netfilter: conntrack_ipip_mtu.sh: shellcheck cleanups
No functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-9-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
d6905f088d selftests: netfilter: nft_nat_zones.sh: shellcheck cleanups
While at it: No need for iperf here, use socat.
This also reduces the script runtime.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-8-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
c0f9a2b705 selftests: netfilter: xt_string.sh: shellcheck cleanups
no functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
5067fec094 selftests: netfilter: xt_string.sh: move to lib.sh infra
Intentional changes:
- Use socat instead of netcat
- Use a temporary file instead of pipe, else packets do not match
  "-m string" rules, multiple writes to the pipe cause multiple packets,
  but this needs only one to work.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00
Florian Westphal
c1a9d47b59 selftests: netfilter: nft_zones_many.sh: move to lib.sh infra
Also do shellcheck cleanups here, no functional changes intended.
When running tests via vng tool, the packetpath insertion test fails:
dd: failed to open '/dev/stdout': Device or resource busy

Just omit 'of=' and this will work as intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-5-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19 20:10:50 -07:00