Commit Graph

1323688 Commits

Author SHA1 Message Date
David Howells
9e3cccd176 rxrpc: Fix CPU time starvation in I/O thread
Starvation can happen in the rxrpc I/O thread because it goes back to the
top of the I/O loop after it does any one thing without trying to give any
other connection or call CPU time.  Also, because it processes one call
packet at a time, it tries to do the retransmission loop after each ACK
without checking to see if there are other ACKs already in the queue that
can update the SACK state.

Fix this by:

 (1) Add a received-packet queue on each call.

 (2) Distribute packets from the master Rx queue to the individual call,
     conn and error queues and 'poking' calls to add them to the attend
     queue first thing in the I/O thread.

 (3) Go through all the attention-seeking connections and calls before
     going back to the top of the I/O thread.  Each queue is extracted as a
     whole and then gone through so that new additions to insert themselves
     into the queue.

 (4) Make the call event handler go through all the packets currently on
     the call's rx_queue before transmitting and retransmitting DATA
     packets.

 (5) Drop the skb argument from the call event handler as this is now
     replaced with the rx_queue.  Instead, keep track of whether we
     received a packet or an ACK for the tests that used to rely on that.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
149d002bee rxrpc: Add a tracepoint to show variables pertinent to jumbo packet size
Add a tracepoint to be called right before packets are transmitted for the
first time that shows variable values that are pertinent to how many
subpackets will be added to a jumbo DATA packet.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-13-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
b7313009c2 rxrpc: Prepare to be able to send jumbo DATA packets
Prepare to be able to send jumbo DATA packets if the we decide to, but
don't enable that yet.  This will allow larger chunks of data to be sent
without reducing the retryability as the subpackets in a jumbo packet can
also be retransmitted individually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-12-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
3d2bdf73ce rxrpc: Separate the packet length from the data length in rxrpc_txbuf
Separate the packet length from the data length (txb->len) stored in the
rxrpc_txbuf to make security calculations easier.  Also store the
allocation size as that's an upper bound on the size of the security
wrapper and change a number of fields to unsigned short as the amount of
data can't exceed the capacity of a UDP packet.

Also, whilst we're at it, use kzalloc() for txbufs.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-11-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
eeaedc5449 rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899)
Implement path-MTU probing (along the lines of RFC8899) by padding some of
the PING ACKs we send.  PING ACKs get their own individual responses quite
apart from the acking of data (though, as ACKs, they fulfil that role
also).

The probing concentrates on packet sizes that correspond how many
subpackets can be stuffed inside a jumbo packet as jumbo DATA packets are
just aggregations of individual DATA packets and can be split easily for
retransmission purposes.

If we want to perform probing, we advertise this by setting the maximum
number of jumbo subpackets to 0 in the ack trailer when we send an ACK and
see if the peer is also advertising the service.  This is interpreted by
non-supporting Rx stacks as an indication that jumbo packets aren't
supported.

The MTU sizes advertised in the ACK trailer AF_RXRPC transmits are pegged
at a maximum of 1444 unless pmtud is supported by both sides.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-10-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
420f8af502 rxrpc: Use a large kvec[] in rxrpc_local rather than every rxrpc_txbuf
Use a single large kvec[] in the rxrpc_local struct rather than one in
every rxrpc_txbuf struct to build large packets to save on memory.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-9-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
8b5823ea43 rxrpc: Request an ACK on impending Tx stall
Set the REQUEST-ACK flag on the DATA packet we're about to send if we're
about to stall transmission because the app layer isn't keeping up
supplying us with data to transmit.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-8-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
ff992adbc4 rxrpc: Show stats counter for received reason-0 ACKs
In /proc/net/rxrpc/stats, show the stats counter for received ACKs that
have the reason code set to 0 as some implementations do this.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-7-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
cbe0d89095 rxrpc: Don't set the MORE-PACKETS rxrpc wire header flag
The MORE-PACKETS rxrpc header flag hasn't actually been looked at by
anything since 1988 and not all implementations generate it.

Change rxrpc so that it doesn't set MORE-PACKETS at all rather than setting
it inconsistently.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
efa95c3235 rxrpc: Clean up Tx header flags generation handling
Clean up the generation of the header flags when building packet headers
for transmission:

 (1) Assemble the flags in a local variable rather than in the txb->flags.

 (2) Do the flags masking and JUMBO-PACKET setting in one bit of code for
     both the main header and the jumbo headers.

 (3) Generate the REQUEST-ACK flag afresh each time.  There's a possibility
     we might want to do jumbo retransmission packets in future.

 (4) Pass the local flags variable to the rxrpc_tx_data tracepoint rather
     than the combination of the txb flags and the wire header flags (the
     latter belong only to the first subpacket).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
29e03ec757 rxrpc: Use umin() and umax() rather than min_t()/max_t() where possible
Use umin() and umax() rather than min_t()/max_t() where the type specified
is an unsigned type.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David Howells
0e56ebde24 rxrpc: Fix handling of received connection abort
Fix the handling of a connection abort that we've received.  Though the
abort is at the connection level, it needs propagating to the calls on that
connection.  Whilst the propagation bit is performed, the calls aren't then
woken up to go and process their termination, and as no further input is
forthcoming, they just hang.

Also add some tracing for the logging of connection aborts.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David Howells
d1fd972914 ktime: Add us_to_ktime()
Add a us_to_ktime() helper to go with ms_to_ktime() and ns_to_ktime().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David S. Miller
6145fefc1e Merge branch 'cn10k-ipswec-outbound-inline-support'
Bharat Bhushan says:

====================
cn10k-ipsec: Add outbound inline ipsec support

This patch series adds outbound inline ipsec support on Marvell
cn10k series of platform. One crypto hardware logical function
(cpt-lf) per netdev is required for inline ipsec outbound
functionality. Software prepare and submit crypto hardware
(CPT) instruction for outbound inline ipsec crypto mode offload.
The CPT instruction have details for encryption and authentication
Crypto hardware encrypt, authenticate and provide the ESP packet
to network hardware logic to transmit ipsec packet.

First patch makes dma memory writable for in-place encryption,
Second patch moves code to common file, Third patch disable
backpressure on crypto (CPT) and network (NIX) hardware.
Patch four onwards enables inline outbound ipsec.

v9->v10:
 - Removed unlikely() in data-patch and used static_branch when at least
   a SA is configured.
 - Added missing READ_ONCE() as per comment on previous patch
 - Removed "\n" from end of extack messages
 - Poll for context write status check reduced to 100ms from 10s

v8->v9:
 - Removed mutex lock to use hardware, now using hardware state
 - Previous versions were supporting only 64 SAs and a bitmap was
   used for same. That limitation is removed from this version.
 - Replaced netdev_err with NL_SET_ERR_MSG_MOD in state add flow
   as per comment in previous version

v7->v8:
 - spell correction in patch 1/8 (s/sdk/skb)

v6->v7:
 - skb data was mapped as device writeable but it was not ensured
   that skb is writeable. This version calls skb_unshare() to make
   skb data writeable (Thanks Jakub Kicinski for pointing out).

v4->v5:
 - Fixed un-initialized warning and pointer check
   (comment from Kalesh Anakkur Purayil)

v3->v4:
 - Few error messages in data-path removed and some moved
   under netif_msg_tx_err().
 - Added check for crypto offload (XFRM_DEV_OFFLOAD_CRYPTO)
   Thanks "Leon Romanovsky" for pointing out
 - Fixed codespell error as per comment from Simon Horman
 - Added some other cleanup comment from Kalesh Anakkur Purayil

v2->v3:
 - Fix smatch and sparse errors (Comment from Simon Horman)
 - Fix build error with W=1 (Comment from Simon Horman)
   https://patchwork.kernel.org/project/netdevbpf/patch/20240513105446.297451-6-bbhushan2@marvell.com/
 - Some other minor cleanup as per comment
   https://www.spinics.net/lists/netdev/msg997197.html

v1->v2:
 - Fix compilation error to build driver a module
 - Use dma_wmb() instead of architecture specific barrier
 - Fix couple of other compilation warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
b3ae3dc3a3 cn10k-ipsec: Enable outbound ipsec crypto offload
Hardware is initialized and netdev transmit flow is
hooked up for outbound ipsec crypto offload, so finally
enable ipsec offload.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
32188be805 cn10k-ipsec: Allow ipsec crypto offload for skb with SA
Allow to use hardware offload for outbound ipsec crypto
mode if security association (SA) is set for a given skb.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
6a77a15884 cn10k-ipsec: Process outbound ipsec crypto offload
Prepare and submit crypto hardware (CPT) instruction for
outbound ipsec crypto offload. The CPT instruction have
authentication offset, IV offset and encapsulation offset
in input packet. Also provide SA context pointer which have
details about algo, keys, salt etc. Crypto hardware encrypt,
authenticate and provide the ESP packet to networking hardware.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
c45211c236 cn10k-ipsec: Add SA add/del support for outb ipsec crypto offload
This patch adds support to add and delete Security Association
(SA) xfrm ops. Hardware maintains SA context in memory allocated
by software. Each SA context is 128 byte aligned and size of
each context is multiple of 128-byte. Add support for transport
and tunnel ipsec mode, ESP protocol, aead aes-gcm-icv16, key size
128/192/256-bits with 32bit salt.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
fe079ab05d cn10k-ipsec: Init hardware for outbound ipsec crypto offload
One crypto hardware logical function (cpt-lf) per netdev is
required for outbound ipsec crypto offload. Allocate, attach
and initialize one crypto hardware function when enabling
outbound ipsec crypto offload. Crypto hardware function will
be detached and freed on disabling outbound ipsec crypto
offload.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
a7ef63dbd5 octeontx2-af: Disable backpressure between CPT and NIX
NIX can assert backpressure to CPT on the NIX<=>CPT link.
Keep the backpressure disabled for now. NIX block anyways
handles backpressure asserted by MAC due to PFC or flow
control pkts.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
c460b7442a octeontx2-pf: Move skb fragment map/unmap to common code
Move skb fragment map/unmap function to common file
so as to reuse same for outbound IPsec crypto offload

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
195c3d4631 octeontx2-pf: map skb data as device writeable
Crypto hardware need write permission for in-place encrypt
or decrypt operation on skb-data to support IPsec crypto
offload. That patch uses skb_unshare to make skb data writeable
for ipsec crypto offload and map skb fragment memory as
device read-write.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:30 +00:00
Justin Lai
7ea2745766 rtase: Refine the if statement
Refine the if statement to improve readability.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241206084851.760475-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 18:26:28 -08:00
Jakub Kicinski
72e2e2f5ee Merge branch 'net-net-add-negotiation-of-in-band-capabilities-remainder'
Russell King says:

====================
net: net: add negotiation of in-band capabilities (remainder)

Here are the last three patches which were not included in the non-RFC
posting, but were in the RFC posting. These add the .pcs_inband()
method to the Lynx, MTK Lynx and XPCS drivers.
====================

Link: https://patch.msgid.link/Z1F1b8eh8s8T627j@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:47 -08:00
Russell King (Oracle)
484d0170d6 net: pcs: xpcs: implement pcs_inband_caps() method
Report the PCS inband capabilities to phylink for XPCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NW-006L5V-I9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Russell King (Oracle)
520d29bdda net: pcs: pcs-mtk-lynxi: implement pcs_inband_caps() method
Report the PCS in-band capabilities to phylink for the LynxI PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NR-006L5P-E3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Russell King (Oracle)
6561f0e547 net: pcs: pcs-lynx: implement pcs_inband_caps() method
Report the PCS in-band capabilities to phylink for the Lynx PCS.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NM-006L5J-AH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Stas Sergeev
3ca459eaba tun: fix group permission check
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.

This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.

Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()

Signed-off-by: Stas Sergeev <stsp2@yandex.ru>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:43:02 -08:00
Johannes Berg
81d89e6e88 tools: ynl-gen-c: don't require -o argument
Without -o the tool currently crashes, but it's not marked
as required. The only thing we can't do without it is to
generate the correct #include for user source files, but
we can put a placeholder instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241206113100.89d35bf124d6.I9228fb704e6d5c9d8e046ef15025a47a48439c1e@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:27:59 -08:00
Johannes Berg
00ab246750 tools: ynl-gen-c: annotate valid choices for --mode
This makes argparse validate the input and helps users
understand which modes are possible.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241206113100.e2ab5cf6937c.Ie149a0ca5df713860964b44fe9d9ae547f2e1553@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:27:59 -08:00
Jakub Kicinski
860dbab69a Merge branch 'net-convert-some-udp-tunnel-drivers-to-netdev_pcpu_stat_dstats'
Guillaume Nault says:

====================
net: Convert some UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS.

VXLAN, Geneve and Bareudp use various device counters for managing
RX and TX statistics:

  * VXLAN uses the device core_stats for RX and TX drops, tstats for
    regular RX/TX counters and DEV_STATS_INC() for various types of
    RX/TX errors.

  * Geneve uses tstats for regular RX/TX counters and DEV_STATS_INC()
    for everything else, include RX/TX drops.

  * Bareudp, was recently converted to follow VXLAN behaviour, that is,
    device core_stats for RX and TX drops, tstats for regular RX/TX
    counters and DEV_STATS_INC() for other counter types.

Let's consolidate statistics management around the dstats counters
instead. This avoids using core_stats in VXLAN and Bareudp, as
core_stats is supposed to be used by core networking code only (and not
in drivers).  This also allows Geneve to avoid using atomic increments
when updating RX and TX drop counters, as dstats is per-cpu. Finally,
this also simplifies the code as all three modules now handle stats in
the same way and with only two different sets of counters (the per-cpu
dstats and the atomic DEV_STATS_INC()).

Patch 1 creates dstats helper functions that can be used outside of VRF
(until then, dstats was VRF-specific).
Then patches 2 to 4, convert VXLAN, Geneve and Bareudp, one by one.
====================

Link: https://patch.msgid.link/cover.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:57 -08:00
Guillaume Nault
c77200c074 bareudp: Handle stats using NETDEV_PCPU_STAT_DSTATS.
Bareudp uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. It was also recently converted to use the device core
stats (dev_core_stats_*()) for RX and TX drops (see commit 788d5d655b
("bareudp: Use pcpu stats to update rx_dropped counter.")).

Since core stats are to be avoided in drivers, and for consistency with
VXLAN and Geneve, let's convert packet stats handling to DSTATS, which
can handle RX/TX stats and packet drops. Statistics that don't fit
DSTATS are still updated atomically with DEV_STATS_INC().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/0f4f8448db3ff449ac6e939872b28cf3f8982da7.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:53 -08:00
Guillaume Nault
6fa6de3022 geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.
Geneve uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. All other counters are handled using atomic increments
with DEV_STATS_INC().

Let's convert packet stats handling to DSTATS, which has a per-cpu
counter for packet drops too, to avoid the cost of atomic increments
in these cases. Statistics that don't fit DSTATS are still updated
atomically with DEV_STATS_INC().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/7af5c09f3c26f0f231fbe383822ca5d1ce0278fa.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Guillaume Nault
be226352e8 vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.
VXLAN uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX and
TX packet counters. It also uses the device core stats
(dev_core_stats_*()) for RX and TX drops.

Let's consolidate that using the DSTATS infrastructure, which can
handle both packet counters and packet drops. Statistics that don't
fit DSTATS are still updated atomically with DEV_STATS_INC().

While there, convert the "len" variable of vxlan_encap_bypass() to
unsigned int, to respect the types of skb->len and
dev_dstats_[rt]x_add().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/145558b184b3cda77911ca5682b6eb83c3ffed8e.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Guillaume Nault
18eabadd73 vrf: Make pcpu_dstats update functions available to other modules.
Currently vrf is the only module that uses NETDEV_PCPU_STAT_DSTATS.
In order to make this kind of statistics available to other modules,
we need to define the update functions in netdevice.h.

Therefore, let's define dev_dstats_*() functions for RX and TX packet
updates (packets, bytes and drops). Use these new functions in vrf.c
instead of vrf_rx_stats() and the other manual counter updates.

While there, update the type of the "len" variables to "unsigned int",
so that there're aligned with both skb->len and the new dstats update
functions.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/d7a552ee382c79f4854e7fcc224cf176cd21150d.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Jakub Kicinski
9ec780b269 Merge branch 'lan78xx-preparations-for-phylink'
Oleksij Rempel says:

====================
lan78xx: Preparations for PHYlink

This patch set is part of the preparatory work for migrating the lan78xx
USB Ethernet driver to the PHYlink framework. During extensive testing,
I observed that resetting the USB adapter can lead to various read/write
errors. While the errors themselves are acceptable, they generate
excessive log messages, resulting in significant log spam. This set
improves error handling to reduce logging noise by addressing errors
directly and returning early when necessary.

Key highlights of this series include:
- Enhanced error handling to reduce log spam while preserving the
  original error values, avoiding unnecessary overwrites.
- Improved error reporting using the `%pe` specifier for better clarity
  in log messages.
- Removal of redundant and problematic PHY fixups for LAN8835 and
  KSZ9031, with detailed explanations in the respective patches.
- Cleanup of code structure, including unified `goto` labels for better
  readability and maintainability, even in simple editors.
====================

Link: https://patch.msgid.link/20241204084142.1152696-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:30 -08:00
Oleksij Rempel
48fb3d3c4b net: usb: lan78xx: Improve error handling in dataport and multicast writes
Update `lan78xx_dataport_write` and `lan78xx_deferred_multicast_write`
to:
- Handle errors during register read/write operations.
- Exit immediately on errors and log them using `%pe` for clarity.
- Avoid silent failures by propagating error codes properly.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-11-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
0da202e6a5 net: usb: lan78xx: Add error handling to lan78xx_irq_bus_sync_unlock
Update `lan78xx_irq_bus_sync_unlock` to handle errors in register
read/write operations. If an error occurs, log it and exit the function
appropriately.  This ensures proper handling of failures during IRQ
synchronization.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-10-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
65520a70cb net: usb: lan78xx: Add error handling to set_rx_max_frame_length and set_mtu
Improve error handling in `lan78xx_set_rx_max_frame_length` by:
- Checking return values from register read/write operations and
  propagating errors.
- Exiting immediately on failure to ensure proper error reporting.

In `lan78xx_change_mtu`, log errors when changing MTU fails, using `%pe`
for clear error representation.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-9-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
77586156b5 net: usb: lan78xx: Add error handling to lan78xx_init_ltm
Convert `lan78xx_init_ltm` to return error codes and handle errors
properly.  Previously, errors during the LTM initialization process were
not propagated, potentially leading to undetected issues. This patch
ensures:

- Errors in `lan78xx_read_reg` and `lan78xx_write_reg` are checked and
  handled.
- Errors are logged with detailed messages using `%pe` for clarity.
- The function exits immediately on error, returning the error code.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-8-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00
Oleksij Rempel
8b1b2ca83b net: usb: lan78xx: Improve error handling in EEPROM and OTP operations
Refine error handling in EEPROM and OTP read/write functions by:
- Return error values immediately upon detection.
- Avoid overwriting correct error codes with `-EIO`.
- Preserve initial error codes as they were appropriate for specific
  failures.
- Use `-ETIMEDOUT` for timeout conditions instead of `-EIO`.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00
Oleksij Rempel
32ee0dc764 net: usb: lan78xx: Fix error handling in MII read/write functions
Ensure proper error handling in `lan78xx_mdiobus_read` and
`lan78xx_mdiobus_write` by checking return values of register read/write
operations and returning errors to the caller.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00
Oleksij Rempel
9bcdc610cf net: usb: lan78xx: Improve error reporting with %pe specifier
Replace integer error codes with the `%pe` format specifier in register
read and write error messages. This change provides human-readable error
strings, making logs more informative and debugging easier.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:06 -08:00
Oleksij Rempel
39aa1d620d net: usb: lan78xx: move functions to avoid forward definitions
Move following functions to avoid forward declarations in the code:
- lan78xx_start_hw()
- lan78xx_stop_hw()
- lan78xx_flush_fifo()
- lan78xx_start_tx_path()
- lan78xx_stop_tx_path()
- lan78xx_flush_tx_fifo()
- lan78xx_start_rx_path()
- lan78xx_stop_rx_path()
- lan78xx_flush_rx_fifo()

These functions will be used in an upcoming PHYlink migration patch.

No modifications to the functionality of the code are made.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:06 -08:00
Oleksij Rempel
6782d06a47 net: usb: lan78xx: Remove KSZ9031 PHY fixup
Remove the KSZ9031RNX PHY fixup from the lan78xx driver. The fixup applied
specific RGMII pad skew configurations globally, but these settings violate the
RGMII specification and cause more harm than benefit.

Key issues with the fixup:
1. **Non-Compliant Timing**: The fixup's delay settings fall outside the RGMII
   specification requirements of 1.5 ns to 2.0 ns:
   - RX Path: Total delay of **2.16 ns** (PHY internal delay of 1.2 ns + 0.96
     ns skew).
   - TX Path: Total delay of **0.96 ns**, significantly below the RGMII minimum
     of 1.5 ns.

2. **Redundant or Incorrect Configurations**:
   - The RGMII skew registers written by the fixup do not meaningfully alter
     the PHY's default behavior and fail to account for its internal delays.
   - The TX_DATA pad skew was not configured, relying on power-on defaults
     that are insufficient for RGMII compliance.

3. **Micrel Driver Support**: By setting `PHY_INTERFACE_MODE_RGMII_ID`, the
   Micrel driver can calculate and assign appropriate skew values for the
   KSZ9031 PHY.  This ensures better timing configurations without relying on
   external fixups.

4. **System Interference**: The fixup applied globally, reconfiguring all
   KSZ9031 PHYs in the system, even those unrelated to the LAN78xx adapter.
   This could lead to unintended and harmful behavior on unrelated interfaces.

While the fixup is removed, a better mechanism is still needed to dynamically
determine the optimal combination of PHY and MAC delays to fully meet RGMII
requirements without relying on Device Tree or global fixups. This would allow
for robust operation across different hardware configurations.

The Micrel driver is capable of using the interface mode value to calculate and
apply better skew values, providing a configuration much closer to the RGMII
specification than the fixup. Removing the fixup ensures better default
behavior and prevents harm to other system interfaces.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:06 -08:00
Oleksij Rempel
7b60c3bf93 net: usb: lan78xx: Remove LAN8835 PHY fixup
Remove the PHY fixup for the LAN8835 PHY in the lan78xx driver due to
the following reasons:

- There is no publicly available information about the LAN8835 PHY.
  However, it appears to be the integrated PHY used in the LAN7800 and
  LAN7850 USB Ethernet controllers. These PHYs use the GMII interface,
  not RGMII as configured by the fixup.

- The correct driver for handling the LAN8835 PHY functionality is the
  Microchip PHY driver (`drivers/net/phy/microchip.c`), which properly
  supports these integrated PHYs.

- The PHY ID `0x0007C130` is actually used by the LAN8742A PHY, which
  only supports RMII. This interface is incompatible with the LAN78xx
  MAC, as the LAN7801 (the only LAN78xx version without an integrated
  PHY) supports only RGMII.

- The mask applied for this fixup is overly broad, inadvertently
  covering both Microchip LAN88xx PHYs and unrelated SMSC LAN8742A PHYs,
  leading to potential conflicts with other devices.

- Testing has shown that removing this fixup for LAN7800 and LAN7850
  does not result in any noticeable difference in functionality, as the
  Microchip PHY driver (`drivers/net/phy/microchip.c`) handles all
  necessary configurations for these integrated PHYs.

- Registering this fixup globally (not limited to USB devices) risks
  conflicts by unintentionally modifying other interfaces whenever a
  LAN7801 adapter is connected to the system.

Note that both LAN7800 and LAN7850 USB Ethernet controllers use an
integrated PHY with the ID `0x0007C132`. Additionally, the LAN7515, a
specialized part for Raspberry Pi, includes an integrated LAN7800 USB
Ethernet controller and USB hub in a multifunctional chip design, and it
also uses the same PHY ID (`0x0007C132`).

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:05 -08:00
Jakub Kicinski
7a2716ac9a Merge branch 'net-phylib-eee-cleanups'
Russell King says:

====================
net: phylib EEE cleanups

Clean up phylib's EEE support. Patches previously posted as RFC as part
of the phylink EEE series.

Patch 1 changes the Marvell driver to use the state we store in
struct phy_device, rather than manually calling
phydev->eee_cfg.eee_enabled.

Patch 2 avoids genphy_c45_ethtool_get_eee() setting ->eee_enabled, as
we copy that from phydev->eee_cfg.eee_enabled later, and after patch 3
mo one uses this after calling genphy_c45_ethtool_get_eee(). In fact,
the only caller of this function now is phy_ethtool_get_eee().

As all callers to genphy_c45_eee_is_active() now pass NULL as its
is_enabled flag, this is no longer useful. Remove the argument in
patch 3.

Patch 4 updates the phylib documentation to make it absolutely clear
that phy_ethtool_get_eee() now fills in all members of struct
ethtool_keee, which is why we now have so many buggy network drivers.
====================

Link: https://patch.msgid.link/Z1GDZlFyF2fsFa3S@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:47:35 -08:00
Russell King (Oracle)
f899c594e1 net: phy: update phy_ethtool_get_eee() documentation
Update the phy_ethtool_get_eee() documentation to make it clear that
all members of struct ethtool_keee are written by this function.

keee.supported, keee.advertised, keee.lp_advertised and keee.eee_active
are all written by genphy_c45_ethtool_get_eee().

keee.tx_lpi_timer, keee.tx_lpi_enabled and keee.eee_enabled are all
written by eeecfg_to_eee().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tJ9JH-006LIz-SO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:47:32 -08:00
Russell King (Oracle)
8f1c716090 net: phy: remove genphy_c45_eee_is_active()'s is_enabled arg
All callers to genphy_c45_eee_is_active() now pass NULL as the
is_enabled argument, which means we never use the value computed
in this function. Remove the argument and clean up this function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/E1tJ9JC-006LIt-Ne@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:47:31 -08:00
Russell King (Oracle)
92f7acb825 net: phy: avoid genphy_c45_ethtool_get_eee() setting eee_enabled
genphy_c45_ethtool_get_eee() is only called from phy_ethtool_get_eee(),
which then calls eeecfg_to_eee(). eeecfg_to_eee() will overwrite
keee.eee_enabled, so there's no point setting keee.eee_enabled in
genphy_c45_ethtool_get_eee(). Remove this assignment.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/E1tJ9J7-006LIn-Jr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:47:31 -08:00