Commit Graph

1323696 Commits

Author SHA1 Message Date
David Howells
f003e4038f rxrpc: Display stats about jumbo packets transmitted and received
In /proc/net/rxrpc/stats, display statistics about the numbers of different
sizes of jumbo packets transmitted and received, showing counts for 1
subpacket (ie. a non-jumbo packet), 2 subpackets, 3, ... to 8 and then 9+.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-22-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:28 -08:00
David Howells
203457e11b rxrpc: Replace call->acks_first_seq with tracking of the hard ACK point
Replace the call->acks_first_seq variable (which holds ack.firstPacket from
the latest ACK packet and indicates the sequence number of the first ack
slot in the SACK table) with call->acks_hard_ack which will hold the
highest sequence hard ACK'd.  This is 1 less than call->acks_first_seq, but
it fits in the same schema as the other tracking variables which hold the
sequence of a packet, not one past it.

This will fix the rxrpc_congest tracepoint's calculation of SACK window
size which shows one fewer than it should - and will occasionally go to -1.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-21-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:28 -08:00
David Howells
692c4caa07 rxrpc: call->acks_hard_ack is now the same call->tx_bottom, so remove it
Now that packets are removed from the Tx queue in the rotation function
rather than being cleaned up later, call->acks_hard_ack now advances in
step with call->tx_bottom, so remove it.

Some of the places call->acks_hard_ack is used in the rxrpc tracepoints are
replaced by call->acks_first_seq instead as that's the peer's reported idea
of the hard-ACK point.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-20-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:28 -08:00
David Howells
b341a0263b rxrpc: Implement progressive transmission queue struct
We need to scan the buffers in the transmission queue occasionally when
processing ACKs, but the transmission queue is currently a linked list of
transmission buffers which, when we eventually expand the Tx window to 8192
packets will be very slow to walk.

Instead, pull the fields we need to examine a lot (last sent time,
retransmitted flag) into a new struct rxrpc_txqueue and make each one hold
an array of 32 or 64 packets.

The transmission queue is then a list of these structs, each pointing to a
contiguous set of packets.  Scanning is then a lot faster as the flags and
timestamps are concentrated in the CPU dcache.

The transmission timestamps are stored as a number of microseconds from a
base ktime to reduce memory requirements.  This should be fine provided we
manage to transmit an entire buffer within an hour.

This will make implementing RACK-TLP [RFC8985] easier as it will be less
costly to scan the transmission buffers.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-19-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:27 -08:00
David Howells
6396b48ac0 rxrpc: Don't need barrier for ->tx_bottom and ->acks_hard_ack
We don't need a barrier for the ->tx_bottom value (which indicates the
lowest sequence still in the transmission queue) and the ->acks_hard_ack
value (which tracks the DATA packets hard-ack'd by the latest ACK packet
received and thus indicates which DATA packets can now be discarded) as the
app thread doesn't use either value as a reference to memory to access.
Rather, the app thread merely uses these as a guide to how much space is
available in the transmission queue

Change the code to use READ/WRITE_ONCE() instead.

Also, change rxrpc_check_tx_space() to use the same value for tx_bottom
throughout.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-18-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:27 -08:00
David Howells
976b0ca5aa rxrpc: Timestamp DATA packets before transmitting them
Move to setting the timestamp on DATA packets before transmitting them as
part of the preparation.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-17-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:27 -08:00
David Howells
81e7761be5 rxrpc: Only set DF=1 on initial DATA transmission
Change how the DF flag is managed on DATA transmissions.  Set it on initial
transmission and don't set it on retransmissions.  Then remove the handling
for EMSGSIZE in rxrpc_send_data_packet() and just pretend it didn't happen,
leaving it to the retransmission path to retry.

The path-MTU discovery using PING ACKs is then used to probe for the
maximum DATA size - though notification by ICMP will be used if one is
received.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:27 -08:00
David Howells
cd69a07b6d rxrpc: Fix injection of packet loss
Fix the code that injects packet loss for testing to make sure
call->tx_transmitted is updated.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-15-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
9e3cccd176 rxrpc: Fix CPU time starvation in I/O thread
Starvation can happen in the rxrpc I/O thread because it goes back to the
top of the I/O loop after it does any one thing without trying to give any
other connection or call CPU time.  Also, because it processes one call
packet at a time, it tries to do the retransmission loop after each ACK
without checking to see if there are other ACKs already in the queue that
can update the SACK state.

Fix this by:

 (1) Add a received-packet queue on each call.

 (2) Distribute packets from the master Rx queue to the individual call,
     conn and error queues and 'poking' calls to add them to the attend
     queue first thing in the I/O thread.

 (3) Go through all the attention-seeking connections and calls before
     going back to the top of the I/O thread.  Each queue is extracted as a
     whole and then gone through so that new additions to insert themselves
     into the queue.

 (4) Make the call event handler go through all the packets currently on
     the call's rx_queue before transmitting and retransmitting DATA
     packets.

 (5) Drop the skb argument from the call event handler as this is now
     replaced with the rx_queue.  Instead, keep track of whether we
     received a packet or an ACK for the tests that used to rely on that.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
149d002bee rxrpc: Add a tracepoint to show variables pertinent to jumbo packet size
Add a tracepoint to be called right before packets are transmitted for the
first time that shows variable values that are pertinent to how many
subpackets will be added to a jumbo DATA packet.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-13-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
b7313009c2 rxrpc: Prepare to be able to send jumbo DATA packets
Prepare to be able to send jumbo DATA packets if the we decide to, but
don't enable that yet.  This will allow larger chunks of data to be sent
without reducing the retryability as the subpackets in a jumbo packet can
also be retransmitted individually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-12-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:26 -08:00
David Howells
3d2bdf73ce rxrpc: Separate the packet length from the data length in rxrpc_txbuf
Separate the packet length from the data length (txb->len) stored in the
rxrpc_txbuf to make security calculations easier.  Also store the
allocation size as that's an upper bound on the size of the security
wrapper and change a number of fields to unsigned short as the amount of
data can't exceed the capacity of a UDP packet.

Also, whilst we're at it, use kzalloc() for txbufs.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-11-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
eeaedc5449 rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899)
Implement path-MTU probing (along the lines of RFC8899) by padding some of
the PING ACKs we send.  PING ACKs get their own individual responses quite
apart from the acking of data (though, as ACKs, they fulfil that role
also).

The probing concentrates on packet sizes that correspond how many
subpackets can be stuffed inside a jumbo packet as jumbo DATA packets are
just aggregations of individual DATA packets and can be split easily for
retransmission purposes.

If we want to perform probing, we advertise this by setting the maximum
number of jumbo subpackets to 0 in the ack trailer when we send an ACK and
see if the peer is also advertising the service.  This is interpreted by
non-supporting Rx stacks as an indication that jumbo packets aren't
supported.

The MTU sizes advertised in the ACK trailer AF_RXRPC transmits are pegged
at a maximum of 1444 unless pmtud is supported by both sides.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-10-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
420f8af502 rxrpc: Use a large kvec[] in rxrpc_local rather than every rxrpc_txbuf
Use a single large kvec[] in the rxrpc_local struct rather than one in
every rxrpc_txbuf struct to build large packets to save on memory.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-9-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:25 -08:00
David Howells
8b5823ea43 rxrpc: Request an ACK on impending Tx stall
Set the REQUEST-ACK flag on the DATA packet we're about to send if we're
about to stall transmission because the app layer isn't keeping up
supplying us with data to transmit.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-8-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
ff992adbc4 rxrpc: Show stats counter for received reason-0 ACKs
In /proc/net/rxrpc/stats, show the stats counter for received ACKs that
have the reason code set to 0 as some implementations do this.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-7-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
cbe0d89095 rxrpc: Don't set the MORE-PACKETS rxrpc wire header flag
The MORE-PACKETS rxrpc header flag hasn't actually been looked at by
anything since 1988 and not all implementations generate it.

Change rxrpc so that it doesn't set MORE-PACKETS at all rather than setting
it inconsistently.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
efa95c3235 rxrpc: Clean up Tx header flags generation handling
Clean up the generation of the header flags when building packet headers
for transmission:

 (1) Assemble the flags in a local variable rather than in the txb->flags.

 (2) Do the flags masking and JUMBO-PACKET setting in one bit of code for
     both the main header and the jumbo headers.

 (3) Generate the REQUEST-ACK flag afresh each time.  There's a possibility
     we might want to do jumbo retransmission packets in future.

 (4) Pass the local flags variable to the rxrpc_tx_data tracepoint rather
     than the combination of the txb flags and the wire header flags (the
     latter belong only to the first subpacket).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:24 -08:00
David Howells
29e03ec757 rxrpc: Use umin() and umax() rather than min_t()/max_t() where possible
Use umin() and umax() rather than min_t()/max_t() where the type specified
is an unsigned type.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David Howells
0e56ebde24 rxrpc: Fix handling of received connection abort
Fix the handling of a connection abort that we've received.  Though the
abort is at the connection level, it needs propagating to the calls on that
connection.  Whilst the propagation bit is performed, the calls aren't then
woken up to go and process their termination, and as no further input is
forthcoming, they just hang.

Also add some tracing for the logging of connection aborts.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David Howells
d1fd972914 ktime: Add us_to_ktime()
Add a us_to_ktime() helper to go with ms_to_ktime() and ns_to_ktime().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241204074710.990092-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:23 -08:00
David S. Miller
6145fefc1e Merge branch 'cn10k-ipswec-outbound-inline-support'
Bharat Bhushan says:

====================
cn10k-ipsec: Add outbound inline ipsec support

This patch series adds outbound inline ipsec support on Marvell
cn10k series of platform. One crypto hardware logical function
(cpt-lf) per netdev is required for inline ipsec outbound
functionality. Software prepare and submit crypto hardware
(CPT) instruction for outbound inline ipsec crypto mode offload.
The CPT instruction have details for encryption and authentication
Crypto hardware encrypt, authenticate and provide the ESP packet
to network hardware logic to transmit ipsec packet.

First patch makes dma memory writable for in-place encryption,
Second patch moves code to common file, Third patch disable
backpressure on crypto (CPT) and network (NIX) hardware.
Patch four onwards enables inline outbound ipsec.

v9->v10:
 - Removed unlikely() in data-patch and used static_branch when at least
   a SA is configured.
 - Added missing READ_ONCE() as per comment on previous patch
 - Removed "\n" from end of extack messages
 - Poll for context write status check reduced to 100ms from 10s

v8->v9:
 - Removed mutex lock to use hardware, now using hardware state
 - Previous versions were supporting only 64 SAs and a bitmap was
   used for same. That limitation is removed from this version.
 - Replaced netdev_err with NL_SET_ERR_MSG_MOD in state add flow
   as per comment in previous version

v7->v8:
 - spell correction in patch 1/8 (s/sdk/skb)

v6->v7:
 - skb data was mapped as device writeable but it was not ensured
   that skb is writeable. This version calls skb_unshare() to make
   skb data writeable (Thanks Jakub Kicinski for pointing out).

v4->v5:
 - Fixed un-initialized warning and pointer check
   (comment from Kalesh Anakkur Purayil)

v3->v4:
 - Few error messages in data-path removed and some moved
   under netif_msg_tx_err().
 - Added check for crypto offload (XFRM_DEV_OFFLOAD_CRYPTO)
   Thanks "Leon Romanovsky" for pointing out
 - Fixed codespell error as per comment from Simon Horman
 - Added some other cleanup comment from Kalesh Anakkur Purayil

v2->v3:
 - Fix smatch and sparse errors (Comment from Simon Horman)
 - Fix build error with W=1 (Comment from Simon Horman)
   https://patchwork.kernel.org/project/netdevbpf/patch/20240513105446.297451-6-bbhushan2@marvell.com/
 - Some other minor cleanup as per comment
   https://www.spinics.net/lists/netdev/msg997197.html

v1->v2:
 - Fix compilation error to build driver a module
 - Use dma_wmb() instead of architecture specific barrier
 - Fix couple of other compilation warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
b3ae3dc3a3 cn10k-ipsec: Enable outbound ipsec crypto offload
Hardware is initialized and netdev transmit flow is
hooked up for outbound ipsec crypto offload, so finally
enable ipsec offload.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
32188be805 cn10k-ipsec: Allow ipsec crypto offload for skb with SA
Allow to use hardware offload for outbound ipsec crypto
mode if security association (SA) is set for a given skb.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
6a77a15884 cn10k-ipsec: Process outbound ipsec crypto offload
Prepare and submit crypto hardware (CPT) instruction for
outbound ipsec crypto offload. The CPT instruction have
authentication offset, IV offset and encapsulation offset
in input packet. Also provide SA context pointer which have
details about algo, keys, salt etc. Crypto hardware encrypt,
authenticate and provide the ESP packet to networking hardware.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
c45211c236 cn10k-ipsec: Add SA add/del support for outb ipsec crypto offload
This patch adds support to add and delete Security Association
(SA) xfrm ops. Hardware maintains SA context in memory allocated
by software. Each SA context is 128 byte aligned and size of
each context is multiple of 128-byte. Add support for transport
and tunnel ipsec mode, ESP protocol, aead aes-gcm-icv16, key size
128/192/256-bits with 32bit salt.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
fe079ab05d cn10k-ipsec: Init hardware for outbound ipsec crypto offload
One crypto hardware logical function (cpt-lf) per netdev is
required for outbound ipsec crypto offload. Allocate, attach
and initialize one crypto hardware function when enabling
outbound ipsec crypto offload. Crypto hardware function will
be detached and freed on disabling outbound ipsec crypto
offload.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
a7ef63dbd5 octeontx2-af: Disable backpressure between CPT and NIX
NIX can assert backpressure to CPT on the NIX<=>CPT link.
Keep the backpressure disabled for now. NIX block anyways
handles backpressure asserted by MAC due to PFC or flow
control pkts.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
c460b7442a octeontx2-pf: Move skb fragment map/unmap to common code
Move skb fragment map/unmap function to common file
so as to reuse same for outbound IPsec crypto offload

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:31 +00:00
Bharat Bhushan
195c3d4631 octeontx2-pf: map skb data as device writeable
Crypto hardware need write permission for in-place encrypt
or decrypt operation on skb-data to support IPsec crypto
offload. That patch uses skb_unshare to make skb data writeable
for ipsec crypto offload and map skb fragment memory as
device read-write.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-09 12:15:30 +00:00
Justin Lai
7ea2745766 rtase: Refine the if statement
Refine the if statement to improve readability.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241206084851.760475-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 18:26:28 -08:00
Jakub Kicinski
72e2e2f5ee Merge branch 'net-net-add-negotiation-of-in-band-capabilities-remainder'
Russell King says:

====================
net: net: add negotiation of in-band capabilities (remainder)

Here are the last three patches which were not included in the non-RFC
posting, but were in the RFC posting. These add the .pcs_inband()
method to the Lynx, MTK Lynx and XPCS drivers.
====================

Link: https://patch.msgid.link/Z1F1b8eh8s8T627j@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:47 -08:00
Russell King (Oracle)
484d0170d6 net: pcs: xpcs: implement pcs_inband_caps() method
Report the PCS inband capabilities to phylink for XPCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NW-006L5V-I9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Russell King (Oracle)
520d29bdda net: pcs: pcs-mtk-lynxi: implement pcs_inband_caps() method
Report the PCS in-band capabilities to phylink for the LynxI PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NR-006L5P-E3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Russell King (Oracle)
6561f0e547 net: pcs: pcs-lynx: implement pcs_inband_caps() method
Report the PCS in-band capabilities to phylink for the Lynx PCS.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tJ8NM-006L5J-AH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:49:43 -08:00
Stas Sergeev
3ca459eaba tun: fix group permission check
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.

This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.

Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()

Signed-off-by: Stas Sergeev <stsp2@yandex.ru>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:43:02 -08:00
Johannes Berg
81d89e6e88 tools: ynl-gen-c: don't require -o argument
Without -o the tool currently crashes, but it's not marked
as required. The only thing we can't do without it is to
generate the correct #include for user source files, but
we can put a placeholder instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241206113100.89d35bf124d6.I9228fb704e6d5c9d8e046ef15025a47a48439c1e@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:27:59 -08:00
Johannes Berg
00ab246750 tools: ynl-gen-c: annotate valid choices for --mode
This makes argparse validate the input and helps users
understand which modes are possible.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241206113100.e2ab5cf6937c.Ie149a0ca5df713860964b44fe9d9ae547f2e1553@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07 17:27:59 -08:00
Jakub Kicinski
860dbab69a Merge branch 'net-convert-some-udp-tunnel-drivers-to-netdev_pcpu_stat_dstats'
Guillaume Nault says:

====================
net: Convert some UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS.

VXLAN, Geneve and Bareudp use various device counters for managing
RX and TX statistics:

  * VXLAN uses the device core_stats for RX and TX drops, tstats for
    regular RX/TX counters and DEV_STATS_INC() for various types of
    RX/TX errors.

  * Geneve uses tstats for regular RX/TX counters and DEV_STATS_INC()
    for everything else, include RX/TX drops.

  * Bareudp, was recently converted to follow VXLAN behaviour, that is,
    device core_stats for RX and TX drops, tstats for regular RX/TX
    counters and DEV_STATS_INC() for other counter types.

Let's consolidate statistics management around the dstats counters
instead. This avoids using core_stats in VXLAN and Bareudp, as
core_stats is supposed to be used by core networking code only (and not
in drivers).  This also allows Geneve to avoid using atomic increments
when updating RX and TX drop counters, as dstats is per-cpu. Finally,
this also simplifies the code as all three modules now handle stats in
the same way and with only two different sets of counters (the per-cpu
dstats and the atomic DEV_STATS_INC()).

Patch 1 creates dstats helper functions that can be used outside of VRF
(until then, dstats was VRF-specific).
Then patches 2 to 4, convert VXLAN, Geneve and Bareudp, one by one.
====================

Link: https://patch.msgid.link/cover.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:57 -08:00
Guillaume Nault
c77200c074 bareudp: Handle stats using NETDEV_PCPU_STAT_DSTATS.
Bareudp uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. It was also recently converted to use the device core
stats (dev_core_stats_*()) for RX and TX drops (see commit 788d5d655b
("bareudp: Use pcpu stats to update rx_dropped counter.")).

Since core stats are to be avoided in drivers, and for consistency with
VXLAN and Geneve, let's convert packet stats handling to DSTATS, which
can handle RX/TX stats and packet drops. Statistics that don't fit
DSTATS are still updated atomically with DEV_STATS_INC().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/0f4f8448db3ff449ac6e939872b28cf3f8982da7.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:53 -08:00
Guillaume Nault
6fa6de3022 geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.
Geneve uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. All other counters are handled using atomic increments
with DEV_STATS_INC().

Let's convert packet stats handling to DSTATS, which has a per-cpu
counter for packet drops too, to avoid the cost of atomic increments
in these cases. Statistics that don't fit DSTATS are still updated
atomically with DEV_STATS_INC().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/7af5c09f3c26f0f231fbe383822ca5d1ce0278fa.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Guillaume Nault
be226352e8 vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.
VXLAN uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX and
TX packet counters. It also uses the device core stats
(dev_core_stats_*()) for RX and TX drops.

Let's consolidate that using the DSTATS infrastructure, which can
handle both packet counters and packet drops. Statistics that don't
fit DSTATS are still updated atomically with DEV_STATS_INC().

While there, convert the "len" variable of vxlan_encap_bypass() to
unsigned int, to respect the types of skb->len and
dev_dstats_[rt]x_add().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/145558b184b3cda77911ca5682b6eb83c3ffed8e.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Guillaume Nault
18eabadd73 vrf: Make pcpu_dstats update functions available to other modules.
Currently vrf is the only module that uses NETDEV_PCPU_STAT_DSTATS.
In order to make this kind of statistics available to other modules,
we need to define the update functions in netdevice.h.

Therefore, let's define dev_dstats_*() functions for RX and TX packet
updates (packets, bytes and drops). Use these new functions in vrf.c
instead of vrf_rx_stats() and the other manual counter updates.

While there, update the type of the "len" variables to "unsigned int",
so that there're aligned with both skb->len and the new dstats update
functions.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/d7a552ee382c79f4854e7fcc224cf176cd21150d.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:56:52 -08:00
Jakub Kicinski
9ec780b269 Merge branch 'lan78xx-preparations-for-phylink'
Oleksij Rempel says:

====================
lan78xx: Preparations for PHYlink

This patch set is part of the preparatory work for migrating the lan78xx
USB Ethernet driver to the PHYlink framework. During extensive testing,
I observed that resetting the USB adapter can lead to various read/write
errors. While the errors themselves are acceptable, they generate
excessive log messages, resulting in significant log spam. This set
improves error handling to reduce logging noise by addressing errors
directly and returning early when necessary.

Key highlights of this series include:
- Enhanced error handling to reduce log spam while preserving the
  original error values, avoiding unnecessary overwrites.
- Improved error reporting using the `%pe` specifier for better clarity
  in log messages.
- Removal of redundant and problematic PHY fixups for LAN8835 and
  KSZ9031, with detailed explanations in the respective patches.
- Cleanup of code structure, including unified `goto` labels for better
  readability and maintainability, even in simple editors.
====================

Link: https://patch.msgid.link/20241204084142.1152696-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:30 -08:00
Oleksij Rempel
48fb3d3c4b net: usb: lan78xx: Improve error handling in dataport and multicast writes
Update `lan78xx_dataport_write` and `lan78xx_deferred_multicast_write`
to:
- Handle errors during register read/write operations.
- Exit immediately on errors and log them using `%pe` for clarity.
- Avoid silent failures by propagating error codes properly.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-11-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
0da202e6a5 net: usb: lan78xx: Add error handling to lan78xx_irq_bus_sync_unlock
Update `lan78xx_irq_bus_sync_unlock` to handle errors in register
read/write operations. If an error occurs, log it and exit the function
appropriately.  This ensures proper handling of failures during IRQ
synchronization.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-10-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
65520a70cb net: usb: lan78xx: Add error handling to set_rx_max_frame_length and set_mtu
Improve error handling in `lan78xx_set_rx_max_frame_length` by:
- Checking return values from register read/write operations and
  propagating errors.
- Exiting immediately on failure to ensure proper error reporting.

In `lan78xx_change_mtu`, log errors when changing MTU fails, using `%pe`
for clear error representation.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-9-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:08 -08:00
Oleksij Rempel
77586156b5 net: usb: lan78xx: Add error handling to lan78xx_init_ltm
Convert `lan78xx_init_ltm` to return error codes and handle errors
properly.  Previously, errors during the LTM initialization process were
not propagated, potentially leading to undetected issues. This patch
ensures:

- Errors in `lan78xx_read_reg` and `lan78xx_write_reg` are checked and
  handled.
- Errors are logged with detailed messages using `%pe` for clarity.
- The function exits immediately on error, returning the error code.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-8-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00
Oleksij Rempel
8b1b2ca83b net: usb: lan78xx: Improve error handling in EEPROM and OTP operations
Refine error handling in EEPROM and OTP read/write functions by:
- Return error values immediately upon detection.
- Avoid overwriting correct error codes with `-EIO`.
- Preserve initial error codes as they were appropriate for specific
  failures.
- Use `-ETIMEDOUT` for timeout conditions instead of `-EIO`.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00
Oleksij Rempel
32ee0dc764 net: usb: lan78xx: Fix error handling in MII read/write functions
Ensure proper error handling in `lan78xx_mdiobus_read` and
`lan78xx_mdiobus_write` by checking return values of register read/write
operations and returning errors to the caller.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:53:07 -08:00