The set of bits that the VXLAN netdevice currently considers reserved is
defined by the features enabled at the netdevice construction. In order to
make this configurable, add an attribute, IFLA_VXLAN_RESERVED_BITS. The
payload is a pair of big-endian u32's covering the VXLAN header. This is
validated against the set of flags used by the various enabled VXLAN
features, and attempts to override bits used by an enabled feature are
bounced.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/c657275e5ceed301e62c69fe8e559e32909442e2.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to make it possible to configure which bits in VXLAN header should
be considered reserved, introduce a new field vxlan_config::reserved_bits.
Have it cover the whole header, except for the VNI-present bit and the bits
for VNI itself, and have individual enabled features clear more bits off
reserved_bits.
(This is expressed as first constructing a used_bits set, and then
inverting it to get the reserved_bits. The set of used_bits will be useful
on its own for validation of user-set reserved_bits in a following patch.)
The patch also moves a comment relevant to the validation from the unparsed
validation site up to the new site. Logically this patch should add the new
comment, and a later patch that removes the unparsed bits would remove the
old comment. But keeping both legs in the same patch is better from the
history spelunking point of view.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/984dbf98d5940d3900268dbffaf70961f731d4a4.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The functions vxlan_remcsum() and vxlan_parse_gbp_hdr() take both the SKB
and the unparsed VXLAN header. Now that unparsed adjustment is handled
directly by vxlan_rcv(), drop this argument, and have the function derive
it from the SKB on its own.
vxlan_parse_gpe_proto() does not take SKB, so keep the header parameter.
However const it so that it's clear that the intention is that it does not
get changed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/5ea651f4e06485ba1a84a8eb556a457c39f0dfd4.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
vxlan_sock.flags is constructed from vxlan_dev.cfg.flags, as the subset of
flags (named VXLAN_F_RCV_FLAGS) that is important from the point of view of
socket sharing. Attempts to reconfigure these flags during the vxlan netdev
lifetime are also bounced. It is therefore immaterial whether we access the
flags through the vxlan_dev or through the socket.
Convert the socket accesses to netdevice accesses in this separate patch to
make the conversions that take place in the following patches more obvious.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/5d237ffd731055e524d7b7c436de43358d8743d2.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hardware is initialized and netdev transmit flow is
hooked up for outbound ipsec crypto offload, so finally
enable ipsec offload.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow to use hardware offload for outbound ipsec crypto
mode if security association (SA) is set for a given skb.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare and submit crypto hardware (CPT) instruction for
outbound ipsec crypto offload. The CPT instruction have
authentication offset, IV offset and encapsulation offset
in input packet. Also provide SA context pointer which have
details about algo, keys, salt etc. Crypto hardware encrypt,
authenticate and provide the ESP packet to networking hardware.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to add and delete Security Association
(SA) xfrm ops. Hardware maintains SA context in memory allocated
by software. Each SA context is 128 byte aligned and size of
each context is multiple of 128-byte. Add support for transport
and tunnel ipsec mode, ESP protocol, aead aes-gcm-icv16, key size
128/192/256-bits with 32bit salt.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One crypto hardware logical function (cpt-lf) per netdev is
required for outbound ipsec crypto offload. Allocate, attach
and initialize one crypto hardware function when enabling
outbound ipsec crypto offload. Crypto hardware function will
be detached and freed on disabling outbound ipsec crypto
offload.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIX can assert backpressure to CPT on the NIX<=>CPT link.
Keep the backpressure disabled for now. NIX block anyways
handles backpressure asserted by MAC due to PFC or flow
control pkts.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move skb fragment map/unmap function to common file
so as to reuse same for outbound IPsec crypto offload
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Crypto hardware need write permission for in-place encrypt
or decrypt operation on skb-data to support IPsec crypto
offload. That patch uses skb_unshare to make skb data writeable
for ipsec crypto offload and map skb fragment memory as
device read-write.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.
Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2@yandex.ru>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bareudp uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. It was also recently converted to use the device core
stats (dev_core_stats_*()) for RX and TX drops (see commit 788d5d655b
("bareudp: Use pcpu stats to update rx_dropped counter.")).
Since core stats are to be avoided in drivers, and for consistency with
VXLAN and Geneve, let's convert packet stats handling to DSTATS, which
can handle RX/TX stats and packet drops. Statistics that don't fit
DSTATS are still updated atomically with DEV_STATS_INC().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/0f4f8448db3ff449ac6e939872b28cf3f8982da7.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geneve uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX
packet counters. All other counters are handled using atomic increments
with DEV_STATS_INC().
Let's convert packet stats handling to DSTATS, which has a per-cpu
counter for packet drops too, to avoid the cost of atomic increments
in these cases. Statistics that don't fit DSTATS are still updated
atomically with DEV_STATS_INC().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/7af5c09f3c26f0f231fbe383822ca5d1ce0278fa.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VXLAN uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX and
TX packet counters. It also uses the device core stats
(dev_core_stats_*()) for RX and TX drops.
Let's consolidate that using the DSTATS infrastructure, which can
handle both packet counters and packet drops. Statistics that don't
fit DSTATS are still updated atomically with DEV_STATS_INC().
While there, convert the "len" variable of vxlan_encap_bypass() to
unsigned int, to respect the types of skb->len and
dev_dstats_[rt]x_add().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/145558b184b3cda77911ca5682b6eb83c3ffed8e.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently vrf is the only module that uses NETDEV_PCPU_STAT_DSTATS.
In order to make this kind of statistics available to other modules,
we need to define the update functions in netdevice.h.
Therefore, let's define dev_dstats_*() functions for RX and TX packet
updates (packets, bytes and drops). Use these new functions in vrf.c
instead of vrf_rx_stats() and the other manual counter updates.
While there, update the type of the "len" variables to "unsigned int",
so that there're aligned with both skb->len and the new dstats update
functions.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/d7a552ee382c79f4854e7fcc224cf176cd21150d.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve error handling in `lan78xx_set_rx_max_frame_length` by:
- Checking return values from register read/write operations and
propagating errors.
- Exiting immediately on failure to ensure proper error reporting.
In `lan78xx_change_mtu`, log errors when changing MTU fails, using `%pe`
for clear error representation.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-9-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert `lan78xx_init_ltm` to return error codes and handle errors
properly. Previously, errors during the LTM initialization process were
not propagated, potentially leading to undetected issues. This patch
ensures:
- Errors in `lan78xx_read_reg` and `lan78xx_write_reg` are checked and
handled.
- Errors are logged with detailed messages using `%pe` for clarity.
- The function exits immediately on error, returning the error code.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-8-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refine error handling in EEPROM and OTP read/write functions by:
- Return error values immediately upon detection.
- Avoid overwriting correct error codes with `-EIO`.
- Preserve initial error codes as they were appropriate for specific
failures.
- Use `-ETIMEDOUT` for timeout conditions instead of `-EIO`.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move following functions to avoid forward declarations in the code:
- lan78xx_start_hw()
- lan78xx_stop_hw()
- lan78xx_flush_fifo()
- lan78xx_start_tx_path()
- lan78xx_stop_tx_path()
- lan78xx_flush_tx_fifo()
- lan78xx_start_rx_path()
- lan78xx_stop_rx_path()
- lan78xx_flush_rx_fifo()
These functions will be used in an upcoming PHYlink migration patch.
No modifications to the functionality of the code are made.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the KSZ9031RNX PHY fixup from the lan78xx driver. The fixup applied
specific RGMII pad skew configurations globally, but these settings violate the
RGMII specification and cause more harm than benefit.
Key issues with the fixup:
1. **Non-Compliant Timing**: The fixup's delay settings fall outside the RGMII
specification requirements of 1.5 ns to 2.0 ns:
- RX Path: Total delay of **2.16 ns** (PHY internal delay of 1.2 ns + 0.96
ns skew).
- TX Path: Total delay of **0.96 ns**, significantly below the RGMII minimum
of 1.5 ns.
2. **Redundant or Incorrect Configurations**:
- The RGMII skew registers written by the fixup do not meaningfully alter
the PHY's default behavior and fail to account for its internal delays.
- The TX_DATA pad skew was not configured, relying on power-on defaults
that are insufficient for RGMII compliance.
3. **Micrel Driver Support**: By setting `PHY_INTERFACE_MODE_RGMII_ID`, the
Micrel driver can calculate and assign appropriate skew values for the
KSZ9031 PHY. This ensures better timing configurations without relying on
external fixups.
4. **System Interference**: The fixup applied globally, reconfiguring all
KSZ9031 PHYs in the system, even those unrelated to the LAN78xx adapter.
This could lead to unintended and harmful behavior on unrelated interfaces.
While the fixup is removed, a better mechanism is still needed to dynamically
determine the optimal combination of PHY and MAC delays to fully meet RGMII
requirements without relying on Device Tree or global fixups. This would allow
for robust operation across different hardware configurations.
The Micrel driver is capable of using the interface mode value to calculate and
apply better skew values, providing a configuration much closer to the RGMII
specification than the fixup. Removing the fixup ensures better default
behavior and prevents harm to other system interfaces.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the PHY fixup for the LAN8835 PHY in the lan78xx driver due to
the following reasons:
- There is no publicly available information about the LAN8835 PHY.
However, it appears to be the integrated PHY used in the LAN7800 and
LAN7850 USB Ethernet controllers. These PHYs use the GMII interface,
not RGMII as configured by the fixup.
- The correct driver for handling the LAN8835 PHY functionality is the
Microchip PHY driver (`drivers/net/phy/microchip.c`), which properly
supports these integrated PHYs.
- The PHY ID `0x0007C130` is actually used by the LAN8742A PHY, which
only supports RMII. This interface is incompatible with the LAN78xx
MAC, as the LAN7801 (the only LAN78xx version without an integrated
PHY) supports only RGMII.
- The mask applied for this fixup is overly broad, inadvertently
covering both Microchip LAN88xx PHYs and unrelated SMSC LAN8742A PHYs,
leading to potential conflicts with other devices.
- Testing has shown that removing this fixup for LAN7800 and LAN7850
does not result in any noticeable difference in functionality, as the
Microchip PHY driver (`drivers/net/phy/microchip.c`) handles all
necessary configurations for these integrated PHYs.
- Registering this fixup globally (not limited to USB devices) risks
conflicts by unintentionally modifying other interfaces whenever a
LAN7801 adapter is connected to the system.
Note that both LAN7800 and LAN7850 USB Ethernet controllers use an
integrated PHY with the ID `0x0007C132`. Additionally, the LAN7515, a
specialized part for Raspberry Pi, includes an integrated LAN7800 USB
Ethernet controller and USB hub in a multifunctional chip design, and it
also uses the same PHY ID (`0x0007C132`).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241204084142.1152696-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the phy_ethtool_get_eee() documentation to make it clear that
all members of struct ethtool_keee are written by this function.
keee.supported, keee.advertised, keee.lp_advertised and keee.eee_active
are all written by genphy_c45_ethtool_get_eee().
keee.tx_lpi_timer, keee.tx_lpi_enabled and keee.eee_enabled are all
written by eeecfg_to_eee().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tJ9JH-006LIz-SO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ucc_geth is quite capable in terms of supported interfaces, and even
includes an externally controlled PCS (well, TBI). Port that driver to
phylink.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of parallel MII interfaces also exist in a "Reduced" mode,
usually with higher clock rates and fewer data lines, to ease the
hardware design. This is what the 'R' stands for in RGMII, RMII, RTBI,
RXAUI, etc.
The UCC Geth controller has a special configuration bit that needs to be
set when the MII mode is one of the supported reduced modes.
Add a local helper for that.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The uec_configure_serdes() function deals with serialized linkmodes
settings. It's used during the link bringup sequence. It is planned to
be used during the phylink conversion for mac configuration, but it
needs to me moved around in the process. To make the phylink port
clearer, this commit moves the function without any feature change.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The preamble length can be configured in ucc_geth, however it just
ends-up always being configured to 7 bytes, as nothing ever changes the
default value of 7.
Make that value the default value when the MACCFG2 register gets
initialized, and remove the code to configure that value altogether.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The frame length check is configured when the phy interface is setup.
However, it's configured according to an internal flag that is always
false. So, just make so that we disable the relevant bit in the MACCFG2
register upon accessing it for other MAC configuration operations.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WoL opts are represented through a bitmask stored in a u32. As this
mask is copied as-is in the driver, make sure we use the exact same type
to store them internally.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The get/set_wol ethtool ops rely on querying the PHY for its WoL
capabilities, checking for the presence of a PHY and a PHY interrupts
isn't enough. Address that by cleaning up the WoL configuration
sequence.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
As this driver pre-dates phylib, it uses a private pointer to get a
reference to the attached phy_device. Drop that pointer and use the
netdev's pointer instead.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preparing the phylink conversion, split the adjust_link callbaclk, by
clearly separating the mac configuration, link_up and link_down phases.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>