The driver allocates a cmd_buf array in addition to the desc_buf array.
This array stores an ice_sq_cd command details structure for each entry in
the control queue ring.
The contents of the structure are copied from the value passed in via
ice_sq_send_cmd, and include only a pointer to storage for the write back
descriptor contents.
Originally this array was intended to support asynchronous completion
including features such as a callback function. This support was never
implemented. All that exists today is needless copying and resetting of a
cmd_buf array that is otherwise functionally unused.
Since we do not plan to implement asynchronous completions, drop this
unnecessary memory and logic. This saves memory for each control queue, and
avoids the pointless copying and memset.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Many comments in ice_controlq.c use the term "Admin queue" despite the code
being intended for arbitrary control queues, not just the Admin queue.
Reword the comments to make it clear that this code is the generic control
queue logic that is shared by all of the control queues, and is not
specific to the Admin queue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_debug_cq() function is called to generate a debug log of control
queue messages both sent and received. It currently does this over a
potential total of 6 different printk invocations.
The main logic prints over 4 calls to ice_debug():
1. The metadata including opcode, flags, datalength and return value.
2. The cookie in the descriptor.
3. The parameter values.
4. The address for the databuffer.
In addition, if the descriptor has a data buffer, it can be logged with two
additional prints:
5. A message indicating the start of the data buffer.
6. The actual data buffer, printed using print_hex_dump_debug.
This can lead to trouble in the event that two different PFs are logging
messages. The messages become intermixed and it may not be possible to
determine which part of the output belongs to which control queue message.
To fix this, it needs to be possible to unambiguously determine which
messages belong together. This is trivial for the messages that comprise
the main printing. Combine them together into a single invocation of
ice_debug().
The message containing a hex-dump of the data buffer is a bit more
complicated. This is printed separately as part of print_hex_dump_debug.
This function takes a prefix, which is currently always set to
KBUILD_MODNAME. Extend this prefix to include the buffer address for the
databuffer, which is printed as part of the main print, and which is
guaranteed to be unique for each buffer.
Refactor the ice_debug_array(), introducing an ice_debug_array_w_prefix().
Build the prefix by combining KBUILD_MODNAME with the databuffer address
using snprintf().
These changes make it possible to unambiguously determine what data belongs
to what control queue message.
Reported-by: Jacek Wierzbicki <jacek.wierzbicki@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, debug logs are unnecessarily cluttered with the contents of
command data buffers even if the receiver of that command (i.e. FW or MBX)
are not told to read the buffer. Change to only log command data buffers
when the RD flag (indicates receiver needs to read the buffer) is set.
Continue to log response data buffer when the returned datalen is non-zero.
Also, rename a local variable to reflect what is in the hardware
specification and how it is used elsewhere in the code, use local variables
instead of duplicating endian conversions unnecessarily and remove an
unnecessary assignment.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_debug_cq function is called to print debug data for a control queue
descriptor in multiple places. This includes both before we send a message
on a transmit queue, after the writeback completion of a message on the
transmit queue, and when we receive a message on a receive queue.
This function does not include data about *which* control queue the message
is on, nor whether it was what we sent to the queue or what we received
from the queue.
Modify ice_debug_cq to take two extra parameters, a pointer to the control
queue and a boolean indicating if this was a response or a command. Improve
the debug messages by replacing "CQ CMD" with a string indicating which
specific control queue (based on cq->qtype) and whether this was a command
sent by the PF or a response from the queue.
This helps make the log output easier to understand and consume when
debugging.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_sq_done function is used to check the control queue head register
and determine whether or not the control queue processing is done. This
function is called in a loop checking against jiffies for a specified
timeout.
The pattern of reading a register in a loop until a condition is true or a
timeout is reached is a relatively common pattern. In fact, the kernel
provides a read_poll_timeout function implementing this behavior in
<linux/iopoll.h>
Use of read_poll_timeout is preferred over directly coding these loops.
However, using it in the ice driver is a bit more difficult because of the
rd32 wrapper. Implement a rd32_poll_timeout wrapper based on
read_poll_timeout.
Refactor ice_sq_done to use rd32_poll_timeout, replacing the loop calling
ice_sq_done in ice_sq_send_cmd. This simplifies the logic down to a single
ice_sq_done() call.
The implementation of rd32_poll_timeout uses microseconds for its timeout
value, so update the CQ timeout macros used to be specified in microseconds
units as well instead of using HZ for jiffies.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 5fbf57a937 ("net: netlink: remove the cb_mutex "injection" from
netlink core") has removed the usage of the 'dump_cb_mutex' field from the
struct netlink_sock.
Remove the field itself now. It saves a few bytes in the structure.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When net devices propagate xdp configurations to slave devices,
we will need to perform a memory provider check to ensure we're
not binding xdp to a device using unreadable netmem.
Currently the ->ndo_bpf calls in a few places. Adding checks to all
these places would not be ideal.
Refactor all the ->ndo_bpf calls into one place where we can add this
check in the future.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zetao says:
====================
net: Delete some redundant judgments
This patchset aims to remove some unnecessary judgments and make the
code more concise. In some network modules, rtnl_set_sk_err is used to
record error information, but the err is repeatedly judged to be less
than 0 on the error path. Deleted these redundant judgments.
No functional change intended.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOMEM, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly introduced phy_link_topology tracks all ethernet PHYs that are
attached to a netdevice. Document the base principle, internal and
external APIs. As the phy_link_topology is expected to be extended, this
documentation will hold any further improvements and additions made
relative to topology handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
PSE and PD configuration is a PHY-specific command. Instead of targeting
the command towards dev->phydev, use the request to pick the targeted
PHY device.
As we don't get the PHY directly from the netdev's attached phydev, also
adjust the error messages.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY_GET command, supporting both DUMP and GET operations, is used to
retrieve the list of PHYs connected to a netdevice, and get topology
information to know where exactly it sits on the physical link.
Add the netlink specs corresponding to that command.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the passed phy_index.
Add a helper that netlink command handlers need to use to grab the
targeted PHY from the req_info. This helper needs to hold rtnl_lock()
while interacting with the PHY, as it may be removed at any point.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Knowing the bus name is helpful when we want to expose the link topology
to userspace, add a helper to return the SFP bus name.
This call will always be made while holding the RTNL which ensures
that the SFP driver won't unbind from the device. The returned pointer
to the bus name will only be used while RTNL is held.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a few PHY drivers that can handle SFP modules through their
sfp_upstream_ops. Introduce Phylib helpers to keep track of connected
SFP PHYs in a netdevice's namespace, by adding the SFP PHY to the
upstream PHY's netdev's namespace.
By doing so, these SFP PHYs can be enumerated and exposed to users,
which will be able to use their capabilities.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Link topologies containing multiple network PHYs attached to the same
net_device can be found when using a PHY as a media converter for use
with an SFP connector, on which an SFP transceiver containing a PHY can
be used.
With the current model, the transceiver's PHY can't be used for
operations such as cable testing, timestamping, macsec offload, etc.
The reason being that most of the logic for these configuration, coming
from either ethtool netlink or ioctls tend to use netdev->phydev, which
in multi-phy systems will reference the PHY closest to the MAC.
Introduce a numbering scheme allowing to enumerate PHY devices that
belong to any netdev, which can in turn allow userspace to take more
precise decisions with regard to each PHY's configuration.
The numbering is maintained per-netdev, in a phy_device_list.
The numbering works similarly to a netdevice's ifindex, with
identifiers that are only recycled once INT_MAX has been reached.
This prevents races that could occur between PHY listing and SFP
transceiver removal/insertion.
The identifiers are assigned at phy_attach time, as the numbering
depends on the netdevice the phy is attached to. The PHY index can be
re-used for PHYs that are persistent.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 1
tl;dr - This patchset starts to unmask the upper DSCP bits in the IPv4
flow key in preparation for allowing IPv4 FIB rules to match on DSCP.
No functional changes are expected.
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset starts to unmask the upper DSCP bits in the various places
that invoke the core FIB lookup functions directly (patches #1-#7) and
in the input route path (patches #8-#12). Future patchsets will do the
same in the output route path.
No functional changes are expected as commit 1fa3314c14 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================
Link: https://patch.msgid.link/20240821125251.1571445-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmask the upper DSCP bits when performing source validation and routing
a packet using the same route from a previously processed packet (hint).
In the future, this will allow us to perform the FIB lookup that is
performed as part of source validation according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-13-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmask the upper DSCP bits when performing source validation for
multicast packets during early demux. In the future, this will allow us
to perform the FIB lookup which is performed as part of source
validation according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-12-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Align the ICMP code to other callers of ip_route_input() and pass the
full DS field. In the future this will allow us to perform a route
lookup according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-11-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmask the upper DSCP bits when looking up an input route via the
RTM_GETROUTE netlink message so that in the future the lookup could be
performed according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-10-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained in commit 35ebf65e85 ("ipv4: Create and use
fib_compute_spec_dst() helper."), the function is used - for example -
to determine the source address for an ICMP reply. If we are responding
to a multicast or broadcast packet, the source address is set to the
source address that we would use if we were to send a packet to the
unicast source of the original packet. This address is determined by
performing a FIB lookup and using the preferred source address of the
resulting route.
Unmask the upper DSCP bits of the DS field of the packet that triggered
the reply so that in the future the FIB lookup could be performed
according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-8-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmask the upper DSCP bits when calling ipmr_fib_lookup() so that in the
future it could perform the FIB lookup according to the full DSCP value.
Note that ipmr_fib_lookup() performs a FIB rule lookup (returning the
relevant routing table) and that IPv4 multicast FIB rules do not support
matching on TOS / DSCP. However, it is still worth unmasking the upper
DSCP bits in case support for DSCP matching is ever added.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a similar fashion to the iptables rpfilter match, unmask the upper
DSCP bits of the DS field of the currently tested packet so that in the
future the FIB lookup could be performed according to the full DSCP
value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rpfilter match performs a reverse path filter test on a packet by
performing a FIB lookup with the source and destination addresses
swapped.
Unmask the upper DSCP bits of the DS field of the tested packet so that
in the future the FIB lookup could be performed according to the full
DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Record Route IP option records the addresses of the routers that
routed the packet. In the case of forwarded packets, the kernel performs
a route lookup via fib_lookup() and fills in the preferred source
address of the matched route.
Unmask the upper DSCP bits when performing the lookup so that in the
future the lookup could be performed according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
lookup according to user provided parameters and communicate the result
back to user space.
Unmask the upper DSCP bits of the user-provided DS field before invoking
the IPv4 FIB lookup API so that in the future the lookup could be
performed according to the full DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The helper performs a FIB lookup according to the parameters in the
'params' argument, one of which is 'tos'. According to the test in
test_tc_neigh_fib.c, it seems that BPF programs are expected to
initialize the 'tos' field to the full 8 bit DS field from the IPv4
header.
Unmask the upper DSCP bits before invoking the IPv4 FIB lookup APIs so
that in the future the lookup could be performed according to the full
DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Abhinav Jain says:
====================
Enhance network interface feature testing
This small series includes fixes for creation of veth pairs for
networkless kernels & adds tests for turning the different network
interface features on and off in selftests/net/netdevice.sh script.
Tested using vng and compiles for network as well as networkless kernel.
# selftests: net: netdevice.sh
# No valid network device found, creating veth pair
# PASS: veth0: set interface up
# PASS: veth0: set MAC address
# XFAIL: veth0: set IP address unsupported for veth*
# PASS: veth0: ethtool list features
# PASS: veth0: Turned off feature: rx-checksumming
# PASS: veth0: Turned on feature: rx-checksumming
# PASS: veth0: Restore feature rx-checksumming to initial state on
# Actual changes:
# tx-checksum-ip-generic: off
...
# PASS: veth0: Turned on feature: rx-udp-gro-forwarding
# PASS: veth0: Restore feature rx-udp-gro-forwarding to initial state off
# Cannot get register dump: Operation not supported
# XFAIL: veth0: ethtool dump not supported
# PASS: veth0: ethtool stats
# PASS: veth0: stop interface
====================
Link: https://patch.msgid.link/20240821171903.118324-1-jain.abhinav177@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>