Commit Graph

1297322 Commits

Author SHA1 Message Date
Yevgeny Kliteynik
aefc15a0fa net/mlx5: HWS, added modify header pattern and args handling
Packet headers/metadta manipulations are split into two parts:
 - Header Modify Pattern: an object that describes which fields
   will be modified and in which way
 - Header Modify Argument: an object that provides the values
   to be used for header modification

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:05 -07:00
Yevgeny Kliteynik
0869701cba net/mlx5: HWS, added FW commands handling
This patch adds implementation of FW object handling, such
as creation/destruction, modification, and querying.

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:05 -07:00
Yevgeny Kliteynik
472dd79234 net/mlx5: HWS, added matchers functionality
Matcher object encompasses all the building blocks that are
needed in order to perform flow steering of a given flow:
 - flow table that serves as entering point of this matcher
 - Rule Table Context (RTC) objects to hold ll the Steering
   Table Entries (STEs), both for matching the flow and for
   performing actions
 - rules that describe the set of matching parameters for a
   flow and actions to perform in case of a hit.
This patch adds implementation of matchers handling in HWS.

Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:05 -07:00
Yevgeny Kliteynik
74a778b4a6 net/mlx5: HWS, added definers handling
The Match Definer combines packet fields and a mask,
creating a key which can be used for packet matching
during steering flow processing.
This patch adds handling of definer objects in HWS.

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:05 -07:00
Yevgeny Kliteynik
4967480354 net/mlx5: HWS, added rules handling
Steering rule is a concept that includes match parameters for a flow,
and actions to perform on the flows that match these parameters.
This patch adds rules handling part of HW Steering.

Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:05 -07:00
Yevgeny Kliteynik
71a1372b82 net/mlx5: HWS, added tables handling
Flow tables are SW objects that are comprised of list of matchers,
that in turn define the properties of a flow to match on and set
of actions to perform on the flows in case of match hit or miss.

Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
Yevgeny Kliteynik
504e536d90 net/mlx5: HWS, added actions handling
When a packet matches a flow, the actions specified for the flow are
applied. The supported actions include (but not limited to) the following:
 - drop: packet processing is stopped
 - go to vport: packet is forwarded to a specified vport
 - go to flow table: packet is forwarded to a specified table
   and processing continues there
 - push/pop vlan: add/remove vlan header respectively to/from the packet
 - insert/remove header: add/remove a user-defined header to/from
   the packet
 - counter: count the packet bytes in the specified counter
 - tag: tag the matching flow with a provided tag value
 - reformat: change the packet format by adding or removing some
   of its headers
 - modify header: modify the value of the packet headers with
   set/add/copy ops
 - range: match packet on range of values

Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
Yevgeny Kliteynik
00b9f0daef net/mlx5: Added missing definitions in preparation for HW Steering
As part of preparation for HWS, added missing definitions
in qp.h and fs_core.h:

 - FS_FT_FDB_RX/TX table types that are used by HWS in addition
   to an existing FS_FT_FDB
 - MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE that is used by HWS to
   require fence in WQE

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
Yevgeny Kliteynik
34c626c300 net/mlx5: Added missing mlx5_ifc definition for HW Steering
Add mlx5_ifc definitions that are required for HWS support.

Note that due to change in the mlx5_ifc_flow_table_context_bits
structure that now includes both SWS and HWS bits in a union,
this patch also includes small change in one of SWS files that
was required for compilation.

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
David S. Miller
bfba7bc8b7 Merge branch 'unmask-dscp-part-four'
Ido Schimmel says:

====================
Unmask upper DSCP bits - part 4 (last)

tl;dr - This patchset finishes to unmask the upper DSCP bits in the IPv4
flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No
functional changes are expected.

The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.

It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.

In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset finishes to unmask the upper DSCP bits by adjusting all
the callers of ip_route_output_key() to properly initialize the full
DSCP value in the IPv4 flow key.

No functional changes are expected as commit 1fa3314c14 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:54 +01:00
Ido Schimmel
8b6d13cc8b sctp: Unmask upper DSCP bits in sctp_v4_get_dst()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable holds the full DS field.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
2c60fc9ca2 ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in
the 'tos' argument.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
345663e6a7 netfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
b7172768ab netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()
Unmask the upper DSCP bits when calling nf_route() which eventually
calls ip_route_output_key() so that in the future it could perform the
FIB lookup according to the full DSCP value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
4f0880766a ipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
c2b639f9f3 ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable includes the full DS field. Either the one
specified as part of the tunnel parameters or the one inherited from the
inner packet.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
c34cfe72bb ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable includes the full DS field. Either the one
specified via the tunnel key or the one inherited from the inner packet.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
e7191e517a ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
848789d552 ipv4: icmp: Unmask upper DSCP bits in icmp_reply()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:53 +01:00
Ido Schimmel
b3899830aa bpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:52 +01:00
Ido Schimmel
25376a8901 ipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()
Unmask the upper DSCP bits when calling ip_route_output_gre() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:52 +01:00
Ido Schimmel
1f23a1909d netfilter: br_netfilter: Unmask upper DSCP bits in br_nf_pre_routing_finish()
Unmask upper DSCP bits when calling ip_route_output() so that in the
future it could perform the FIB lookup according to the full DSCP value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 14:14:52 +01:00
Zijun Hu
8f08854199 net: sysfs: Fix weird usage of class's namespace relevant fields
Device class has two namespace relevant fields which are associated by
the following usage:

struct class {
	...
	const struct kobj_ns_type_operations *ns_type;
	const void *(*namespace)(const struct device *dev);
	...
}
if (dev->class && dev->class->ns_type)
	dev->class->namespace(dev);

The usage looks weird since it checks @ns_type but calls namespace()
it is found for all existing class definitions that the other filed is
also assigned once one is assigned in current kernel tree, so fix this
weird usage by checking @namespace to call namespace().

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:30:52 +01:00
David S. Miller
4897313bdb Merge branch 'fs_enet-cleanup'
Maxime Chevallier says:

====================
net: ethernet: fs_enet: Cleanup and phylink conversion

This is V3 of a series that cleans-up fs_enet, with the ultimate goal of
converting it to phylink (patch 8).

The main changes compared to V2 are :
 - Reviewed-by tags from Andrew were gathered
 - Patch 5 now includes the removal of now unused includes, thanks
   Andrew for spotting this
 - Patch 4 is new, it reworks the adjust_link to move the spinlock
   acquisition to a more suitable location. Although this dissapears in
   the actual phylink port, it makes the phylink conversion clearer on
   that point
 - Patch 8 includes fixes in the tx_timeout cancellation, to prevent
   taking rtnl twice when canceling a pending tx_timeout. Thanks Jakub
   for spotting this.

Link to V2: https://lore.kernel.org/netdev/20240829161531.610874-1-maxime.chevallier@bootlin.com/
Link to V1: https://lore.kernel.org/netdev/20240828095103.132625-1-maxime.chevallier@bootlin.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:05 +01:00
Maxime Chevallier
41f5fbffd1 net: ethernet: fs_enet: phylink conversion
fs_enet is a quite old but still used Ethernet driver found on some NXP
devices. It has support for 10/100 Mbps ethernet, with half and full
duplex. Some variants of it can use RMII, while other integrations are
MII-only.

Add phylink support, thus removing custom fixed-link hanldling.

This also allows removing some internal flags such as the use_rmii flag.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
c614acf6e8 net: ethernet: fs_enet: simplify clock handling with devm accessors
devm_clock_get_enabled() can be used to simplify clock handling for the
PER register clock.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
21c6321459 net: ethernet: fs_enet: use macros for speed and duplex values
The PHY speed and duplex should be manipulated using the SPEED_XXX and
DUPLEX_XXX macros available. Use it in the fcc, fec and scc MAC for
fs_enet.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
6b576b2d44 net: ethernet: fs_enet: drop unused phy_info and mii_if_info
There's no user of the struct phy_info, the 'phy' field and the
mii_if_info in the fs_enet driver, probably dating back when phylib
wasn't as widely used.  Drop these from the driver code.

As the definition for struct mii_if_info is no longer required, drop the
include for linux/mii.h altogether in the driver.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
aa3672be73 net: ethernet: fs_enet: only protect the .restart() call in .adjust_link
When .adjust_link() gets called, it runs in thread context, with the
phydev->lock held. We only need to protect the fep->fecp/fccp/sccp
register that are accessed within the .restart() function from
concurrent access from the interrupts.

These registers are being protected by the fep->lock spinlock, so we can
move the spinlock protection around the .restart() call instead of the
entire adjust_link() call. By doing so, we can simplify further the
.adjust_link() callback and avoid the intermediate helper.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
96bf0c4e9f net: ethernet: fs_enet: drop the .adjust_link custom fs_ops
There's no in-tree user for the fs_ops .adjust_link() function, so we
can always use the generic one in fe_enet-main.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
2b29ac68e7 net: ethernet: fs_enet: cosmetic cleanups
Due to the age of the driver and the slow recent activity on it, the code
has taken some layers of dust. Clean the main driver file up so that it
passes checkpatch and also conforms with the net coding style.

Changes include :
 - Re-ordering of the variable declarations for RCT
 - Fixing the comment styles to either one-line comments, or net-style
   comments
 - Adding braces around single-statement 'else' clauses
 - Aligning function/macro parameters on the opening parenthesis
 - Simplifying checks for NULL pointers
 - Splitting cascaded assignments into individual assignments
 - Fixing some typos
 - Fixing whitespace issues

This is a cosmetic change and doesn't introduce any change in behaviour.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Maxime Chevallier
30ba6d2f34 net: ethernet: fs_enet: convert to SPDX
The ENET driver has SPDX tags in the header files, but they were missing
in the C files. Change the licence information to SPDX format.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09 10:29:04 +01:00
Mahesh Bandewar
c259acab83 ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED
The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.

The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps.  However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.

This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.

This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps.  The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-08 18:40:33 +01:00
Dmitry Antipov
d5c4546062 net: sched: consistently use rcu_replace_pointer() in taprio_change()
According to Vinicius (and carefully looking through the whole
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
once again), txtime branch of 'taprio_change()' is not going to
race against 'advance_sched()'. But using 'rcu_replace_pointer()'
in the former may be a good idea as well.

Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-08 11:18:57 +01:00
Jakub Kicinski
f723224742 Merge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 adds ctnetlink support for kernel side filtering for
	 deletions, from Changliang Wu.

Patch #2 updates nft_counter support to Use u64_stats_t,
	 from Sebastian Andrzej Siewior.

Patch #3 uses kmemdup_array() in all xtables frontends,
	 from Yan Zhen.

Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
	 opencoded casting, from Shen Lichuan.

Patch #5 removes unused argument in nftables .validate interface,
	 from Florian Westphal.

Patch #6 is a oneliner to correct a typo in nftables kdoc,
	 from Simon Horman.

Patch #7 fixes missing kdoc in nftables, also from Simon.

Patch #8 updates nftables to handle timeout less than CONFIG_HZ.

Patch #9 rejects element expiration if timeout is zero,
	 otherwise it is silently ignored.

Patch #10 disallows element expiration larger than timeout.

Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.

Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.

Patch #13 annotates data-races around element expiration.

Patch #14 allocates timeout and expiration in one single set element
	  extension, they are tighly couple, no reason to keep them
	  separated anymore.

Patch #15 updates nftables to interpret zero timeout element as never
	  times out. Note that it is already possible to declare sets
	  with elements that never time out but this generalizes to all
	  kind of set with timeouts.

Patch #16 supports for element timeout and expiration updates.

* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: set element timeout update support
  netfilter: nf_tables: zero timeout means element never times out
  netfilter: nf_tables: consolidate timeout extension for elements
  netfilter: nf_tables: annotate data-races around element expiration
  netfilter: nft_dynset: annotate data-races around set timeout
  netfilter: nf_tables: remove annotation to access set timeout while holding lock
  netfilter: nf_tables: reject expiration higher than timeout
  netfilter: nf_tables: reject element expiration with no timeout
  netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
  netfilter: nf_tables: Add missing Kernel doc
  netfilter: nf_tables: Correct spelling in nf_tables.h
  netfilter: nf_tables: drop unused 3rd argument from validate callback ops
  netfilter: conntrack: Convert to use ERR_CAST()
  netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
  netfilter: nft_counter: Use u64_stats_t for statistic.
  netfilter: ctnetlink: support CTA_FILTER for flush
====================

Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:39:31 -07:00
Vadim Fedorenko
aa05fe67bc ptp: ocp: Improve PCIe delay estimation
The PCIe bus can be pretty busy during boot and probe function can
see excessive delays. Let's find the minimal value out of several
tests and use it as estimated value.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20240905140028.560454-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:28:07 -07:00
Eric Dumazet
9a95eedc81 netpoll: remove netpoll_srcu
netpoll_srcu is currently used from netpoll_poll_disable() and
__netpoll_cleanup()

Both functions run under RTNL, using netpoll_srcu adds confusion
and no additional protection.

Moreover the synchronize_srcu() call in __netpoll_cleanup() is
performed before clearing np->dev->npinfo, which violates RCU rules.

After this patch, netpoll_poll_disable() and netpoll_poll_enable()
simply use rtnl_dereference().

This saves a big chunk of memory (more than 192KB on platforms
with 512 cpus)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240905084909.2082486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:24:59 -07:00
Jakub Kicinski
c6a1739778 Merge branch 'octeontx2-address-some-warnings'
Simon Horman says:

====================
octeontx2: Address some warnings

This patchset addresses some warnings flagged by Sparse, gcc-14, and
clang-18 in files touched by recent patch submissions.

Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
Sparse will stand out more readily.

v1: https://lore.kernel.org/20240903-octeontx2-sparse-v1-0-f190309ecb0a@kernel.org
====================

Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-0-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:23:52 -07:00
Simon Horman
7baa90c616 octeontx2-pf: Make iplen __be16 in otx2_sqe_add_ext()
In otx2_sqe_add_ext() iplen is used to hold a 16-bit big-endian value,
but it's type is u16, indicating a host byte order integer.

Address this mismatch by changing the type of iplen to __be16.

Flagged by Sparse as:

.../otx2_txrx.c:699:31: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:699:31:    expected unsigned short [usertype] iplen
.../otx2_txrx.c:699:31:    got restricted __be16 [usertype]
.../otx2_txrx.c:701:54: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:701:54:    expected restricted __be16 [usertype] tot_len
.../otx2_txrx.c:701:54:    got unsigned short [usertype] iplen
.../otx2_txrx.c:704:60: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:704:60:    expected restricted __be16 [usertype] payload_len
.../otx2_txrx.c:704:60:    got unsigned short [usertype] iplen

Introduced in
commit dc1a9bf2c8 ("octeontx2-pf: Add UDP segmentation offload support")

No functional change intended.
Compile tested only by author.

Tested-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-2-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:23:50 -07:00
Simon Horman
92218f108f octeontx2-af: Pass string literal as format argument of alloc_workqueue()
Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of alloc_workqueue()
is potentially insecure.

E.g. clang-18 says:

.../rvu.c:2493:32: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
 2493 |         mw->mbox_wq = alloc_workqueue(name,
      |                                       ^~~~
.../rvu.c:2493:32: note: treat the string as an argument to avoid this
 2493 |         mw->mbox_wq = alloc_workqueue(name,
      |                                       ^
      |                                       "%s",

It is always the case where the contents of name is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.

But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.

Compile tested only by author.

Tested-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-1-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:23:49 -07:00
Rosen Penev
cca0d69baf net: phy: qca83xx: use PHY_ID_MATCH_EXACT
No need for the mask when there's already a macro for this.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240904205659.7470-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:23:08 -07:00
Edward Cree
32b81e4f0e sfc: siena: rip out rss-context dead code
Siena hardware does not support custom RSS contexts, but when the
 driver was forked from sfc.ko, some of the plumbing for them was
 copied across from the common code.  Actually trying to use them
 would lead to EOPNOTSUPP as the relevant efx_nic_type methods were
 not populated.
Remove this dead code from the Siena driver.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904181156.1993666-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:22:17 -07:00
Jakub Kicinski
5953711258 Merge branch 'use-functionality-of-irq_get_trigger_type'
Vasileios Amoiridis says:

====================
Use functionality of irq_get_trigger_type()

v1: https://lore.kernel.org/20240902225534.130383-1-vassilisamir@gmail.com
====================

Link: https://patch.msgid.link/20240904151018.71967-1-vassilisamir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:21:47 -07:00
Vasileios Amoiridis
f4bbf496f5 net: smc91x: Make use of irq_get_trigger_type()
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more
simple irq_get_trigger_type(irq).

Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Link: https://patch.msgid.link/20240904151018.71967-4-vassilisamir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:21:43 -07:00
Vasileios Amoiridis
36a5faec56 net: dsa: realtek: rtl8366rb: Make use of irq_get_trigger_type()
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more
simple irq_get_trigger_type(irq).

Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com>
Link: https://patch.msgid.link/20240904151018.71967-3-vassilisamir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:21:43 -07:00
Vasileios Amoiridis
517c292475 net: dsa: realtek: rtl8365mb: Make use of irq_get_trigger_type()
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more
simple irq_get_trigger_type(irq).

Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Link: https://patch.msgid.link/20240904151018.71967-2-vassilisamir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:21:43 -07:00
Sascha Hauer
54001d0f2f net: tls: wait for async completion on last message
When asynchronous encryption is used KTLS sends out the final data at
proto->close time. This becomes problematic when the task calling
close() receives a signal. In this case it can happen that
tcp_sendmsg_locked() called at close time returns -ERESTARTSYS and the
final data is not sent.

The described situation happens when KTLS is used in conjunction with
io_uring, as io_uring uses task_work_add() to add work to the current
userspace task. A discussion of the problem along with a reproducer can
be found in [1] and [2]

Fix this by waiting for the asynchronous encryption to be completed on
the final message. With this there is no data left to be sent at close
time.

[1] https://lore.kernel.org/all/20231010141932.GD3114228@pengutronix.de/
[2] https://lore.kernel.org/all/20240315100159.3898944-1-s.hauer@pengutronix.de/

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://patch.msgid.link/20240904-ktls-wait-async-v1-1-a62892833110@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:20:55 -07:00
Jakub Kicinski
7ab50d5e88 Merge branch 'make-use-of-the-helper-macro-list_head'
Hongbo Li says:

====================
make use of the helper macro LIST_HEAD()

The macro LIST_HEAD() declares a list variable and
initializes it, which can be used to simplify the steps
of list initialization, thereby simplifying the code.
These serials just do some equivalatent substitutions,
and with no functional modifications.
====================

Link: https://patch.msgid.link/20240904093243.3345012-1-lihongbo22@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:10:26 -07:00
Hongbo Li
17f0139190 net/core: make use of the helper macro LIST_HEAD()
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). Here we can simplify
the code.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20240904093243.3345012-6-lihongbo22@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:10:22 -07:00
Hongbo Li
2a7dd251b6 net/ipv6: make use of the helper macro LIST_HEAD()
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). Here we can simplify
the code.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20240904093243.3345012-5-lihongbo22@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-06 18:10:21 -07:00