If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.
This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].
The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).
There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.
Example:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0
Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.
[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The body is a bit hard to read, hard to extend, and has duplicated
conditions.
Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:
- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
function call.
This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.
In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.
It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.
In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.
[1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/
[2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Schnelle says:
====================
PtP driver for s390 clocks
these patches add support for using the s390 physical and TOD clock as ptp
clock. To do so, the first patch adds a clock id to the s390 TOD clock,
while the second patch adds the PtP driver itself.
====================
Link: https://patch.msgid.link/20241023065601.449586-1-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a small PtP driver which allows user space to get
the values of the physical and tod clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to
be exact, and even more on a debug kernel or kernel with other network
security limits. The timeout setting for the kselftest is currently 45
seconds, which is way too short to integrate hsr tests to run_kselftest
infrastructure. However, timeout of hundreds of seconds is quite a long
time, especially in a CI/CD environment. It seems that we need
accelerate the test and balance with timeout setting.
The most time-consuming func is do_ping_long, where ping command sends
10 packages to the given address. The default interval between two ping
packages is 1s according to the ping Mannual. There isn't any operation
between pings thus we could pass -i 0.1 to ping to make it 10 times
faster.
While even with this short interval, the test still need about 46.4
seconds to finish because of the two HSR interfaces, each of which is
tested by calling do_ping func 12 times and do_ping_long func 19 times
and sleep for 3s.
So, an explicit setting is also needed to slightly increase the
timeout. And to leave us some slack, use 50 as default timeout.
Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing says:
====================
tcp: add tcp_warn_once() common helper
Paolo Abeni suggested we can introduce a new helper to cover more cases
in the future for better debug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
mac80211: Remove NOP call to ieee80211_hw_config
wifi: iwlwifi: work around -Wenum-compare-conditional warning
wifi: mac80211: re-order assigning channel in activate links
wifi: mac80211: convert debugfs files to short fops
debugfs: add small file operations for most files
wifi: mac80211: remove misleading j_0 construction parts
wifi: mac80211_hwsim: use hrtimer_active()
wifi: mac80211: refactor BW limitation check for CSA parsing
wifi: mac80211: filter on monitor interfaces based on configured channel
wifi: mac80211: refactor ieee80211_rx_monitor
wifi: mac80211: add support for the monitor SKIP_TX flag
wifi: cfg80211: add monitor SKIP_TX flag
wifi: mac80211: add flag to opt out of virtual monitor support
wifi: cfg80211: pass net_device to .set_monitor_channel
wifi: mac80211: remove status->ampdu_delimiter_crc
wifi: cfg80211: report per wiphy radio antenna mask
wifi: mac80211: use vif radio mask to limit creating chanctx
wifi: mac80211: use vif radio mask to limit ibss scan frequencies
wifi: cfg80211: add option for vif allowed radios
wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
...
====================
Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I suggested to put DEBUG_NET_WARN_ON_ONCE() in __sock_create() to
catch possible use-after-free.
But the warning itself was not useful because our interest is in
the callee than the caller.
Let's define DEBUG_NET_WARN_ONCE() and print the name of pf->create()
and the socket identifier.
While at it, we enclose DEBUG_NET_WARN_ON_ONCE() in parentheses too
to avoid a checkpatch error.
Note that %pf or %pF were obsoleted and will be removed later as per
comment in lib/vsprintf.c.
Link: https://lore.kernel.org/netdev/202410231427.633734b3-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241024201458.49412-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: phylink: simplify SFP PHY attachment
These two patches simplify how we attach SFP PHYs.
The first patch notices that at the two sites where we call
sfp_select_interface(), if that fails, we always print the same error.
Move this into its own function.
The second patch adds an additional level of validation, checking that
the returned interface is one that is supported by the MAC/PCS.
The last patch simplifies how SFP PHYs are attached, reducing the
number of times that we do validation in this path.
====================
Link: https://patch.msgid.link/Zxj8_clRmDA_G7uH@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are a few issues with how SFP PHYs are attached:
a) The phylink_sfp_connect_phy() and phylink_sfp_config_phy() code
validates the configuration three times:
1. To discover the support/advertising masks that the PHY/PCS/MAC
can support in order to select an interface.
2. To validate the selected interface.
3. When the PHY is brought up after being attached, another validation
is done.
This is needlessly complex.
b) The configuration is set prior to the PHY being attached, which
means we don't have the PHY available in phylink_major_config()
for phylink_pcs_neg_mode() to make decisions upon.
We have already added an extra step to validate the selected interface,
so we can now move the attachment and bringup of the PHY earlier,
inside phylink_sfp_config_phy(). This results in the validation at
step 2 above becoming entirely unnecessary, so remove that too.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t3bcb-000c8H-3e@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
phylink_parse_fixedlink() wants to preserve the pause, asym_pause and
autoneg bits in pl->supported. Rather than reading the bits into
separate bools, zeroing pl->supported, and then setting them if they
were previously set, use a mask and linkmode_and() to achieve the same
result.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1t3Fh5-000aQi-Nk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan says:
====================
mlx5e update features on config changes
This small patchset by Dragos adds a call to netdev_update_features()
in configuration changes that could impact the features status.
====================
Link: https://patch.msgid.link/20241024164134.299646-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the ring size changes successfully, trigger
netdev_update_features() to enable features in wanted state if
applicable.
An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ethtool --set-ring eth1 rx 1024
With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024164134.299646-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the MTU changes successfully, trigger netdev_update_features() to
enable features in wanted state if applicable.
An example of such scenario:
$ ip link set dev eth1 up
$ ethtool --set-ring eth1 rx 8192
$ ip link set dev eth1 mtu 9000
$ ethtool --features eth1 rx-gro-hw on --> fails
$ ip link set dev eth1 mtu 7000
With this patch, HW GRO will be turned on automatically because
it is set in the device's wanted_features.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241024164134.299646-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both gcc-14 and clang-18 report that passing a non-string literal as the
format argument of dev_set_name() is potentially insecure.
E.g. clang-18 says:
drivers/net/wwan/wwan_core.c:442:34: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
442 | return dev_set_name(&port->dev, buf);
| ^~~
drivers/net/wwan/wwan_core.c:442:34: note: treat the string as an argument to avoid this
442 | return dev_set_name(&port->dev, buf);
| ^
| "%s",
It is always the case where the contents of mod is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20241023-wwan-fmt-v1-1-521b39968639@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault says:
====================
ipv4: Prepare core ipv4 files to future .flowi4_tos conversion.
Continue preparing users of ->flowi4_tos (struct flowi4) to the future
conversion of this field (from __u8 to dscp_t). The objective is to
have type annotation to properly separate DSCP bits from ECN ones. This
way we'll ensure that ECN doesn't interfere with DSCP and avoid
regressions where it break routing descisions (fib rules in particular).
This series concentrates on some easy IPv4 conversions where
->flowi4_tos is set directly from an IPv4 header, so we can get the
DSCP value using the ip4h_dscp() helper function.
====================
Link: https://patch.msgid.link/cover.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev says:
====================
ibm: emac: more cleanups
Tested on Cisco MX60W.
v2: fixed build errors. Also added extra commits to clean the driver up
further.
v3: Added tested message. Removed bad alloc_netdev_dummy commit.
v4: removed modules changes from patchset. Added fix for if MAC not
found.
v5: added of_find_matching_node commit.
v6: resend after net-next merge.
v7: removed of_find_matching_node commit. Adjusted mutex_init patch.
v8: removed patch removing custom init/exit. Needs more work.
====================
Link: https://patch.msgid.link/20241022002245.843242-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>