The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
lookup according to user provided parameters and communicate the result
back to user space.
However, unlike other users of the FIB lookup API, the upper DSCP bits
and the ECN bits of the DS field are not masked, which can result in the
wrong result being returned.
Solve this by masking the upper DSCP bits and the ECN bits using
IPTOS_RT_MASK.
The structure that communicates the request and the response is not
exported to user space, so it is unlikely that this netlink family is
actually in use [1].
[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wen Gu says:
====================
net/smc: introduce ringbufs usage statistics
Currently, we have histograms that show the sizes of ringbufs that ever
used by SMC connections. However, they are always incremental and since
SMC allows the reuse of ringbufs, we cannot know the actual amount of
ringbufs being allocated or actively used.
So this patch set introduces statistics for the amount of ringbufs that
actually allocated by link group and actively used by connections of a
certain net namespace, so that we can react based on these memory usage
information, e.g. active fallback to TCP.
With appropriate adaptations of smc-tools, we can obtain these ringbufs
usage information:
$ smcr -d linkgroup
LG-ID : 00000500
LG-Role : SERV
LG-Type : ASYML
VLAN : 0
PNET-ID :
Version : 1
Conns : 0
Sndbuf : 12910592 B <-
RMB : 12910592 B <-
or
$ smcr -d stats
[...]
RX Stats
Data transmitted (Bytes) 869225943 (869.2M)
Total requests 18494479
Buffer usage (Bytes) 12910592 (12.31M) <-
[...]
TX Stats
Data transmitted (Bytes) 12760884405 (12.76G)
Total requests 36988338
Buffer usage (Bytes) 12910592 (12.31M) <-
[...]
[...]
Change log:
v3->v2
- use new helper nla_put_uint() instead of nla_put_u64_64bit().
v2->v1
https://lore.kernel.org/r/20240807075939.57882-1-guwen@linux.alibaba.com/
- remove inline keyword in .c files.
- use local variable in macros to avoid potential side effects.
v1
https://lore.kernel.org/r/20240805090551.80786-1-guwen@linux.alibaba.com/
====================
Link: https://patch.msgid.link/20240814130827.73321-1-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record
the sizes of ringbufs for all connections that have ever appeared in
the net namespace. They are incremental and we cannot know the actual
ringbufs usage from these. So here introduces statistics for current
ringbufs usage of existing smc connections in the net namespace into
smc_stats, it will be incremented when new connection uses a ringbuf
and decremented when the ringbuf is unused.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently we have the statistics on sndbuf/RMB sizes of all connections
that have ever been on the link group, namely smc_stats_memsize. However
these statistics are incremental and since the ringbufs of link group
are allowed to be reused, we cannot know the actual allocated buffers
through these. So here introduces the statistic on actual allocated
ringbufs of the link group, it will be incremented when a new ringbuf is
added into buf_list and decremented when it is deleted from buf_list.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
fsl,enetc-ptp is embedded pcie device. Add compatible string pci1957,ee02.
Fix warning:
arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dtb: ethernet@0,4:
compatible:0: 'pci1957,ee02' is not one of ['fsl,etsec-ptp', 'fsl,fman-ptp-timer', 'fsl,dpaa2-ptp', 'fsl,enetc-ptp']
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnx2x_get_vf_config():
* The vlan field of ivi is a 32-bit integer, it is used to store a vlan ID.
* The vlan field of bulletin is a 16-bit integer, it is also used to store
a vlan ID.
In the current code, ivi->vlan is set using memset. But in the case of
setting it to the value of bulletin->vlan, this involves reading
32 bits from a 16bit source. This is likely safe, as the following
6 bytes are padding in the same structure, but none the less, it seems
undesirable.
However, it is entirely unclear to me how this scheme works on
big-endian systems.
Resolve this by simply assigning integer values to ivi->vlan.
Flagged by W=1 builds.
f.e. gcc-14 reports:
In function 'fortify_memcpy_chk',
inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4:
.../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
580 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mpls_xmit() needs to prepend the MPLS-labels to the packet. That implies
one needs to make sure there is enough space for it in the headers.
Calling skb_cow() implies however that one wants to change even the
playload part of the packet (which is not true for MPLS). Thus, call
skb_cow_head() instead, which is what other tunnelling protocols do.
Running a server with this comm it entirely removed the calls to
pskb_expand_head() from the callstack in mpls_xmit() thus having
significant CPU-reduction, especially at peak times.
Cc: Roopa Prabhu <roopa@nvidia.com>
Reported-by: Craig Taylor <cmtaylor@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240815161201.22021-1-cpaasch@apple.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove unnecessary NULL check before freeing using kvfree().
This function will ignore a NULL argument.
Flagged by Coccinelle:
.../txgbe_hw.c:187:2-8: WARNING: NULL check before some freeing functions is not needed.
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240815-txgbe-kvfree-v1-1-5ecf8656f555@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change mdio.yaml nodename match pattern to
'^mdio(-(bus|external))?(@.+|-([0-9]+))$'
Fix mdio.yaml wrong parser mdio controller's address instead phy's address
when mdio-mux exista.
For example:
mdio-mux-emi1@54 {
compatible = "mdio-mux-mmioreg", "mdio-mux";
mdio@20 {
reg = <0x20>;
^^^ This is mdio controller register
ethernet-phy@2 {
reg = <0x2>;
^^^ This phy's address
};
};
};
Only phy's address is limited to 31 because MDIO bus definition.
But CHECK_DTBS report below warning:
arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: mdio-mux-emi1@54:
mdio@20:reg:0:0: 32 is greater than the maximum of 31
The reason is that "mdio-mux-emi1@54" match "nodename: '^mdio(@.*)?'" in
mdio.yaml.
Change to '^mdio(-(bus|external))?(@.+|-([0-9]+))?$' to avoid wrong match
mdio mux controller's node.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240815163408.4184705-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ice: iavf: add support for TC U32 filters on VFs
Ahmed Zaki says:
The Intel Ethernet 800 Series is designed with a pipeline that has
an on-chip programmable capability called Dynamic Device Personalization
(DDP). A DDP package is loaded by the driver during probe time. The DDP
package programs functionality in both the parser and switching blocks in
the pipeline, allowing dynamic support for new and existing protocols.
Once the pipeline is configured, the driver can identify the protocol and
apply any HW action in different stages, for example, direct packets to
desired hardware queues (flow director), queue groups or drop.
Patches 1-8 introduce a DDP package parser API that enables different
pipeline stages in the driver to learn the HW parser capabilities from
the DDP package that is downloaded to HW. The parser library takes raw
packet patterns and masks (in binary) indicating the packet protocol fields
to be matched and generates the final HW profiles that can be applied at
the required stage. With this API, raw flow filtering for FDIR or RSS
could be done on new protocols or headers without any driver or Kernel
updates (only need to update the DDP package). These patches were submitted
before [1] but were not accepted mainly due to lack of a user.
Patches 9-11 extend the virtchnl support to allow the VF to request raw
flow director filters. Upon receiving the raw FDIR filter request, the PF
driver allocates and runs a parser lib instance and generates the hardware
profile definitions required to program the FDIR stage. These were also
submitted before [2].
Finally, patches 12 and 13 add TC U32 filter support to the iavf driver.
Using the parser API, the ice driver runs the raw patterns sent by the
user and then adds a new profile to the FDIR stage associated with the VF's
VSI. Refer to examples in patch 13 commit message.
[1]: https://lore.kernel.org/netdev/20230904021455.3944605-1-junfeng.guo@intel.com/
[2]: https://lore.kernel.org/intel-wired-lan/20230818064703.154183-1-junfeng.guo@intel.com/
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: add support for offloading tc U32 cls filters
iavf: refactor add/del FDIR filters
ice: enable FDIR filters from raw binary patterns for VFs
ice: add method to disable FDIR SWAP option
virtchnl: support raw packet in protocol header
ice: add API for parser profile initialization
ice: add UDP tunnels support to the parser
ice: support turning on/off the parser's double vlan mode
ice: add parser execution main loop
ice: add parser internal helper functions
ice: add debugging functions for the parser sections
ice: parse and init various DDP parser sections
ice: add parser create and destroy skeleton
====================
Link: https://patch.msgid.link/20240813222249.3708070-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev says:
====================
use more devm for ag71xx
Some of these were introduced after the driver got introduced. In any
case, using more devm allows removal of the remove function and overall
simplifies the code. All of these were tested on a TP-LINK Archer C7v2.
====================
Link: https://patch.msgid.link/20240813170516.7301-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
selftests: fib_rule_tests: Cleanups and new tests
This patchset performs some cleanups and adds new tests in preparation
for upcoming FIB rule DSCP selector.
====================
Link: https://patch.msgid.link/20240814111005.955359-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TOS value reaches the FIB rule core via different call paths when an
input route is looked up compared to an output route.
Re-test TOS matching with input routes to exercise these code paths.
Pass the 'iif' and 'from' selectors separately from the 'get{,no}match'
variables as otherwise the test name is too long to be printed without
misalignments.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240814111005.955359-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fib_rule{4,6}_connect tests verify that locally generated traffic
from a socket that specifies a DS Field using the IP_TOS / IPV6_TCLASS
socket options is correctly redirected using a FIB rule that matches on
the given DS Field.
Add negative tests to verify that the FIB rule is not hit when the
socket specifies a DS Field that differs from the one used by the FIB
rule.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240814111005.955359-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fib_rule{4,6} tests verify the behavior of a given FIB rule selector
(e.g., dport, sport) by redirecting to a routing table with a default
route using a FIB rule with the given selector and checking that a route
lookup using the selector matches this default route.
Add negative tests to verify that a FIB rule is not hit when it should
not.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240814111005.955359-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clarify the test results by grouping the output of test cases belonging
to the same test under a common title. This is consistent with the
output of fib_tests.sh.
Before:
# ./fib_rule_tests.sh
TEST: rule6 check: oif redirect to table [ OK ]
TEST: rule6 del by pref: oif redirect to table [ OK ]
[...]
TEST: rule4 check: oif redirect to table [ OK ]
TEST: rule4 del by pref: oif redirect to table [ OK ]
[...]
Tests passed: 116
Tests failed: 0
After:
# ./fib_rule_tests.sh
IPv6 FIB rule tests
TEST: rule6 check: oif redirect to table [ OK ]
TEST: rule6 del by pref: oif redirect to table [ OK ]
[...]
IPv4 FIB rule tests
TEST: rule4 check: oif redirect to table [ OK ]
TEST: rule4 del by pref: oif redirect to table [ OK ]
[...]
Tests passed: 116
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240814111005.955359-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The KSZ87xx switches have 32 static MAC address table entries and not
8. This fixes -ENOSPC non-critical errors from ksz8_add_sta_mac when
configured as a bridge.
Add a new ksz87xx_dev_ops structure to be able to use the
ksz_r_mib_stat64 pointer for this family; this corrects a wrong
mib->counters cast to ksz88xx_stats_raw. This fixes iproute2
statistics. Rename ksz8_dev_ops structure to ksz88x3_dev_ops, in line
with ksz_is_* naming conventions from ksz_common.h.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://patch.msgid.link/20240813142750.772781-6-vtpieter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add WoL support for KSZ87xx family of switches. This code was tested
with a KSZ8794 chip.
Implement ksz_common usage of the new device-tree property
'microchip,pme-active-high'.
Make use of the now generalized ksz_common WoL functions, adding an
additional interrupt register write for KSZ87xx. Add helper functions
to convert from PME (port) read/writes to indirect register
read/writes in the dedicated ksz8795 sources. Add initial
configuration during (port) setup as per KSZ9477.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://patch.msgid.link/20240813142750.772781-5-vtpieter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce cable testing support for the DP83TG720 PHY. This implementation
is based on the "DP83TG720S-Q1: Configuring for Open Alliance Specification
Compliance (Rev. B)" application note.
The feature has been tested with cables of various lengths:
- No cable: 1m till open reported.
- 5 meter cable: reported properly.
- 20 meter cable: reported as 19m.
- 40 meter cable: reported as cable ok.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240812073046.1728288-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new result codes to support TDR diagnostics in preparation for
Open Alliance 1000BaseT1 TDR support:
- ETHTOOL_A_CABLE_RESULT_CODE_NOISE: TDR not possible due to high noise
level.
- ETHTOOL_A_CABLE_RESULT_CODE_RESOLUTION_NOT_POSSIBLE: TDR resolution not
possible / out of distance.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240812073046.1728288-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Wang says:
====================
virtio-net: synchronize op/admin state
This series tries to synchronize the operstate with the admin state
which allows the lower virtio-net to propagate the link status to the
upper devices like macvlan.
This is done by toggling carrier during ndo_open/stop while doing
other necessary serialization about the carrier settings during probe.
While at it, also fix a race between probe and ndo_set_features as we
didn't initalize the guest offload setting under rtnl lock.
====================
Link: https://patch.msgid.link/20240814052228.4654-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch synchronizes operstate with admin state per RFC2863.
This is done by trying to toggle the carrier upon open/close and
synchronize with the config change work. This allows to propagate
status correctly to stacked devices like:
ip link add link enp0s3 macvlan0 type macvlan
ip link set link enp0s3 down
ip link show
Before this patch:
3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
......
5: macvlan0@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
After this patch:
3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
...
5: macvlan0@enp0s3: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20240814052228.4654-4-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sometime, it would be useful to disable the configure change
notification from the driver. So this patch allows this by introducing
a variable config_change_driver_disabled and only allow the configure
change notification callback to be triggered when it is allowed by
both the virtio core and the driver. It is set to false by default to
hold the current semantic so we don't need to change any drivers.
The first user for this would be virtio-net.
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20240814052228.4654-3-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing __percpu qualifier to a (void *) cast to fix
dev.c:10863:45: warning: cast removes address space '__percpu' of expression
sparse warning. Also remove now unneeded __force sparse directives.
Found by GCC's named address space checks.
There were no changes in the resulting object file.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://patch.msgid.link/20240814070748.943671-1-ubizjak@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to commit 70f06c115b ("sched: act_ct: switch to per-action
label counting"), we should also switch to per-action label counting
in openvswitch conntrack, as Florian suggested.
The difference is that nf_connlabels_get() is called unconditionally
when creating an ct action in ovs_ct_copy_action(). As with these
flows:
table=0,ip,actions=ct(commit,table=1)
table=1,ip,actions=ct(commit,exec(set_field:0xac->ct_label),table=2)
it needs to make sure the label ext is created in the 1st flow before
the ct is committed in ovs_ct_commit(). Otherwise, the warning in
nf_ct_ext_add() when creating the label ext in the 2nd flow will
be triggered:
WARN_ON(nf_ct_is_confirmed(ct));
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/6b9347d5c1a0b364e88d900b29a616c3f8e5b1ca.1723483073.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>