VLAN proto, used in ice XDP hints implementation is stored in ring packet
context. Utilize this value in skb VLAN processing too instead of checking
netdev features.
At the same time, use vlan_tci instead of vlan_tag in touched code,
because VLAN tag often refers to VLAN proto and VLAN TCI combined,
while in the code we clearly store only VLAN TCI.
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-12-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In AF_XDP ZC, xdp_buff is not stored on ring,
instead it is provided by xsk_buff_pool.
Space for metadata sources right after such buffers was already reserved
in commit 94ecc5ca4d ("xsk: Add cb area to struct xdp_buff_xsk").
Some things (such as pointer to packet context) do not change on a
per-packet basis, so they can be set at the same time as RX queue info.
On the other hand, RX descriptor is unique for each packet, but is already
known when setting DMA addresses. This minimizes performance impact of
hints on regular packet processing.
Update AF_XDP ZC packet processing to support XDP hints.
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-9-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
RX hash XDP hint requests both hash value and type.
Type is XDP-specific, so we need a separate way to map
these values to the hardware ptypes, so create a lookup table.
Instead of creating a new long list, reuse contents
of ice_decode_rx_desc_ptype[] through preprocessor.
Current hash type enum does not contain ICMP packet type,
but ice devices support it, so also add a new type into core code.
Then use previously refactored code and create a function
that allows XDP code to read RX hash.
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-7-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use previously refactored code and create a function
that allows XDP code to read HW timestamp.
Also, introduce packet context, where hints-related data will be stored.
ice_xdp_buff contains only a pointer to this structure, to avoid copying it
in ZC mode later in the series.
HW timestamp is the first supported hint in the driver,
so also add xdp_metadata_ops.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-6-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In order to use XDP hints via kfuncs we need to put
RX descriptor and miscellaneous data next to xdp_buff.
Same as in hints implementations in other drivers, we achieve
this through putting xdp_buff into a child structure.
Currently, xdp_buff is stored in the ring structure,
so replace it with union that includes child structure.
This way enough memory is available while existing XDP code
remains isolated from hints.
Minimum size of the new child structure (ice_xdp_buff) is exactly
64 bytes (single cache line). To place it at the start of a cache line,
move 'next' field from CL1 to CL4, as it isn't used often. This still
leaves 192 bits available in CL3 for packet context extensions.
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-5-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, rx_ptype variable is used only as an argument
to ice_process_skb_fields() and is computed
just before the function call.
Therefore, there is no reason to pass this value as an argument.
Instead, remove this argument and compute the value directly inside
ice_process_skb_fields() function.
Also, separate its calculation into a short function, so the code
can later be reused in .xmo_() callbacks.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-4-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously, we only needed RX HW timestamp in skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_() callbacks.
Put generic process of reading RX HW timestamp from a descriptor
into a separate function.
Move skb-related code into another source file.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously, we only needed RX hash in skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_() callbacks.
Separate generic process of reading RX hash from a descriptor
into a separate function.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Set backpressure watermark for hardware RX queues. Backpressure
gets triggered when the available buffers of a hardware RX queue
falls below the set watermark. This backpressure will propagate
to packet processing pipeline in the OCTEON card, so that the host
receives fewer packets and prevents packet dropping at host.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add PCI Endpoint NIC support for Octeon CN98 devices.
CN98 devices are part of Octeon 9 family products with
similar PCI NIC characteristics to CN93, already supported
driver.
Add CN98 card to the device id table, as well
as support differences in the register fields and
certain usage scenarios such as unload.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231129045348.2538843-3-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the phy reset sequence is as shown below for a
devicetree described mdio phy on boot:
1. Assert the phy_device's reset as part of registering
2. Deassert the phy_device's reset as part of registering
3. Deassert the phy_device's reset as part of phy_probe
4. Deassert the phy_device's reset as part of phy_hw_init
The extra two deasserts include waiting the deassert delay afterwards,
which is adding unnecessary delay.
This applies to both possible types of resets (reset controller
reference and a reset gpio) that can be used.
Here's some snipped tracing output using the following command line
params "trace_event=gpio:* trace_options=stacktrace" illustrating
the reset handling and where its coming from:
/* Assert */
systemd-udevd-283 [002] ..... 6.780434: gpio_value: 544 set 0
systemd-udevd-283 [002] ..... 6.783849: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> mdiobus_register_device
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
systemd-udevd-283 [002] ..... 6.802480: gpio_value: 544 set 1
systemd-udevd-283 [002] ..... 6.805886: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
systemd-udevd-283 [002] ..... 6.882601: gpio_value: 544 set 1
systemd-udevd-283 [002] ..... 6.886014: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_probe
=> really_probe
=> __driver_probe_device
=> driver_probe_device
=> __device_attach_driver
=> bus_for_each_drv
=> __device_attach
=> device_initial_probe
=> bus_probe_device
=> device_add
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
NetworkManager-477 [000] ..... 7.023144: gpio_value: 544 set 1
NetworkManager-477 [000] ..... 7.026596: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_init_hw
=> phy_attach_direct
=> phylink_fwnode_phy_connect
=> __stmmac_open
=> stmmac_open
There's a lot of paths where the device is getting its reset
asserted and deasserted. Let's track the state and only actually
do the assert/deassert when it changes.
Reported-by: Sagar Cheluvegowda <quic_scheluve@quicinc.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231127-net-phy-reset-once-v2-1-448e8658779e@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).
The main changes are:
1) Add initial TX metadata implementation for AF_XDP with support in mlx5
and stmmac drivers. Two types of offloads are supported right now, that
is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
stmmac implementation from Song Yoong Siang.
2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.
3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.
4) Use pkg-config in BPF selftests to determine ld flags which is
in particular needed for linking statically, from Akihiko Odaki.
5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================
Conflicts:
Documentation/netlink/specs/netdev.yaml:
839ff60df3 ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/
While at it also regen, tree is dirty after:
48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.
Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Replace the error path returning a non-zero value by an error message
and a comment that there is more to do. With that this patch results in
no change of behaviour in this driver apart from improving the error
message.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Replace the error path returning a non-zero value by an error message
and a comment that there is more to do. With that this patch results in
no change of behaviour in this driver apart from improving the error
message.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Replace the error path returning a non-zero value by an error message
and a comment that there is more to do. With that this patch results in
no change of behaviour in this driver apart from improving the error
message.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On RZ/G3S SMARC Carrier II board having RGMII connections b/w Ethernet
MACs and PHYs it has been discovered that doing unbind/bind for ravb
driver in a loop leads to wrong speed and duplex for Ethernet links and
broken connectivity (the connectivity cannot be restored even with
bringing interface down/up). Before doing unbind/bind the Ethernet
interfaces were configured though systemd. The sh instructions used to
do unbind/bind were:
$ cd /sys/bus/platform/drivers/ravb/
$ while :; do echo 11c30000.ethernet > unbind ; \
echo 11c30000.ethernet > bind; done
It has been discovered that there is a race b/w IOCTLs initialized by
systemd at the response of success binding and the
"ravb_write(ndev, CCC_OPC_RESET, CCC)" call in ravb_remove() as
follows:
1/ as a result of bind success the user space open/configures the
interfaces tough an IOCTL; the following stack trace has been
identified on RZ/G3S:
Call trace:
dump_backtrace+0x9c/0x100
show_stack+0x20/0x38
dump_stack_lvl+0x48/0x60
dump_stack+0x18/0x28
ravb_open+0x70/0xa58
__dev_open+0xf4/0x1e8
__dev_change_flags+0x198/0x218
dev_change_flags+0x2c/0x80
devinet_ioctl+0x640/0x708
inet_ioctl+0x1e4/0x200
sock_do_ioctl+0x50/0x108
sock_ioctl+0x240/0x358
__arm64_sys_ioctl+0xb0/0x100
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x190/0x198
2/ this call may execute concurrently with ravb_remove() as the
unbind/bind operation was executed in a loop
3/ if the operation mode is changed to RESET (through
ravb_write(ndev, CCC_OPC_RESET, CCC) call in ravb_remove())
while the above ravb_open() is in progress it may lead to MAC
(or PHY, or MAC-PHY connection, the right point hasn't been identified
at the moment) to be broken, thus the Ethernet connectivity fails to
restore.
The simple fix for this is to move ravb_write(ndev, CCC_OPC_RESET, CCC))
after unregister_netdev() to avoid resetting the controller while the
netdev interface is still registered.
To avoid future issues in ravb_remove(), the patch follows the proper order
of operations in ravb_remove(): reverse order compared with ravb_probe().
This avoids described races as the IOCTLs as well as unregister_netdev()
(called now at the beginning of ravb_remove()) calls rtnl_lock() before
continuing and IOCTLs check (though devinet_ioctl()) if device is still
registered just after taking the lock:
int devinet_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr)
{
// ...
rtnl_lock();
ret = -ENODEV;
dev = __dev_get_by_name(net, ifr->ifr_name);
if (!dev)
goto done;
// ...
done:
rtnl_unlock();
out:
return ret;
}
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In case ravb_phy_start() returns with error the settings applied in
ravb_dmac_init() are not reverted (e.g. config mode). For this call
ravb_stop_dma() on failure path of ravb_open().
Fixes: a0d2f20650 ("Renesas Ethernet AVB PTP clock driver")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hardware manual of RZ/G3S (and RZ/G2L) specifies the following on the
description of CXR35 register (chapter "PHY interface select register
(CXR35)"): "After release reset, make write-access to this register before
making write-access to other registers (except MDIOMOD). Even if not need
to change the value of this register, make write-access to this register
at least one time. Because RGMII/MII MODE is recognized by accessing this
register".
The setup procedure for EMAC module (chapter "Setup procedure" of RZ/G3S,
RZ/G2L manuals) specifies the E-MAC.CXR35 register is the first EMAC
register that is to be configured.
Note [A] from chapter "PHY interface select register (CXR35)" specifies
the following:
[A] The case which CXR35 SEL_XMII is used for the selection of RGMII/MII
in APB Clock 100 MHz.
(1) To use RGMII interface, Set ‘H’03E8_0000’ to this register.
(2) To use MII interface, Set ‘H’03E8_0002’ to this register.
Take into account these indication.
Fixes: 1089877ada ("ravb: Add RZ/G2L MII interface support")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
pm_runtime_get_sync() may return an error. In case it returns with an error
dev->power.usage_count needs to be decremented. pm_runtime_resume_and_get()
takes care of this. Thus use it.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
reset_control_deassert() could return an error. Some devices cannot work
if reset signal de-assert operation fails. To avoid this check the return
code of reset_control_deassert() in ravb_probe() and take proper action.
Along with it, the free_netdev() call from the error path was moved after
reset_control_assert() on its own label (out_free_netdev) to free
netdev in case reset_control_deassert() fails.
Fixes: 0d13a1a464 ("ravb: Add reset support")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In this patch, add the artifacts for the rFID family that works in CFF
flood mode.
The same that was said about PGT organization and lookup in bridge FID
families applies for the rFID family as well. The main difference lies in
the fact that in the controlled flood mode, the FW was taking care of
maintaining the PGT tables for rFIDs. In CFF mode, the responsibility
shifts to the driver.
All rFIDs are based off either a front panel port, or a LAG port. For those
based off ports, we need to maintain at worst one PGT block for each port,
for those based off LAGs, one PGT block per LAG. This reflects in the
pgt_size callback, which determines the PGT footprint based on number of
ports and the LAG capacity.
A number of FIDs may end up using the same PGT base. Unlike with bridges,
where membership of a port in a given FID is highly dynamic, an rFID based
of a port will just always need to flood to that port.
Both the port and the LAG subtables need to be actively maintained. To that
end, the CFF rFID family implements fid_port_init and fid_port_fini
callbacks, which toggle the necessary bits.
Both FID-MID translation and SFMR packing then point into either the port
or the LAG subtable, to the block that corresponds to a given port or a
given LAG, depending on what port the RIF bound to the rFID uses.
As in the previous patch, the way CFF flood mode organizes PGT accesses
allows for much more smarts and dynamism. As in the previous patch, we
rather aim to keep things simple and static.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/962deb4367585d38250e80c685a34735c0c7f3ad.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In this patch, add the artifacts for 802.1d and 802.1q FID families that
work in CFF flood mode.
In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.
The term "flood table" as used by the spectrum_fid module, borrows from
controlled flood mode way of organizing the PGT table. There flood tables
were actual tables, contiguous in the PGT. In the CFF flood mode, they are
more abstract: a flood table becomes a collection of e.g. all first rows of
the per-FID PGT blocks. Nonetheless we retain the nomenclature.
FIDs are still configured through the SFMR register, but there are
different fields to set under CFF mode: PGT base and profile. Thus register
packing gets a dedicated op overload as well.
The new organization of PGT makes it possible to treat the PGT as a block
of an ordinary memory, allocate and deallocate on demand, and achieve
better flexibility. Here instead, we aim to keep the code as close as
possible to the previous controlled flood mode, support for which we need
to retain for Spectrum-1 and older FW versions anyway. Thus the PGT
footprint of the individual families is the same as before, just the
internal organization of the per-family PGT region differs. Hence the
pgt_size callback is reused between the controlled and CFF flood modes.
Since the dummy family has no flood tables in either the CTL mode or in
CFF mode, the existing one can be reused for the CFF family array.
Users should not notice any changes between the controlled and CFF flood
modes.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ca40b8163e6d6a21f63ef299619acee953cf9519.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.
Each FID is associated with one of a handful of profiles. The profile and
the traffic type are then used as keys to look up the PGT offset. This
offset is then added to the per-FID PGT base. The profile / type / offset
mapping needs to be configured by the driver, and is only relevant in CFF
flood mode.
In this patch, add the SFFP initialization code. Only initialize the one
profile currently explicitly used. As follow-up patch add more profiles,
this code will pick them up and initialize as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2c4733ed72d439444218969c032acad22cd4ed88.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the CFF mode, flood profiles are identified by a unique numerical
identifier. This is used for configuration of FIDs and for configuration of
traffic-type to PGT offset rules. In both cases, the numerical identifier
serves as a handle for the flood profile. Add the identifier to the flood
profile structure.
There is currently only one flood profile in use explicitly, the one used
for all bridging. Eventually three will be necessary in total: one for
bridges, one for rFIDs, one for NVE underlay. A total of four profiles
are supported by the HW. Start allocating at 1, because 0 is currently
used for underlay NVE flood.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/19ea9c35ba8b522fa5f7eb6fd7bc1b68f0f66b41.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A flood profile is a mapping from traffic type to an offset at which a
flood vector should be looked up. In mlxsw so far, a flood profile was
somewhat implicitly represented by flood table array. When the CFF flood
mode will be introduced, the flood profile will become more explicit: each
will get a number and the profile ID / traffic-type / offset mapping will
actually need to be initialized in the hardware.
Therefore it is going to be handy to have a structure that keeps all the
components that compose a flood profile. Add this structure, currently with
just the flood table array bits. In the FID families that flood at all,
reference the flood profile instead of just the table array.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/15e113de114d3f41ce3fd2a14a2fa6a1b1d7e8f2.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the CFF flood mode, the driver has to allocate a table within PGT, which
holds flood vectors for router subport FIDs. For LAGs, these flood vectors
have to obviously be maintained dynamically as port membership in a LAG
changes. But even for physical ports, the flood vectors have to be kept
valid, and may not contain enabled bits corresponding to non-existent
ports. It is therefore not possible to precompute the port part of the RSP
table, it has to be maintained as ports come and go due to splits.
To support the RSP table maintenance, add to FID ops two new ops:
fid_port_init and fid_port_fini, for when a port comes to existence, or
joins a lag, and vice versa. Invoke these ops from
mlxsw_sp_port_fids_init() and mlxsw_sp_port_fids_fini(), which are called
when port is added and removed, respectively. Also add two new hooks for
LAG maintenance, mlxsw_sp_fid_port_join_lag() / _leave_lag() which
transitively call into the same ops.
Later patches will actually add the op implementations themselves, this
just adds the scaffolding.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/234398a23540317abb25f74f920a5c8121faecf0.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the CFF flood mode, the PGT allocation size of RFID family will not
depend on number of FIDs, but rather number of ports and LAGs. Therefore
introduce a FID family operation to calculate the PGT allocation size.
The way that size is calculated in the CFF mode depends on calling fallible
functions. Thus express the op as returning an int, with the size returned
via a pointer argument.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/1174651b7160fcedbef50010ae4b68201112fe6f.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In controlled flood mode, for each bridge FID family (i.e., 802.1Q and
802.1D) and packet type (i.e., UUC/MC/BC), the hardware needs to be told
which PGT address to use as the base address for the flood table and how
to determine the offset from the base for each FID.
The above is not needed in CFF mode where each FID has its own flood
table instead of the FID family itself.
Therefore, create a new FID family operation for the above configuration
and only implement it for the 802.1Q and 802.1D families in controlled
flood mode.
No functional changes intended.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/06f71415eec75811585ec597e1dd101b6dff77e7.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In future patches, for CFF flood mode support, we will need a way to
determine a PGT base dynamically, as an op. Therefore, for symmetry,
split out a helper, mlxsw_sp_fid_pgt_base_ctl(), that determines a PGT base
in the controlled mode as well.
Now that the helper is available, use it in mlxsw_sp_fid_flood_table_init()
which currently invokes the FID->MID helper to that end.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/fd41c66a1df4df6499d3da34f40e7b9efa15bc3e.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, mlxsw always uses a "controlled" flood mode on all Nvidia
Spectrum generations. The following patches will however introduce a
possibility to run a "CFF" (for Compressed FID Flooding) mode on newer
machines, if the FW supports it.
To reflect that, label all FID ops, FID families and FID family arrays with
a _ctl suffix. This will make it clearer what is what when the CFF families
are introduced in later patches.
Keep the dummy family intact. Since the dummy family has no flood tables
in either CTL or CFF mode, there are no flood-mode-specific callbacks.
Additionally, add a remark at two fields that they are only relevant when
flood mode is not CFF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/96b6da5439bb662fa86e795bbcec9dc3ccfa59fd.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, mlxsw always uses a "controlled" flood mode on all Nvidia
Spectrum generations. The following patches will however introduce a
possibility to run a "CFF" (for Compressed FID Flooding) mode on newer
machines, if the FW supports it.
Several operations will differ between how they need to be done in
controlled mode vs. CFF mode. Thus the per-FID-family ops will differ
between controlled and CFF, thus the FID family array as such will
differ depending on whether the mode negotiated with FW is controlled
or CFF.
The simple approach of having several globally visible arrays for
spectrum.c to statically choose from no longer works. Instead privatize all
FID initialization and finalization logic, and expose it as ops instead.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/d3fa390d97cf3dbd2f7a28741be69b311e2059e4.1701183891.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On further testing on BE target with kernel test robot, it was notice
that the endianness conversion for addr and CRC in fw_load_memory was
wrong.
Drop the cpu_to_le32 conversion for addr load as it's not needed.
Use get_unaligned_le32 instead of get_unaligned for FW data word load to
correctly convert data in the correct order to follow system endian.
Also drop the cpu_to_be32 for CRC calculation as it's wrong and would
cause different CRC on BE system.
The loaded word is swapped internally and MAILBOX calculates the CRC on
the swapped word. To correctly calculate the CRC to be later matched
with the one from MAILBOX, use an u8 struct and swap the word there to
keep the same order on both LE and BE for crc_ccitt_false function.
Also add additional comments on how the CRC verification for the loaded
section works.
CRC is calculated as we load the section and verified with the MAILBOX
only after the entire section is loaded to skip additional slowdown by
loop the section data again.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311210414.sEJZjlcD-lkp@intel.com/
Fixes: e93984ebc1 ("net: phy: aquantia: add firmware load support")
Tested-by: Robert Marko <robimarko@gmail.com> # ipq8072 LE device
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20231128135928.9841-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is an error when an interface has the following conditions:
- PF is in an aggregate (bond)
- PF has VFs created on it
- bond is in a state where it is failed-over to the secondary interface
- A VF reset is issued on one or more of those VFs
The issue is generated by the originating PF trying to rebuild or
reconfigure the VF resources. Since the bond is failed over to the
secondary interface the queue contexts are in a modified state.
To fix this issue, have the originating interface reclaim its resources
prior to the tear-down and rebuild or reconfigure. Then after the process
is complete, move the resources back to the currently active interface.
There are multiple paths that can be used depending on what triggered the
event, so create a helper function to move the queues and use paired calls
to the helper (back to origin, process, then move back to active interface)
under the same lag_mutex lock.
Fixes: 1e0f9881ef ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20231127212340.1137657-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>