linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-01 00:02:19 -04:00

Author	SHA1	Message	Date
Yanteng Si	803fc61df2	net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC support The Loongson DWMAC driver currently supports the Loongson GMAC devices (based on the DW GMAC v3.50a/v3.73a IP-core) installed to the LS2K1000 SoC and LS7A1000 chipset. But recently a new generation LS2K2000 SoC was released with the new version of the Loongson GMAC synthesized in. The new controller is based on the DW GMAC v3.73a IP-core with the AV-feature enabled, which implies the multi DMA-channels support. The multi DMA-channels feature has the next vendor-specific peculiarities: 1. Split up Tx and Rx DMA IRQ status/mask bits: Name Tx Rx DMA_INTR_ENA_NIE = 0x00040000 \| 0x00020000; DMA_INTR_ENA_AIE = 0x00010000 \| 0x00008000; DMA_STATUS_NIS = 0x00040000 \| 0x00020000; DMA_STATUS_AIS = 0x00010000 \| 0x00008000; DMA_STATUS_FBI = 0x00002000 \| 0x00001000; 2. Custom Synopsys ID hardwired into the GMAC_VERSION.SNPSVER register field. It's 0x10 while it should have been 0x37 in accordance with the actual DW GMAC IP-core version. 3. There are eight DMA-channels available meanwhile the Synopsys DW GMAC IP-core supports up to three DMA-channels. 4. It's possible to have each DMA-channel IRQ independently delivered. The MSI IRQs must be utilized for that. Thus in order to have the multi-channels Loongson GMAC controllers supported let's modify the Loongson DWMAC driver in accordance with all the peculiarities described above: 1. Create the multi-channels Loongson GMAC-specific stmmac_dma_ops::dma_interrupt() stmmac_dma_ops::init_chan() callbacks due to the non-standard DMA IRQ CSR flags layout. 2. Create the Loongson DWMAC-specific platform setup() method which gets to initialize the DMA-ops with the dwmac1000_dma_ops instance and overrides the callbacks described in 1. The method also overrides the custom Synopsys ID with the real one in order to have the rest of the HW-specific callbacks correctly detected by the driver core. 3. Make sure the platform setup() method enables the flow control and duplex modes supported by the controller. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	126f4f96c4	net: stmmac: dwmac-loongson: Add DT-less GMAC PCI-device support The Loongson GMAC driver currently supports the network controllers installed on the LS2K1000 SoC and LS7A1000 chipset, for which the GMAC devices are required to be defined in the platform device tree source. But Loongson machines may have UEFI (implies ACPI) or PMON/UBOOT (implies FDT) as the system bootloaders. In order to have both system configurations support let's extend the driver functionality with the case of having the Loongson GMAC probed on the PCI bus with no device tree node defined for it. That requires to make the device DT-node optional, to rely on the IRQ line detected by the PCI core and to have the MDIO bus ID calculated using the PCIe Domain+BDF numbers. In order to have the device probe() and remove() methods less complicated let's move the DT- and ACPI-specific code to the respective sub-functions. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	0ec04d32b5	net: stmmac: dwmac-loongson: Introduce PCI device info data The Loongson GNET device support is about to be added in one of the next commits. As another preparation for that introduce the PCI device info data with a setup() callback performing the device-specific platform data initializations. Currently it is utilized for the already supported Loongson GMAC device only. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	849dc7341d	net: stmmac: dwmac-loongson: Add phy_interface for Loongson GMAC PHY-interface of the Loongson GMAC device is RGMII with no internal delays added to the data lines signal. So to comply with that let's pre-initialize the platform-data field with the respective enum constant. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	c70f316368	net: stmmac: dwmac-loongson: Init ref and PTP clocks rate Reference and PTP clocks rate of the Loongson GMAC devices is 125MHz. (So is in the GNET devices which support is about to be added.) Set the respective plat_stmmacenet_data field up in accordance with that so to have the coalesce command and timestamping work correctly. Fixes: `30bba69d7d` ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	79afc70002	net: stmmac: dwmac-loongson: Detach GMAC-specific platform data init Loongson delivers two types of the network devices: Loongson GMAC and Loongson GNET in the framework of four SOC/Chipsets revisions: Chip Network PCI Dev ID Synopys Version DMA-channel LS2K1000 SOC GMAC 0x7a03 v3.50a/v3.73a 1 LS7A1000 Chipset GMAC 0x7a03 v3.50a/v3.73a 1 LS2K2000 SOC GMAC 0x7a03 v3.73a 8 LS2K2000 SOC GNET 0x7a13 v3.73a 8 LS7A2000 Chipset GNET 0x7a13 v3.73a 1 The driver currently supports the chips with the Loongson GMAC network device synthesized with a single DMA-channel available. As a preparation before adding the Loongson GNET support detach the Loongson GMAC-specific platform data initializations to the loongson_gmac_data() method and preserve the common settings in the loongson_default_data(). While at it drop the return value statement from the loongson_default_data() method as redundant. Note there is no intermediate vendor-specific PCS in between the MAC and PHY on Loongson GMAC and GNET. So the plat->mac_interface field can be freely initialized with the PHY_INTERFACE_MODE_NA value. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	324d96b465	net: stmmac: dwmac-loongson: Use PCI_DEVICE_DATA() macro for device identification For the readability sake convert the hard-coded Loongson GMAC PCI ID to the respective macro and use the PCI_DEVICE_DATA() macro-function to create the pci_device_id array entry. The later change will be specifically useful in order to assign the device-specific data for the currently supported device and for about to be added Loongson GNET controller. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	0c979e6b55	net: stmmac: dwmac-loongson: Drop pci_enable/disable_msi calls The Loongson GMAC driver currently doesn't utilize the MSI IRQs, but retrieves the IRQs specified in the device DT-node. Let's drop the direct pci_enable_msi()/pci_disable_msi() calls then as redundant Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	393ea68bf1	net: stmmac: dwmac-loongson: Drop duplicated hash-based filter size init The plat_stmmacenet_data::multicast_filter_bins field is twice initialized in the loongson_default_data() method. Drop the redundant initialization, but for the readability sake keep the filters init statements defined in the same place of the method. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	005c0f071b	net: stmmac: Export dwmac1000_dma_ops Export the DW GMAC DMA-ops descriptor so one could be available in the low-level platform drivers. It will be utilized to override some callbacks in order to handle the LS2K2000 GNET device specifics. The GNET controller support is being added in one of the following up commits. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	ad72f783de	net: stmmac: Add multi-channel support DW GMAC v3.73 can be equipped with the Audio Video (AV) feature which enables transmission of time-sensitive traffic over bridged local area networks (DWC Ethernet QoS Product). In that case there can be up to two additional DMA-channels available with no Tx COE support (unless there is vendor-specific IP-core alterations). Each channel is implemented as a separate Control and Status register (CSR) for managing the transmit and receive functions, descriptor handling, and interrupt handling. Add the multi-channels DW GMAC controllers support just by making sure the already implemented DMA-configs are performed on the per-channel basis. Note the only currently known instance of the multi-channel DW GMAC IP-core is the LS2K2000 GNET controller, which has been released with the vendor-specific feature extension of having eight DMA-channels. The device support will be added in one of the following up commits. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Yanteng Si	12dbc67c3b	net: stmmac: Move the atds flag to the stmmac_dma_cfg structure ATDS (Alternate Descriptor Size) is a part of the DMA Bus Mode configs (together with PBL, ALL, EME, etc) of the DW GMAC controllers. Seeing it's not changed at runtime but is activated as long as the IP-core has it supported (at least due to the Type 2 Full Checksum Offload Engine feature), move the respective parameter from the stmmac_dma_ops::init() callback argument to the stmmac_dma_cfg structure, which already have the rest of the DMA-related configs defined. Besides the being added in the next commit DW GMAC multi-channels support will require to add the stmmac_dma_ops::init_chan() callback and have the ATDS flag set/cleared for each channel in there. Having the atds-flag in the stmmac_dma_cfg structure will make the parameter accessible from stmmac_dma_ops::init_chan() callback too. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-13 09:48:00 +02:00
Gustavo A. R. Silva	0a3e6939d4	net/smc: Use static_assert() to check struct sizes Commit `9748dbc9f2` ("net/smc: Avoid -Wflex-array-member-not-at-end warnings") introduced tagged `struct smc_clc_v2_extension_fixed` and `struct smc_clc_smcd_v2_extension_fixed`. We want to ensure that when new members need to be added to the flexible structures, they are always included within these tagged structs. So, we use `static_assert()` to ensure that the memory layout for both the flexible structure and the tagged struct is the same after any changes. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Link: https://patch.msgid.link/ZrVBuiqFHAORpFxE@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 18:41:42 -07:00
Gustavo A. R. Silva	46dd90fe51	nfp: Use static_assert() to check struct sizes Commit `d88cabfd9a` ("nfp: Avoid -Wflex-array-member-not-at-end warnings") introduced tagged `struct nfp_dump_tl_hdr`. We want to ensure that when new members need to be added to the flexible structure, they are always included within this tagged struct. So, we use `static_assert()` to ensure that the memory layout for both the flexible structure and the tagged struct is the same after any changes. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ZrVB43Hen0H5WQFP@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 18:40:44 -07:00
Gustavo A. R. Silva	e2d0fadd70	sched: act_ct: avoid -Wflex-array-member-not-at-end warning -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unnecessary flex-array member `pad[]` and refactor the related code a bit. Fix the following warning: net/sched/act_ct.c:57:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/ZrY0JMVsImbDbx6r@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:54:24 -07:00
Jakub Kicinski	e96f6fd30e	Merge branch 'net-nexthop-increase-weight-to-u16' Petr Machata says: ==================== net: nexthop: Increase weight to u16 In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, in this patchset increase the next hop weight from u8 to u16. Patch #1 adds a flag that indicates whether the reserved fields are zeroed. This is a follow-up to a new fix merged in commit `6d745cd0e9` ("net: nexthop: Initialize all fields in dumped nexthops"). The theory behind this patch is that there is a strict ordering between the fields actually being zeroed, the kernel declaring that they are, and the kernel repurposing the fields. Thus clients can use the flag to tell if it is safe to interpret the reserved fields in any way. Patch #2 contains the substantial code and the commit message covers the details of the changes. Patches #3 to #6 add selftests. ==================== Link: https://patch.msgid.link/cover.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:36 -07:00
Petr Machata	4b808f4473	selftests: fib_nexthops: Test 16-bit next hop weights Add tests that attempt to create NH groups that use full 16 bits of NH weight. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/101cdd3f2bfd9511c9bec95f909d20ff56f70ba5.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:35 -07:00
Petr Machata	dce0765c1d	selftests: router_mpath_nh_res: Test 16-bit next hop weights Add tests that exercise full 16 bits of NH weight. Like in the previous patch, omit the 255:65535 test when KSFT_MACHINE_SLOW. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/a91d6ead9d1b1b4b7e276ca58a71ef814f42b7dd.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:34 -07:00
Petr Machata	bb89fdacf9	selftests: router_mpath_nh: Test 16-bit next hop weights Add tests that exercise full 16 bits of NH weight. To test the 255:65535, it is necessary to run more packets than for the other tests. On a debug kernel, the test can take up to a minute, therefore avoid the test when KSFT_MACHINE_SLOW. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/c0c257c00ad30b07afc3fa5e2afd135925405544.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:34 -07:00
Petr Machata	110d3ffe9d	selftests: router_mpath: Sleep after MZ In the context of an offloaded datapath, it may take a while for the ip link stats to be updated. This causes the test to fail when MZ_DELAY is too low. Sleep after the packets are sent for the link stats to get up to date. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/8b1971d948273afd7de2da3d6a2ba35200540e55.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:34 -07:00
Petr Machata	b72a6a7ab9	net: nexthop: Increase weight to u16 In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, in this patch increase the next hop weight from u8 to u16. Increasing the width of an integral type can be tricky, because while the code still compiles, the types may not check out anymore, and numerical errors come up. To prevent this, the conversion was done in two steps. First the type was changed from u8 to a single-member structure, which invalidated all uses of the field. This allowed going through them one by one and audit for type correctness. Then the structure was replaced with a vanilla u16 again. This should ensure that no place was missed. The UAPI for configuring nexthop group members is that an attribute NHA_GROUP carries an array of struct nexthop_grp entries: struct nexthop_grp { __u32 id; /* nexthop id - must exist / __u8 weight; / weight of this nexthop / __u8 resvd1; __u16 resvd2; }; The field resvd1 is currently validated and required to be zero. We can lift this requirement and carry high-order bits of the weight in the reserved field: struct nexthop_grp { __u32 id; / nexthop id - must exist / __u8 weight; / weight of this nexthop */ __u8 weight_high; __u16 resvd2; }; Keeping the fields split this way was chosen in case an existing userspace makes assumptions about the width of the weight field, and to sidestep any endianness issues. The weight field is currently encoded as the weight value minus one, because weight of 0 is invalid. This same trick is impossible for the new weight_high field, because zero must mean actual zero. With this in place: - Old userspace is guaranteed to carry weight_high of 0, therefore configuring 8-bit weights as appropriate. When dumping nexthops with 16-bit weight, it would only show the lower 8 bits. But configuring such nexthops implies existence of userspace aware of the extension in the first place. - New userspace talking to an old kernel will work as long as it only attempts to configure 8-bit weights, where the high-order bits are zero. Old kernel will bounce attempts at configuring >8-bit weights. Renaming reserved fields as they are allocated for some purpose is commonly done in Linux. Whoever touches a reserved field is doing so at their own risk. nexthop_grp::resvd1 in particular is currently used by at least strace, however they carry an own copy of UAPI headers, and the conversion should be trivial. A helper is provided for decoding the weight out of the two fields. Forcing a conversion seems preferable to bending backwards and introducing anonymous unions or whatever. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/483e2fcf4beb0d9135d62e7d27b46fa2685479d4.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:34 -07:00
Petr Machata	75bab45e6b	net: nexthop: Add flag to assert that NHGRP reserved fields are zero There are many unpatched kernel versions out there that do not initialize the reserved fields of struct nexthop_grp. The issue with that is that if those fields were to be used for some end (i.e. stop being reserved), old kernels would still keep sending random data through the field, and a new userspace could not rely on the value. In this patch, use the existing NHA_OP_FLAGS, which is currently inbound only, to carry flags back to the userspace. Add a flag to indicate that the reserved fields in struct nexthop_grp are zeroed before dumping. This is reliant on the actual fix from commit `6d745cd0e9` ("net: nexthop: Initialize all fields in dumped nexthops"). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/21037748d4f9d8ff486151f4c09083bcf12d5df8.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:50:34 -07:00
Maciej Żenczykowski	246ef40670	ipv6: eliminate ndisc_ops_is_useropt() as it doesn't seem to offer anything of value. There's only 1 trivial user: int lowpan_ndisc_is_useropt(u8 nd_opt_type) { return nd_opt_type == ND_OPT_6CO; } but there's no harm to always treating that as a useropt... Cc: David Ahern <dsahern@kernel.org> Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20240730003010.156977-1-maze@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:23:57 -07:00
Jakub Kicinski	9a4615be65	Merge branch 'eth-fbnic-add-basic-stats' Jakub Kicinski says: ==================== eth: fbnic: add basic stats Add basic interface stats to fbnic. v1: https://lore.kernel.org/20240807022631.1664327-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20240810054322.2766421-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:34 -07:00
Stanislav Fomichev	8be1bd91db	eth: fbnic: add support for basic qstats Implement netdev_stat_ops and export the basic per-queue stats. This interface expect users to set the values that are used either to zero or to some other preserved value (they are 0xff by default). So here we export bytes/packets/drops from tx and rx_stats plus set some of the values that are exposed by queue stats to zero. $ cd tools/testing/selftests/drivers/net && ./stats.py [...] Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240810054322.2766421-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:23 -07:00
Jakub Kicinski	45d84008cc	eth: fbnic: add basic rtnl stats Count packets, bytes and drop on the datapath, and report to the user. Since queues are completely freed when the device is down - accumulate the stats in the main netdev struct. This means that per-queue stats will only report values since last reset (per qstat recommendation). Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:23 -07:00
David S. Miller	fe1f433555	Merge branch 'ethtool-rss-driver-tweaks' Jakub Kicinski says: ==================== ethtool: rss: driver tweaks and netlink context dumps This series is a semi-related collection of RSS patches. Main point is supporting dumping RSS contexts via ethtool netlink. At present additional RSS contexts can be queried one by one, and assuming user know the right IDs. This series uses the XArray added by Ed to provide netlink dump support for ETHTOOL_GET_RSS. Patch 1 is a trivial selftest debug patch. Patch 2 coverts mvpp2 for no real reason other than that I had a grand plan of converting all drivers at some stage. Patch 3 removes a now moot check from mlx5 so that all tests can pass. Patch 4 and 5 make a bit used for context support optional, for easier grepping of drivers which need converting if nothing else. Patch 6 OTOH adds a new cap bit; some devices don't support using a different key per context and currently act in surprising ways. Patch 7 and 8 update the RSS netlink code to use XArray. Patch 9 and 10 add support for dumping contexts. Patch 11 and 12 are small adjustments to spec and a new test. I'm getting distracted with other work, so probably won't have the time soon to complete next steps, but things which are missing are (and some of these may be bad ideas): - better discovery Some sort of API to tell the user who many contexts the device can create. Upper bound, devices often share contexts between ports etc. so it's hard to tell exactly and upfront number of contexts for a netdev. But order of magnitude (4 vs 10s) may be enough for container management system to know whether to bother. - create/modify/delete via netlink The only question here is how to handle all the tricky IOCTL legacy. "No change" maps trivially to attribute not present. "reset" (indir_size = 0) probably needs to be a new NLA_FLAG? - better table size handling The current API assumes the LUT has fixed size, which isn't true for modern devices. We should have better APIs for the drivers to resize the tables, and in user facing API - the ability to specify pattern and min size rather than exact table expected (sort of like ethtool CLI already does). - recounted / socket-bound contexts Support for contexts which get "cleaned up" when their parent netlink socket gets closed. The major catch is that ntuple filters (which we don't currently track) depend on the context, so we need auto-removal for both. v5: - fix build v4: https://lore.kernel.org/20240809031827.2373341-1-kuba@kernel.org - adjust to the meaning of max context from net v3: https://lore.kernel.org/20240806193317.1491822-1-kuba@kernel.org - quite a few code comments and commit message changes - mvpp2: fix interpretation of max_context_id (I'll take care of the net -> net-next merge as needed) - filter by ifindex in the selftest v2: https://lore.kernel.org/20240803042624.970352-1-kuba@kernel.org - fix bugs and build in mvpp2 v1: https://lore.kernel.org/20240802001801.565176-1-kuba@kernel.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:25 +01:00
Jakub Kicinski	c1ad8ef804	selftests: drv-net: rss_ctx: test dumping RSS contexts Add a test for dumping RSS contexts. Make sure indir table and key are sane when contexts are created with various combination of inputs. Test the dump filtering by ifname and start-context. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:25 +01:00
Jakub Kicinski	8ad3be1352	netlink: specs: decode indirection table as u32 array Indirection table is dumped as a raw u32 array, decode it. It's tempting to decode hash key, too, but it is an actual bitstream, so leave it be for now. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	3d50c66c06	ethtool: rss: support skipping contexts during dump Applications may want to deal with dynamic RSS contexts only. So dumping context 0 will be counter-productive for them. Support starting the dump from a given context ID. Alternative would be to implement a dump flag to skip just context 0, not sure which is better... Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	f6122900f4	ethtool: rss: support dumping RSS contexts Now that we track RSS contexts in the core we can easily dump them. This is a major introspection improvement, as previously the only way to find all contexts would be to try all ids (of which there may be 2^32 - 1). Don't use the XArray iterators (like xa_for_each_start()) as they do not move the index past the end of the array once done, which caused multiple bugs in Netlink dumps in the past. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	bb87f2c796	ethtool: rss: report info about additional contexts from XArray IOCTL already uses the XArray when reporting info about additional contexts. Do the same thing in netlink code. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	a7ddfd5d57	ethtool: rss: move the device op invocation out of rss_prepare_data() Factor calling device ops out of rss_prepare_data(). Next patch will add alternative path using xarray. No functional changes. Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	ec6e57beaf	ethtool: rss: don't report key if device doesn't support it marvell/otx2 and mvpp2 do not support setting different keys for different RSS contexts. Contexts have separate indirection tables but key is shared with all other contexts. This is likely fine, indirection table is the most important piece. Don't report the key-related parameters from such drivers. This prevents driver-errors, e.g. otx2 always writes the main key, even when user asks to change per-context key. The second reason is that without this change tracking the keys by the core gets complicated. Even if the driver correctly reject setting key with rss_context != 0, change of the main key would have to be reflected in the XArray for all additional contexts. Since the additional contexts don't have their own keys not including the attributes (in Netlink speak) seems intuitive. ethtool CLI seems to deal with it just fine. Having to set the flag in majority of the drivers is a bit tedious but not reporting the key is a safer default. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	fb770fe758	eth: remove .cap_rss_ctx_supported from updated drivers Remove .cap_rss_ctx_supported from drivers which moved to the new API. This makes it easy to grep for drivers which still need to be converted. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	ce056504e2	ethtool: make ethtool_ops::cap_rss_ctx_supported optional cap_rss_ctx_supported was created because the API for creating and configuring additional contexts is mux'ed with the normal RSS API. Presence of ops does not imply driver can actually support rss_context != 0 (in fact drivers mostly ignore that field). cap_rss_ctx_supported lets core check that the driver is context-aware before calling it. Now that we have .create_rxfh_context, there is no such ambiguity. We can depend on presence of the op. Make setting the bit optional. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	a7f6f56f60	eth: mlx5: allow disabling queues when RSS contexts exist Since commit `24ac7e5440` ("ethtool: use the rss context XArray in ring deactivation safety-check") core will prevent queues from being disabled while being used by additional RSS contexts. The safety check is no longer necessary, and core will do a more accurate job of only rejecting changes which can actually break things. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	f203fd85e6	eth: mvpp2: implement new RSS context API Implement the separate create/modify/delete ops for RSS. No problems with IDs - even tho RSS tables are per device the driver already seems to allocate IDs linearly per port. There's a translation table from per-port context ID to device context ID. mvpp2 doesn't have a key for the hash, it defaults to an empty/previous indir table. Note that there is no key at all, so we don't have to be concerned with reporting the wrong one (which is addressed by a patch later in the series). Compile-tested only. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	10fbe8c082	selftests: drv-net: rss_ctx: add identifier to traffic comments Include the "name" of the context in the comment for traffic checks. Makes it easier to reason about which context failed when we loop over 32 contexts (it may matter if we failed in first vs last, for example). Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:23 +01:00
Menglong Dong	6b8a024d25	net: vxlan: remove duplicated initialization in vxlan_xmit The variable "did_rsc" is initialized twice, which is unnecessary. Just remove one of them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 13:37:43 +01:00
Rosen Penev	f547e956dd	net: sunvnet: use ethtool_sprintf/puts Simpler and allows avoiding manual pointer addition. Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 13:25:38 +01:00
Enguerrand de Ribaucourt	c4e82c025b	net: dsa: microchip: ksz9477: split half-duplex monitoring function In order to respect the 80 columns limit, split the half-duplex monitoring function in two. This is just a styling change, no functional change. Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:08:34 +01:00
David S. Miller	462a94ec9f	Merge branch 'phylib-fixed-speed-1G' Russell King says: ==================== net: phylib: fix fixed-speed >= 1G This is v2 of the patch (now patches) adding support for ethtool !autoneg while respecting the requirements of IEEE 802.3. v2 fixes the build errors in the previous patch by first constifying the "advertisement" argument to the linkmode functions that only read from this pointer. It also fixes the incorrectly named linkmode_set function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
Russell King (Oracle)	6ff3cddc36	net: phylib: do not disable autoneg for fixed speeds >= 1G We have an increasing number of drivers that are forcing auto-negotiation to be enabled for speeds of 1G or faster. It would appear that auto-negotiation is mandatory for speeds above 100M. In 802.3, Annex 40C's state diagrams seems to imply that mr_autoneg_enable (BMCR AN ENABLE) doesn't affect whether or not the AN state machines work for 1000base-T, and some PHY datasheets (e.g. Marvell Alaska) state that disabling mr_autoneg_enable leaves AN enabled but forced to 1G full duplex. Other PHY datasheets imply that BMCR AN ENABLE should not be cleared for >= 1G. Thus, this should be handled in phylib rather than in each driver. Rather than erroring out, arrange to implement the Marvell Alaska solution but in software for all PHYs: generate an appropriate single-speed advertisement for the requested speed, and keep AN enabled to the PHY driver. However, to avoid userspace API breakage, continue to report to userspace that we have AN disabled. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
Russell King (Oracle)	aa9fbc5dd9	net: mii: constify advertising mask Constify the advertising mask to linkmode functions that only read from the advertising mask. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
David S. Miller	4efee05fef	Merge branch 'mvpp2-child-port-removal' Javier Carrasco says: ==================== net: mvpp2: rework child node/port removal handling These two patches used to be part of another series [1] that did not apply to the networking tree without conflicts. This is therefore just a partial resend with no code modifications, just rebased onto net/main. Link: https://lore.kernel.org/all/20240806181026.5fe7f777@kernel.org/ [1] ==================== Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
Javier Carrasco	a7b3274447	net: mvpp2: use device_for_each_child_node() to access device child nodes The iterated nodes are direct children of the device node, and the `device_for_each_child_node()` macro accounts for child node availability. `fwnode_for_each_available_child_node()` is meant to access the child nodes of an fwnode, and therefore not direct child nodes of the device node. The child nodes within mvpp2_probe are not accessed outside the loops, and the scoped version of the macro can be used to automatically decrement the refcount on early exits. Use `device_for_each_child_node()` and its scoped variant to indicate device's direct child nodes. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
Javier Carrasco	e81d00a6b3	net: mvpp2: use port_count to remove ports As discussed in [1], there is no need to iterate over child nodes to remove the list of ports. Instead, a loop up to `port_count` ports can be used, and is in fact more reliable in case the child node availability changes. The suggested approach removes the need for the `fwnode` and `port_fwnode` variables in mvpp2_remove() as well. Link: https://lore.kernel.org/all/ZqdRgDkK1PzoI2Pf@shell.armlinux.org.uk/ [1] Suggested-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
David S. Miller	80d021bc57	Merge branch 'bnxt_en-fix-queue-reset-when-queue-active' David Wei says: ==================== fix bnxt_en queue reset when queue is active The current bnxt_en queue API implementation is buggy when resetting a queue that has active traffic. The problem is that there is no FW involved to stop the flow of packets and relying on napi_disable() isn't enough. To fix this, call bnxt_hwrm_vnic_update() with MRU set to 0 for both the default and the ntuple vnic to stop the flow of packets. This works for any Rx queue and not only those that have ntuple rules since every Rx queue is either in the default or the ntuple vnic. For bnxt_hwrm_vnic_update() to work, proper flushing must be done by the FW. A FW flag is there to indicate support and queue_mgmt_ops is keyed behind this. The first three patches are from Michael Chan and adds the prerequisite vnic functions and FW flags indicating that it will properly flush during vnic update. Tested on BCM957504 while iperf3 is active: 1. Reset a queue that has an ntuple rule steering flow into it 2. Reset all queues in order, one at a time In both cases the flow is not interrupted. Sending this to net-next as there is no in-tree kernel consumer of queue API just yet, and there is a patch that changes when the queue_mgmt_ops is registered. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> --- v3: - include patches from Michael Chan that adds a FW flag for vnic flush capability - key support for queue_mgmt_ops behind this new flag v2: - split setting vnic->mru into a separate patch (Wojciech) - clarify why napi_enable()/disable() is removed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:03 +01:00
David Wei	97cbf3d0ac	bnxt_en: only set dev->queue_mgmt_ops if supported by FW The queue API calls bnxt_hwrm_vnic_update() to stop/start the flow of packets, which can only properly flush the pipeline if FW indicates support. Add a macro BNXT_SUPPORTS_QUEUE_API that checks for the required flags and only set queue_mgmt_ops if true. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00

1 2 3 4 5 ...

1295183 Commits