linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-13 17:39:23 -04:00

Author	SHA1	Message	Date
Qingfang Deng	952d732536	net: ethernet: mediatek: add EEE support Add EEE support to MediaTek SoC Ethernet. The register fields are similar to the ones in MT7531, except that the LPI threshold is in milliseconds. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250217094022.1065436-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:09:24 -08:00
Pei Xiao	9faaaef27c	net: freescale: ucc_geth: make ugeth_mac_ops be static const sparse warning: sparse: symbol 'ugeth_mac_ops' was not declared. Should it be static. Add static to fix sparse warnings and add const. phylink_create() will accept a const struct. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/202502141128.9HfxcdIE-lkp@intel.com Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:08:18 -08:00
Jakub Kicinski	59ed446bc4	Merge branch 'net-phy-improve-and-simplify-eee-handling-in-phylib' Heiner Kallweit says: ==================== net: phy: improve and simplify EEE handling in phylib This series improves and simplifies phylib's EEE handling. ==================== Link: https://patch.msgid.link/3caa3151-13ac-44a8-9bb6-20f82563f698@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:10 -08:00
Heiner Kallweit	809265fe96	net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active After the last user has gone, we can remove the local advertisement parameter from genphy_c45_eee_is_active. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:09 -08:00
Heiner Kallweit	199d0ce385	net: phy: c45: use cached EEE advertisement in genphy_c45_ethtool_get_eee Now that disabled EEE modes are considered when populating advertising_eee, we can use this bitmap here instead of reading the PHY register. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/e57ed3d4-d0bc-4f91-83f6-8f48dfb6d7d7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:09 -08:00
Heiner Kallweit	aa951feb54	net: phy: c45: Don't silently remove disabled EEE modes any longer when writing advertisement register advertising_eee is adjusted now whenever an EEE mode gets disabled. Therefore we can remove the silent removal of disabled EEE modes here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/e95b9dad-24a7-4e3e-9af9-6f0770cf1520@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:09 -08:00
Heiner Kallweit	7f33fea6bb	net: phy: remove disabled EEE modes from advertising_eee in phy_probe A PHY driver may populate eee_disabled_modes in its probe or get_features callback, therefore filter the EEE advertisement read from the PHY. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/493f3e2e-9cfc-445d-adbe-58d9c117a489@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:09 -08:00
Heiner Kallweit	a9b6a860d7	net: phy: improve phy_disable_eee_mode If a mode is to be disabled, remove it from advertising_eee. Disabling EEE modes shall be done before calling phy_start(), warn if that's not the case. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:08 -08:00
Heiner Kallweit	8a6a77bb5a	net: phy: move definition of phy_is_started before phy_disable_eee_mode In preparation of a follow-up patch, move phy_is_started() to before phy_disable_eee_mode(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:07:08 -08:00
Heiner Kallweit	fabcfd6d10	net: phy: realtek: add defines for shadowed c45 standard registers Realtek shadows standard c45 registers in VEND2 device register space. Add defines for these VEND2 registers, based on the names of the standard c45 registers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/c90bdf76-f8b8-4d06-9656-7a52d5658ee6@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:04:05 -08:00
Siddh Raman Pant	438989137a	netlink: Unset cb_running when terminating dump on release When we terminated the dump, the callback isn't running, so cb_running should be set to false to be logically consistent. cb_running signifies whether a dump is ongoing. It is set to true in cb->start(), and is checked in netlink_dump() to be true initially. After the dump, it is set to false in the same function. This is just a cleanup, no path should access this field on a closed socket. Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Link: https://patch.msgid.link/aff028e3eb2b768b9895fa6349fa1981ae22f098.camel@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:03:12 -08:00
Jakub Kicinski	d5b595d3ae	Merge branch 'net-cadence-macb-modernize-statistics-reporting' Sean Anderson says: ==================== net: cadence: macb: Modernize statistics reporting Implement the modern interfaces for statistics reporting. ==================== Link: https://patch.msgid.link/20250214212703.2618652-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:00:10 -08:00
Sean Anderson	f6af690a29	net: cadence: macb: Report standard stats Report standard statistics using the dedicated callbacks instead of get_ethtool_stats. OCTTX is split over two registers. Accumulating these registers separately in gem_stats just means we need to combine them again later. Instead, combine these stats before saving them, like is done for ethtool_stats. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214212703.2618652-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:00:08 -08:00
Sean Anderson	75696dd0fd	net: cadence: macb: Convert to get_stats64 Convert the existing get_stats implementation to get_stats64. Since we now report 64-bit values, increase the counters to 64-bits as well. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214212703.2618652-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 18:00:07 -08:00
Sean Anderson	c900e49d58	net: xilinx: axienet: Implement BQL Implement byte queue limits to allow queueing disciplines to account for packets enqueued in the ring buffers but not yet transmitted. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214211252.2615573-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 17:54:35 -08:00
Heiner Kallweit	8af2136e77	net: phy: realtek: add helper RTL822X_VND2_C22_REG C22 register space is mapped to 0xa400 in MMD VEND2 register space. Add a helper to access mapped C22 registers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/6344277b-c5c7-449b-ac89-d5425306ca76@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:37:11 -08:00
Jakub Kicinski	2e864f18e5	Merge branch 'eth-mlx4-use-the-page-pool-for-rx-buffers' Jakub Kicinski says: ==================== eth: mlx4: use the page pool for Rx buffers Convert mlx4 to page pool. I've been sitting on these patches for over a year, and Jonathan Lemon had a similar series years before. We never deployed it or sent upstream because it didn't really show much perf win under normal load (admittedly I think the real testing was done before Ilias's work on recycling). During the v6.9 kernel rollout Meta's CDN team noticed that machines with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time spent mapping buffers in the IOMMU). The problem does not occur with modern NICs, so I dusted off this series and reportedly it still works. And it makes the problem go away, no overloads, perf back in line with older kernels. Something must have changed in IOMMU code, I guess. This series is very simple, and can very likely be optimized further. Thing is, I don't have access to any CX3 Pro NICs. They only exist in CDN locations which haven't had a HW refresh for a while. So I can say this series survives a week under traffic w/ XDP enabled, but my ability to iterate and improve is a bit limited. v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250213010635.1354034-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:32:22 -08:00
Jakub Kicinski	82b023c97f	eth: mlx4: use the page pool for Rx buffers Simple conversion to page pool. Preserve the current fragmentation logic / page splitting. Each page starts with a single frag reference, and then we bump that when attaching to skbs. This can likely be optimized further. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:32:20 -08:00
Jakub Kicinski	d17fb2c055	eth: mlx4: remove the local XDP fast-recycling ring It will be replaced with page pool's built-in recycling. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:32:20 -08:00
Jakub Kicinski	8fdeafd66e	eth: mlx4: don't try to complete XDP frames in netpoll mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't using page pool until now, so it could run XDP completions in netpoll (NAPI budget == 0) just fine. Page pool has calling context requirements, make sure we don't try to call it from what is potentially HW IRQ context. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:32:20 -08:00
Jakub Kicinski	8533b14b3d	eth: mlx4: create a page pool for Rx Create a pool per rx queue. Subsequent patches will make use of it. Move fcs_del to a hole to make space for the pointer. Per common "wisdom" base the page pool size on the ring size. Note that the page pool cache size is in full pages, so just round up the effective buffer size to pages. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-18 15:32:20 -08:00
Niklas Söderlund	4991b88c25	net: phy: marvell-88q2xxx: Init PHY private structure for mv88q211x When adding LED support for mv88q222x devices the PHY private data structure was added to the mv88q211x code path, the data structure is however only allocated during mv88q222x probe. This results in a nullptr deference for mv88q2110 devices. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [0000000000000001] user address but active_mm is swapper Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-arm64-renesas-00342-ga3783dbf2574 #7 Hardware name: Renesas White Hawk Single board based on r8a779g2 (DT) pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mv88q2xxx_config_init+0x28/0x84 lr : mv88q2110_config_init+0x98/0xb0 sp : ffff8000823eb9d0 x29: ffff8000823eb9d0 x28: ffff000440942000 x27: ffff80008144e400 x26: 0000000000001002 x25: 0000000000000000 x24: 0000000000000000 x23: 0000000000000009 x22: ffff8000810534f0 x21: ffff800081053550 x20: 0000000000000000 x19: ffff0004437d6800 x18: 0000000000000018 x17: 00000000000961c8 x16: ffff0006bef75ec0 x15: 0000000000000001 x14: 0000000000000001 x13: ffff000440218080 x12: 071c71c71c71c71c x11: ffff000440218080 x10: 0000000000001420 x9 : ffff8000823eb770 x8 : ffff8000823eb650 x7 : ffff8000823eb750 x6 : ffff8000823eb710 x5 : 0000000000000000 x4 : 0000000000000800 x3 : 0000000000000001 x2 : 0000000000000000 x1 : 00000000ffffffff x0 : ffff0004437d6800 Call trace: mv88q2xxx_config_init+0x28/0x84 (P) mv88q2110_config_init+0x98/0xb0 phy_init_hw+0x64/0x9c phy_attach_direct+0x118/0x320 phy_connect_direct+0x24/0x80 
of_phy_connect+0x5c/0xa0 rtsn_open+0x5bc/0x78c __dev_open+0xf8/0x1fc __dev_change_flags+0x198/0x220 dev_change_flags+0x20/0x64 ip_auto_config+0x270/0xefc do_one_initcall+0xe4/0x22c kernel_init_freeable+0x2a8/0x308 kernel_init+0x20/0x130 ret_from_fork+0x10/0x20 Code: b907e404 f9432814 3100083f 540000e3 (39400680) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x000,00000070,00801250,8200700b Memory Limit: none ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- Fix this by using a generic probe function for both mv88q211x and mv88q222x devices that allocates the PHY private data structure, while only the mv88q222x probes for LED support. Fixes: `a3783dbf25` ("net: phy: marvell-88q2xxx: Add support for PHY LEDs on 88q2xxx") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250214174650.2056949-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 15:33:41 +01:00
Breno Leitao	8e677a4661	trace: tcp: Add tracepoint for tcp_cwnd_reduction() Add a lightweight tracepoint to monitor TCP congestion window adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking of: - TCP window size fluctuations - Active socket behavior - Congestion window reduction events Meta has been using BPF programs to monitor this function for years. Adding a proper tracepoint provides a stable API for all users who need to monitor TCP congestion window behavior. Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event infrastructure and exporting to tracefs, keeping the implementation minimal. (Thanks Steven Rostedt) Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as: rawtracepoint:tcp_cwnd_reduction_tp { .... } Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250214-cwnd_tracepoint-v2-1-ef8d15162d95@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 15:29:53 +01:00
Paolo Abeni	8f17a6a861	Merge branch 'net-phy-marvell-88q2xxx-cleanup' Dimitri Fedrau says: ==================== net: phy: marvell-88q2xxx: cleanup - align defines - order includes alphabetically - enable temperature sensor in mv88q2xxx_config_init Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> ==================== Link: https://patch.msgid.link/20250214-marvell-88q2xxx-cleanup-v1-0-71d67c20f308@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:39:43 +01:00
Dimitri Fedrau	6c806720ba	net: phy: marvell-88q2xxx: enable temperature sensor in mv88q2xxx_config_init Temperature sensor gets enabled for 88Q222X devices in mv88q222x_config_init. Move enabling to mv88q2xxx_config_init because all 88Q2XXX devices support the temperature sensor. Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:39:40 +01:00
Dimitri Fedrau	cbe0449e8f	net: phy: marvell-88q2xxx: order includes alphabetically Order includes alphabetically. Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:39:40 +01:00
Dimitri Fedrau	8dcaed624f	net: phy: marvell-88q2xxx: align defines Align some defines. Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:39:40 +01:00
Paolo Abeni	01072deab3	Merge branch 'vxlan-join-leave-mc-group-when-reconfigured' Petr Machata says: ==================== vxlan: Join / leave MC group when reconfigured When a vxlan netdevice is brought up, if its default remote is a multicast address, the device joins the indicated group. Therefore when the multicast remote address changes, the device should leave the current group and subscribe to the new one. Similarly when the interface used for endpoint communication is changed in a situation when multicast remote is configured. This is currently not done. Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is possible that with such fix, the netdevice will end up in an inconsistent situation where the old group is not joined anymore, but joining the new group fails. Should we join the new group first, and leave the old one second, we might end up in the opposite situation, where both groups are joined. Undoing any of this during rollback is going to be similarly problematic. One solution would be to just forbid the change when the netdevice is up. However in vnifilter mode, changing the group address is allowed, and these problems are simply ignored (see vxlan_vni_update_group()): # ip link add name br up type bridge vlan_filtering 1 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789 # bridge vni add dev vx1 vni 200 group 224.0.0.1 # tcpdump -i lo & # bridge vni add dev vx1 vni 200 group 224.0.0.2 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) # bridge vni dev vni group/remote vx1 200 224.0.0.2 Having two different modes of operation for conceptually the same interface is silly, so in this patchset, just do what the vnifilter code does and deal with the errors by crossing fingers real hard. 
==================== Link: https://patch.msgid.link/cover.1739548836.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:50 +01:00
Petr Machata	eae1e92a1d	selftests: test_vxlan_fdb_changelink: Add a test for MC remote change Changes to MC remote need to be reflected in actual group memberships. Add a test to verify that it is the case. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:44 +01:00
Petr Machata	24adf47ea9	selftests: test_vxlan_fdb_changelink: Convert to lib.sh Instead of inlining equivalents, use lib.sh-provided primitives. Use defer to manage vx lifetime. This will make it easier to extend the test in the next patch. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:44 +01:00
Petr Machata	f802f172d7	selftests: forwarding: lib: Move require_command to net, generalize This helper could be useful to more than just forwarding tests. Move it upstairs and port over to log_test_skip(). Split the function into two parts: the bit that actually checks and reports skip, which is in a new function check_command(). And a bit that exits the test script if the check fails. This allows users consistent checking behavior while giving an option to bail out from a single test without bailing out of the whole script. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:43 +01:00
Petr Machata	d42d543368	vxlan: Join / leave MC group after remote changes When a vxlan netdevice is brought up, if its default remote is a multicast address, the device joins the indicated group. Therefore when the multicast remote address changes, the device should leave the current group and subscribe to the new one. Similarly when the interface used for endpoint communication is changed in a situation when multicast remote is configured. This is currently not done. Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is possible that with such fix, the netdevice will end up in an inconsistent situation where the old group is not joined anymore, but joining the new group fails. Should we join the new group first, and leave the old one second, we might end up in the opposite situation, where both groups are joined. Undoing any of this during rollback is going to be similarly problematic. One solution would be to just forbid the change when the netdevice is up. However in vnifilter mode, changing the group address is allowed, and these problems are simply ignored (see vxlan_vni_update_group()): # ip link add name br up type bridge vlan_filtering 1 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789 # bridge vni add dev vx1 vni 200 group 224.0.0.1 # tcpdump -i lo & # bridge vni add dev vx1 vni 200 group 224.0.0.2 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) # bridge vni dev vni group/remote vx1 200 224.0.0.2 Having two different modes of operation for conceptually the same interface is silly, so in this patch, just do what the vnifilter code does and deal with the errors by crossing fingers real hard. 
The vnifilter code leaves old before joining new, and in case of join / leave failures does not roll back the configuration changes that have already been applied, but bails out of joining if it could not leave. Do the same here: leave before join, apply changes unconditionally and do not attempt to join if we couldn't leave. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:43 +01:00
Petr Machata	5afb1596b9	vxlan: Drop 'changelink' parameter from vxlan_dev_configure() vxlan_dev_configure() only has a single caller that passes false for the changelink parameter. Drop the parameter and inline the sole value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 13:06:43 +01:00
Jason Xing	43130d02ba	page_pool: avoid infinite loop to schedule delayed worker We noticed the kworker in page_pool_release_retry() was waken up repeatedly and infinitely in production because of the buggy driver causing the inflight less than 0 and warning us in page_pool_inflight()[1]. Since the inflight value goes negative, it means we should not expect the whole page_pool to get back to work normally. This patch mitigates the adverse effect by not rescheduling the kworker when detecting the inflight negative in page_pool_release_retry(). [1] [Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------ [Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages ... [Mon Feb 10 20:36:11 2025] Call Trace: [Mon Feb 10 20:36:11 2025] page_pool_release_retry+0x23/0x70 [Mon Feb 10 20:36:11 2025] process_one_work+0x1b1/0x370 [Mon Feb 10 20:36:11 2025] worker_thread+0x37/0x3a0 [Mon Feb 10 20:36:11 2025] kthread+0x11a/0x140 [Mon Feb 10 20:36:11 2025] ? process_one_work+0x370/0x370 [Mon Feb 10 20:36:11 2025] ? __kthread_cancel_work+0x40/0x40 [Mon Feb 10 20:36:11 2025] ret_from_fork+0x35/0x40 [Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]--- Note: before this patch, the above calltrace would flood the dmesg due to repeated reschedule of release_dw kworker. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250214064250.85987-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 12:48:29 +01:00
Paolo Abeni	b4cb730862	Merge branch 'add-af_xdp-support-for-cn10k' Suman Ghosh says: ==================== Add af_xdp support for cn10k This patchset includes changes to support AF_XDP for cn10k chipsets. Both non-zero copy and zero copy will be supported after these changes. Also, the RSS will be reconfigured once a particular receive queue is added/removed to/from AF_XDP support. Patch #1: octeontx2-pf: use xdp_return_frame() to free xdp buffers Patch #2: octeontx2-pf: Add AF_XDP non-zero copy support Patch #3: octeontx2-pf: AF_XDP zero copy receive support Patch #4: octeontx2-pf: Reconfigure RSS table after enabling AF_XDP zerocopy on rx queue Patch #5: octeontx2-pf: Prepare for AF_XDP transmit Patch #6: octeontx2-pf: AF_XDP zero copy transmit support ==================== Link: https://patch.msgid.link/20250213053141.2833254-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:30 +01:00
Suman Ghosh	53616af09b	octeontx2-pf: AF_XDP zero copy transmit support This patch implements below changes, 1. To avoid concurrency with normal traffic uses XDP queues. 2. Since there are chances that XDP and AF_XDP can fall under same queue uses separate flags to handle dma buffers. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Suman Ghosh	c5c2398eb8	octeontx2-pf: Prepare for AF_XDP Implement necessary APIs required for AF_XDP transmit. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Suman Ghosh	25b07c1a86	octeontx2-pf: Reconfigure RSS table after enabling AF_XDP zerocopy on rx queue RSS table needs to be reconfigured once a rx queue is enabled or disabled for AF_XDP zerocopy support. After enabling UMEM on a rx queue, that queue should not be part of RSS queue selection algorithm. Similarly the queue should be considered again after UMEM is disabled. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Suman Ghosh	efabce2901	octeontx2-pf: AF_XDP zero copy receive support This patch adds support to AF_XDP zero copy for CN10K. This patch specifically adds receive side support. In this approach once a xdp program with zero copy support on a specific rx queue is enabled, then that receive queue is disabled/detached from the existing kernel queue and re-assigned to the umem memory. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Suman Ghosh	b4164de504	octeontx2-pf: Add AF_XDP non-zero copy support Set xdp rx ring memory type as MEM_TYPE_PAGE_POOL for af-xdp to work. This is needed since xdp_return_frame internally will use page pools. Fixes: `06059a1a9a` ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Suman Ghosh	94c80f7488	octeontx2-pf: use xdp_return_frame() to free xdp buffers xdp_return_frame() will help to free the xdp frames and their associated pages back to page pool. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-18 11:36:27 +01:00
Jakub Kicinski	0f375d90c4	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice, iavf: Add support for Rx timestamping Mateusz Polchlopek says: Initially, during VF creation it registers the PTP clock in the system and negotiates with PF it's capabilities. In the meantime the PF enables the Flexible Descriptor for VF. Only this type of descriptor allows to receive Rx timestamps. Enabling virtual clock would be possible, though it would probably perform poorly due to the lack of direct time access. Enable timestamping should be done using userspace tools, e.g. hwstamp_ctl -i $VF -r 14 In order to report the timestamps to userspace, the VF extends timestamp to 40b. To support this feature the flexible descriptors and PTP part in iavf driver have been introduced. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: add support for Rx timestamps to hotpath iavf: handle set and get timestamps ops iavf: Implement checking DD desc field iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors iavf: define Rx descriptors as qwords libeth: move idpf_rx_csum_decoded and idpf_rx_extracted iavf: periodically cache PHC time iavf: add support for indirect access to PHC time iavf: add initial framework for registering PTP clock iavf: negotiate PTP capabilities iavf: add support for negotiating flexible RXDID format virtchnl: add enumeration for the rxdid format ice: support Rx timestamp on flex descriptor virtchnl: add support for enabling PTP on iAVF ==================== Link: https://patch.msgid.link/20250214192739.1175740-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 17:09:44 -08:00
Jakub Kicinski	b0b0f52042	eth: fbnic: support TCP segmentation offload Add TSO support to the driver. Device can handle unencapsulated or IPv6-in-IPv6 packets. Any other tunnel stacks are handled with GSO partial. Validate that the packet can be offloaded in ndo_features_check. Main thing we need to check for is that the header geometry can be expressed in the descriptor fields (offsets aren't too large). Report number of TSO super-packets via the qstat API. Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:54:41 -08:00
Jakub Kicinski	b5e489003a	netdev: clarify GSO vs csum in qstats Could be just me, but I had to pause and double check that the Tx csum counter in qstat should include GSO'd packets. GSO pretty much implies csum so one could possibly interpret the csum counter as pure csum offload. But the counters are based on virtio: [tx_needs_csum] The number of packets which require checksum calculation by the device. [rx_needs_csum] The number of packets with VIRTIO_NET_HDR_F_NEEDS_CSUM. and VIRTIO_NET_HDR_F_NEEDS_CSUM gets set on GSO packets virtio sends. Clarify this in the spec to avoid any confusion. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250214224601.2271201-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:48:32 -08:00
Jakub Kicinski	637026e591	net: move stale comment about ntuple validation Gal points out that the comment now belongs further down, since the original if condition was split into two in commit `de7f7582df` ("net: ethtool: prevent flow steering to RSS contexts which don't exist") Link: https://lore.kernel.org/de4a2a8a-1eb9-4fa8-af87-7526e58218e9@nvidia.com Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250214224340.2268691-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:47:01 -08:00
Jakub Kicinski	24fc595edb	Merge branch 'netdev-genl-add-an-xsk-attribute-to-queues' Joe Damato says: ==================== netdev-genl: Add an xsk attribute to queues This is an attempt to followup on something Jakub asked me about [1], adding an xsk attribute to queues and more clearly documenting which queues are linked to NAPIs... After the RFC [2], Jakub suggested creating an empty nest for queues which have a pool, so I've adjusted this version to work that way. The nest can be extended in the future to express attributes about XSK as needed. Queues which are not used for AF_XDP do not have the xsk attribute present. I've run the included test on: - my mlx5 machine (via NETIF=) - without setting NETIF And the test seems to pass in both cases. [1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/ [2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/ ==================== Link: https://patch.msgid.link/20250214211255.14194-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:46:05 -08:00
Joe Damato	788e52e2b6	selftests: drv-net: Test queue xsk attribute Test that queues which are used for AF_XDP have the xsk nest attribute. The attribute is currently empty, but its existence means the AF_XDP is being used for the queue. Enable CONFIG_XDP_SOCKETS for selftests/drivers/net tests, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:46:03 -08:00
Joe Damato	df524c8f57	netdev-genl: Add an XSK attribute to queues Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:46:03 -08:00
Joe Damato	a127c18462	netlink: Add nla_put_empty_nest helper Creating empty nests is helpful when the exact attributes to be exposed in the future are not known. Encapsulate the logic in a helper. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:46:03 -08:00
Anna Emese Nyiri	c935af429e	selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY Introduce tests to verify the correct functionality of the SO_RCVMARK and SO_RCVPRIORITY socket options. Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250214205828.48503-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-17 16:45:19 -08:00

1336291 Commits