linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-05 19:24:01 -04:00

Author	SHA1	Message	Date
Eric Dumazet	d677aebd66	tcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rx sysctl_tcp_l3mdev_accept is read from TCP receive fast path from tcp_v6_early_demux(), __inet6_lookup_established, inet_request_bound_dev_if(). Move it to netns_ipv4_read_rx. Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding its definition. Note this adds a hole of three bytes that could be filled later. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 08:45:24 -07:00
Aryan Srivastava	7e5b547cac	net: phy: aquantia: poll status register The system interface connection status register is not immediately correct upon line side link up. This results in the status being read as OFF and then transitioning to the correct host side link mode with a short delay. This causes the phylink framework passing the OFF status down to all MAC config drivers, resulting in the host side link being misconfigured, which in turn can lead to link flapping or complete packet loss in some cases. Mitigate this by periodically polling the register until it not showing the OFF state. This will be done every 1ms for 10ms, using the same poll/timeout as the processor intensive operation reads. If the phy is still expressing the OFF state after the timeout, then set the link to false and pass the NA interface mode onto the phylink framework. Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241010004935.1774601-1-aryan.srivastava@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 08:43:47 -07:00
Jakub Kicinski	8401a108a6	eth: remove the DLink/Sundance (ST201) driver Konstantin reports the maintainer's address bounces. There is no other maintainer and the driver is quite old. There is a good chance nobody is using this driver any more. Let's try to remove it completely, we can revert it back in if someone complains. Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Denis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-11 13:08:23 +01:00
Jakub Kicinski	59ae83dcf1	Merge branch 'tg3-link-irqs-napis-and-queues' Joe Damato says: ==================== tg3: Link IRQs, NAPIs, and queues This follows from a previous RFC (wherein I botched the subject lines of all the messages) [1]. I've taken Michael Chan's suggestion on modifying patch 2 and I've updated the commit messages of both patches to test and show the output for the default 1 TX 4 RX queues and the 4 TX and 4 RX queues cases. Reviewers: please check the commit messages carefully to ensure the output is correct (or on your own systems to verify, if you like). I am not a tg3 expert and it's possible that I got something wrong. [1]: https://lore.kernel.org/all/20241005145717.302575-3-jdamato@fastly.com/T/ ==================== Link: https://patch.msgid.link/20241009175509.31753-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:34 -07:00
Joe Damato	aec5514d73	tg3: Link queues to NAPIs Link queues to NAPIs using the netdev-genl API so this information is queryable. First, test with the default setting on my tg3 NIC at boot with 1 TX queue: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}] Now, adjust the number of TX queues to be 4 via ethtool: $ sudo ethtool -L eth0 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows the tg3 has renamed the IRQs to be combined: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Now query this via netlink to ensure the queues are linked properly to their NAPIs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}] As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID 8960, and so on for each queue index up to 3. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Joe Damato	25118cce66	tg3: Link IRQs to NAPI instances Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX queues. Compare the output of /proc/interrupts for my tg3 device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts \| grep eth0 343: [...] eth0-tx-0 344: [...] eth0-rx-1 345: [...] eth0-rx-2 346: [...] eth0-rx-3 347: [...] eth0-rx-4 As you can see above, tg3 has named the IRQs such that there is a dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8197, 'ifindex': 2, 'irq': 347}, {'id': 8196, 'ifindex': 2, 'irq': 346}, {'id': 8195, 'ifindex': 2, 'irq': 345}, {'id': 8194, 'ifindex': 2, 'irq': 344}, {'id': 8193, 'ifindex': 2, 'irq': 343}] Netlink displays the same IRQs as above, noting that each is mapped to a unique NAPI instance. Now, reconfigure the NIC to have 4 TX queues and 4 RX queues: $ sudo ethtool -L eth0 rx 4 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Examine /proc/interrupts once again, noting that tg3 will now rename the IRQs to suggest that they are combined tx and rx without allocating additional IRQs, so the total IRQ count in /proc/interrupts is unchanged: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Check the output from netlink again: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8973, 'ifindex': 2, 'irq': 347}, {'id': 8972, 'ifindex': 2, 'irq': 346}, {'id': 8971, 'ifindex': 2, 'irq': 345}, {'id': 8970, 'ifindex': 2, 'irq': 344}, {'id': 8969, 'ifindex': 2, 'irq': 343}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Heiner Kallweit	854d71c555	r8169: remove original workaround for RTL8125 broken rx issue Now that we have `b9c7ac4fe2` ("r8169: disable ALDPS per default for RTL8125"), the first attempt to fix the issue shouldn't be needed any longer. So let's effectively revert `621735f590` ("r8169: fix rare issue with broken rx after link-down on RTL8125") and see whether anybody complains. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/382d8c88-cbce-400f-ad62-fda0181c7e38@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:36:28 -07:00
Heiner Kallweit	87e26448db	r8169: don't apply UDP padding quirk on RTL8126A Vendor drivers r8125/r8126 indicate that this quirk isn't needed any longer for RTL8126A. Mimic this in r8169. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d1317187-aa81-4a69-b831-678436e4de62@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:36:06 -07:00
Jakub Kicinski	9c0fc36ec4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 13:13:33 -07:00
Linus Torvalds	1d227fcc72	Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and netfilter. Current release - regressions: - dsa: sja1105: fix reception from VLAN-unaware bridges - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled" - eth: fec: don't save PTP state if PTP is unsupported Current release - new code bugs: - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop - phy: aquantia: AQR115c fix up PMA capabilities Previous releases - regressions: - tcp: 3 fixes for retrans_stamp and undo logic Previous releases - always broken: - net: do not delay dst_entries_add() in dst_release() - netfilter: restrict xtables extensions to families that are safe, syzbot found a way to combine ebtables with extensions that are never used by userspace tools - sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start - mptcp: handle consistently DSS corruption, and prevent corruption due to large pmtu xmit" * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) MAINTAINERS: Add headers and mailing list to UDP section MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL] slip: make slhc_remember() more robust against malicious packets net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC ppp: fix ppp_async_encode() illegal access docs: netdev: document guidance on cleanup patches phonet: Handle error of rtnl_register_module(). mpls: Handle error of rtnl_register_module(). mctp: Handle error of rtnl_register_module(). bridge: Handle error of rtnl_register_module(). vxlan: Handle error of rtnl_register_module(). rtnetlink: Add bulk registration helpers for rtnetlink message handlers. net: do not delay dst_entries_add() in dst_release() mptcp: pm: do not remove closing subflows mptcp: fallback when MPTCP opts are dropped after 1st data tcp: fix mptcp DSS corruption due to large pmtu xmit mptcp: handle consistently DSS corruption net: netconsole: fix wrong warning net: dsa: refuse cross-chip mirroring operations net: fec: don't save PTP state if PTP is unsupported ...	2024-10-10 12:36:35 -07:00
Linus Torvalds	0edab8d132	Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug callbacks When a ring buffer is mapped to memory assigned at boot, it also splits it up evenly between the possible CPUs. But the allocation code still attached a CPU notifier callback to this ring buffer. When a CPU is added, the callback will happen and another per-cpu buffer is created for the ring buffer. But for boot mapped buffers, there is no room to add another one (as they were all created already). The result of calling the CPU hotplug notifier on a boot mapped ring buffer is unpredictable and could lead to a system crash. If the ring buffer is boot mapped simply do not attach the CPU notifier to it" * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Do not have boot mapped buffers hook to CPU hotplug	2024-10-10 12:25:32 -07:00
Linus Torvalds	eb952c47d1	Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - update fstrim loop and add more cancellation points, fix reported delayed or blocked suspend if there's a huge chunk queued - fix error handling in recent qgroup xarray conversion - in zoned mode, fix warning printing device path without RCU protection - again fix invalid extent xarray state (`6252690f7e`), lost due to refactoring * tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix clear_dirty and writeback ordering in submit_one_sector() btrfs: zoned: fix missing RCU locking in error message when loading zone info btrfs: fix missing error handling when adding delayed ref with qgroups enabled btrfs: add cancellation points to trim loops btrfs: split remaining space to discard in chunks	2024-10-10 10:02:59 -07:00
Linus Torvalds	5870963f6c	Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix NFSD bring-up / shutdown - Fix a UAF when releasing a stateid * tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix possible badness in FREE_STATEID nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed NFSD: Mark filecache "down" if init fails	2024-10-10 09:52:49 -07:00
Linus Torvalds	825ec756af	Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: - A few small typo fixes - fstests xfs/538 DEBUG-only fix - Performance fix on blockgc on COW'ed files, by skipping trims on cowblock inodes currently opened for write - Prevent cowblocks to be freed under dirty pagecache during unshare - Update MAINTAINERS file to quote the new maintainer * tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix a typo xfs: don't free cowblocks from under dirty pagecache on unshare xfs: skip background cowblock trims on inodes open for write xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc xfs: don't ifdef around the exact minlen allocations xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split xfs: return bool from xfs_attr3_leaf_add xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate() xfs: scrub: convert comma to semicolon xfs: Remove empty declartion in header file MAINTAINERS: add Carlos Maiolino as XFS release manager	2024-10-10 09:45:45 -07:00
Jakub Kicinski	7b43ba6501	Merge branch 'maintainers-networking-file-coverage-updates' Simon Horman says: ==================== MAINTAINERS: Networking file coverage updates The aim of this proposal is to make the handling of some files, related to Networking and Wireless, more consistently. It does so by: 1. Adding some more headers to the UDP section, making it consistent with the TCP section. 2. Excluding some files relating to Wireless from NETWORKING [GENERAL], making their handling consistent with other files related to Wireless. The aim of this is to make things more consistent. And for MAINTAINERS to better reflect the situation on the ground. I am more than happy to be told that the current state of affairs is fine. Or for other ideas to be discussed. v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org ==================== Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 09:35:51 -07:00
Simon Horman	5404b5a2fe	MAINTAINERS: Add headers and mailing list to UDP section Add netdev mailing list and some more udp.h headers to the UDP section. This is now more consistent with the TCP section. Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 09:35:48 -07:00
Simon Horman	9937aae39b	MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL] We already exclude wireless drivers from the netdev@ traffic, to delegate it to linux-wireless@, and avoid overwhelming netdev@. Many of the following wireless-related sections MAINTAINERS are already not included in the NETWORKING [GENERAL] section. For consistency, exclude those that are. * 802.11 (including CFG80211/NL80211) * MAC80211 * RFKILL Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 09:35:48 -07:00
Eric Dumazet	7d3fce8cbe	slip: make slhc_remember() more robust against malicious packets syzbot found that slhc_remember() was missing checks against malicious packets [1]. slhc_remember() only checked the size of the packet was at least 20, which is not good enough. We need to make sure the packet includes the IPv4 and TCP header that are supposed to be carried. Add iph and th pointers to make the code more readable. [1] BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455 ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline] ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212 ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4091 [inline] slab_alloc_node mm/slub.c:4134 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: `b5451d783a` ("slip: Move the SLIP drivers") Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 09:06:32 -07:00
Simon Horman	cd959bf7c3	net/smc: Address spelling errors Address spelling errors flagged by codespell. This patch is intended to cover all files under drivers/smc Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/20241009-smc-starspell-v1-1-b8b395bbaf82@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 09:05:20 -07:00
D. Wythe	6fd27ea183	net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC Eric report a panic on IPPROTO_SMC, and give the facts that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too. Bug: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000005 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000 [0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003, pud=0000000000000000 Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910 sp : ffff80009b887a90 x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000 x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00 x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000 x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001 x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003 x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000 Call trace: 0x0 netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593 smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973 security_socket_post_create+0x94/0xd4 security/security.c:4425 __sock_create+0x4c8/0x884 net/socket.c:1587 sock_create net/socket.c:1622 [inline] __sys_socket_create net/socket.c:1659 [inline] __sys_socket+0x134/0x340 net/socket.c:1706 __do_sys_socket net/socket.c:1720 [inline] __se_sys_socket net/socket.c:1718 [inline] __arm64_sys_socket+0x7c/0x94 net/socket.c:1718 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Code: ???????? ???????? ???????? ???????? (????????) ---[ end trace 0000000000000000 ]--- This patch add a toy implementation that performs a simple return to prevent such panic. This is because MSS can be set in sock_create_kern or smc_setsockopt, similar to how it's done in AF_SMC. However, for AF_SMC, there is currently no way to synchronize MSS within __sys_connect_file. This toy implementation lays the groundwork for us to support such feature for IPPROTO_SMC in the future. Fixes: `d25a92ccae` ("net/smc: Introduce IPPROTO_SMC") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:48:11 -07:00
Eric Dumazet	40dddd4b8b	ppp: fix ppp_async_encode() illegal access syzbot reported an issue in ppp_async_encode() [1] In this case, pppoe_sendmsg() is called with a zero size. Then ppp_async_encode() is called with an empty skb. BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634 ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline] ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4092 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:47:13 -07:00
Simon Horman	aeb218d900	docs: netdev: document guidance on cleanup patches The purpose of this section is to document what is the current practice regarding clean-up patches which address checkpatch warnings and similar problems. I feel there is a value in having this documented so others can easily refer to it. Clearly this topic is subjective. And to some extent the current practice discourages a wider range of patches than is described here. But I feel it is best to start somewhere, with the most well established part of the current practice. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:35:20 -07:00
Jakub Kicinski	bdb5d2481a	Merge branch 'net-introduce-tx-h-w-shaping-api' Paolo Abeni says: ==================== net: introduce TX H/W shaping API We have a plurality of shaping-related drivers API, but none flexible enough to meet existing demand from vendors[1]. This series introduces new device APIs to configure in a flexible way TX H/W shaping. The new functionalities are exposed via a newly defined generic netlink interface and include introspection capabilities. Some self-tests are included, on top of a dummy netdevsim implementation. Finally a basic implementation for the iavf driver is provided. Some usage examples: * Configure shaping on a given queue: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do set --json '{"ifindex": '$IFINDEX', "shaper": {"handle": {"scope": "queue", "id":'$QUEUEID'}, "bw-max": 2000000}}' * Container B/W sharing The orchestration infrastructure wants to group the container-related queues under a RR scheduling and limit the aggregate bandwidth: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope":"node"}, "bw-max": 10000000}' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}} Q1 \ \ Q2 -- node 0 ------- netdev / (bw-max: 10M) Q3 / * Delegation A containers wants to limit the aggregate B/W bandwidth of 2 of the 3 queues it owns - the starting configuration is the one from the previous point: SPEC=Documentation/netlink/specs/net_shaper.yaml ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], "handle": {"scope": "node"}, "bw-max": 5000000 }' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}} Q1 -- node 1 --------\ / (bw-max: 5M) \ Q2 / node 0 ------- netdev /(bw-max: 10M) Q3 ------------------/ In a group operation, when prior to the op itself, the leaves have different parents, the user must specify the parent handle for the group. I.e., starting from the previous config: ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "bw-max": 3000000 }' Netlink error: Invalid argument nl_len = 96 (80) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'All the leaves shapers must have the same old parent'} ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "parent": {"scope": "node", "id": 1}, "bw-max": 3000000 } {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}} Q1 -- node 2 --- /(bw-max:3M)\ Q3 / \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) * Cleanup: Still starting from config 1To delete a single queue shaper ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID3'}}' Q1 -- node 2 --- (bw-max:3M)\ \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) Deleting a node shaper relinks all its leaves to the node's parent: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id":2}}' Q1 ---\ \ node 1----- \ / (bw-max: 5M)\ Q2----/ node 0 ------- netdev (bw-max: 10M) Deleting the last shaper under a node shaper deletes the node, too: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID1'}}' ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID2'}}' ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 1}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} Such delete recurses on parents that are left over with no leaves: ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 0}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:32:46 -07:00
Sudheer Mogilappagari	4c1a457cb8	iavf: add support to exchange qos capabilities During driver initialization VF determines QOS capability is allowed by PF and receives QOS parameters. After which quanta size for queues is configured which is not configurable and is set to 1KB currently. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Sudheer Mogilappagari	ef490bbb22	iavf: Add net_shaper_ops support Implement net_shaper_ops support for IAVF. This enables configuration of rate limiting on per queue basis. Customer intends to enforce bandwidth limit on Tx traffic steered to the queue by configuring rate limits on the queue. To set rate limiting for a queue, update shaper object of given queues in driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to PF to update HW configuration. Deleting shaper configured for queue is nothing but configuring shaper with bw_max 0. The PF restores the default rate limiting config when bw_max is zero. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Wenjun Wu	015307754a	ice: Support VF queue rate limit and quanta size configuration Add support to configure VF queue rate limit and quanta size. For quanta size configuration, the quanta profiles are divided evenly by PF numbers. For each port, the first quanta profile is reserved for default. When VF is asked to set queue quanta size, PF will search for an available profile, change the fields and assigned this profile to the queue. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Wenjun Wu	608a5c05c3	virtchnl: support queue rate limit and quanta size configuration This patch adds new virtchnl opcodes and structures for rate limit and quanta size configuration, which include: 1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each VF per queue. 2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue. 3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The configuration is previously set by DCB and PF, and now is the potential QoS capability of VF. VF can take it as reference to configure queue TC mapping. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/839002f7bd6f63b985a060a51b079f6e6dbbe237.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Paolo Abeni	b3ea416419	testing: net-drv: add basic shaper test Leverage a basic/dummy netdevsim implementation to do functional coverage for NL interface. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Paolo Abeni	ecd82cfee3	net-shapers: implement cap validation in the core Use the device capabilities to reject invalid attribute values before pushing them to the H/W. Note that validating the metric explicitly avoids NL_SET_BAD_ATTR() usage, to provide unambiguous error messages to the user. Validating the nesting requires the knowledge of the new parent for the given shaper; as such is a chicken-egg problem: to validate the leaf nesting we need to know the node scope, to validate the node nesting we need to know the leafs parent scope. To break the circular dependency, place the leafs nesting validation after the parsing. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/54667601813e4c0348f39bf8ad2446ffc9fcd383.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:23 -07:00
Paolo Abeni	553ea9f1ef	net: shaper: implement introspection support The netlink op is a simple wrapper around the device callback. Extend the existing fetch_dev() helper adding an attribute argument for the requested device. Reuse such helper in the newly implemented operation. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/66eb62f22b3a5ba06ca91d01ae77515e5f447e15.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	14bba9285a	netlink: spec: add shaper introspection support Allow the user-space to fine-grain query the shaping features supported by the NIC on each domain. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/3ddd10e450e3fe7d4b944c0d0b886d4483529ee6.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	ff7d4deb1f	net-shapers: implement shaper cleanup on queue deletion hook into netif_set_real_num_tx_queues() to cleanup any shaper configured on top of the to-be-destroyed TX queues. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/6da4ee03cae2b2a757d7b59e88baf09cc94c5ef1.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	bf230c497d	net-shapers: implement delete support for NODE scope shaper Leverage the previously introduced group operation to implement the removal of NODE scope shaper, re-linking its leaves under the the parent node before actually deleting the specified NODE scope shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/763d484b5b69e365acccfd8031b183c647a367a4.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	5d5d4700e7	net-shapers: implement NL group operation Allow grouping multiple leaves shaper under the given root. The node and the leaves shapers are created, if needed, otherwise the existing shapers are re-linked as requested. Try hard to pre-allocated the needed resources, to avoid non trivial H/W configuration rollbacks in case of any failure. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/8a721274fde18b872d1e3a61aaa916bb7b7996d3.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	93954b40f6	net-shapers: implement NL set and delete operations Both NL operations directly map on the homonymous device shaper callbacks, update accordingly the shapers cache and are serialized via a per device lock. Implement the cache modification helpers to additionally deal with NODE scope shaper. That will be needed by the group() operation implemented in the next patch. The delete implementation is partial: does not handle NODE scope shaper yet. Such support will require infrastructure from the next patch and will be implemented later in the series. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1e6a34a4095b35d773d2b9c476164671bbcf8397.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	4b623f9f0f	net-shapers: implement NL get operation Introduce the basic infrastructure to implement the net-shaper core functionality. Each network devices carries a net-shaper cache, the NL get() operation fetches the data from such cache. The cache is initially empty, will be fill by the set()/group() operation implemented later and is destroyed at device cleanup time. The net_shaper_fill_handle(), net_shaper_ctx_init(), and net_shaper_generic_pre() implementations handle generic index type attributes, despite the current caller always pass a constant value to avoid more noise in later patches using them with different attributes. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:22 -07:00
Paolo Abeni	04e65df94b	netlink: spec: add shaper YAML spec Define the user-space visible interface to query, configure and delete network shapers via yaml definition. Add dummy implementations for the relevant NL callbacks. set() and delete() operations touch a single shaper creating/updating or deleting it. The group() operation creates a shaper's group, nesting multiple input shapers under the specified output shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/7a33a1ff370bdbcd0cd3f909575c912cd56f41da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:21 -07:00
Paolo Abeni	13d68a1643	genetlink: extend info user-storage to match NL cb ctx This allows a more uniform implementation of non-dump and dump operations, and will be used later in the series to avoid some per-operation allocation. Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to fit a more extended usage. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 08:30:21 -07:00
Paolo Abeni	ffc8fa91be	Merge branch 'rtnetlink-handle-error-of-rtnl_register_module' Kuniyuki Iwashima says: ==================== rtnetlink: Handle error of rtnl_register_module(). While converting phonet to per-netns RTNL, I found a weird comment /* Further rtnl_register_module() cannot fail / that was true but no longer true after commit `addf9b90de` ("net: rtnetlink: use rcu to free rtnl message handlers"). Many callers of rtnl_register_module() just ignore the returned value but should handle them properly. This series introduces two helpers, rtnl_register_many() and rtnl_unregister_many(), to do that easily and fix such callers. All rtnl_register() and rtnl_register_module() will be converted to _many() variant and some rtnl_lock() will be saved in _many() later in net-next. Changes: v4: Add more context in changelog of each patch v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/ * Move module owner to struct rtnl_msg_handler Make struct rtnl_msg_handler args/vars const * Update mctp goto labels v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/ * Remove __exit from mctp_neigh_exit(). v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:38 +02:00
Kuniyuki Iwashima	b5e837c860	phonet: Handle error of rtnl_register_module(). Before commit `addf9b90de` ("net: rtnetlink: use rcu to free rtnl message handlers"), once the first rtnl_register_module() allocated rtnl_msg_handlers[PF_PHONET], the following calls never failed. However, after the commit, rtnl_register_module() could fail silently to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error handling for each call. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's use rtnl_register_many() to handle the errors easily. Fixes: `addf9b90de` ("net: rtnetlink: use rcu to free rtnl message handlers") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:36 +02:00
Kuniyuki Iwashima	5be2062e30	mpls: Handle error of rtnl_register_module(). Since introduced, mpls_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: `03c0566542` ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:35 +02:00
Kuniyuki Iwashima	d51705614f	mctp: Handle error of rtnl_register_module(). Since introduced, mctp has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: `583be982d9` ("mctp: Add device handling and netlink interface") Fixes: `831119f887` ("mctp: Add neighbour netlink interface") Fixes: `06d2f4c583` ("mctp: Add netlink route management") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:35 +02:00
Kuniyuki Iwashima	cba5e43b0b	bridge: Handle error of rtnl_register_module(). Since introduced, br_vlan_rtnl_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: `8dcea18708` ("net: bridge: vlan: add rtm definitions and dump support") Fixes: `f26b296585` ("net: bridge: vlan: add new rtm message support") Fixes: `adb3ce9bcb` ("net: bridge: vlan: add del rtm message support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:35 +02:00
Kuniyuki Iwashima	78b7b99183	vxlan: Handle error of rtnl_register_module(). Since introduced, vxlan_vnifilter_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: `f9c4bb0b24` ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:35 +02:00
Kuniyuki Iwashima	07cc7b0b94	rtnetlink: Add bulk registration helpers for rtnetlink message handlers. Before commit `addf9b90de` ("net: rtnetlink: use rcu to free rtnl message handlers"), once rtnl_msg_handlers[protocol] was allocated, the following rtnl_register_module() for the same protocol never failed. However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to be allocated in each rtnl_register_module(), so each call could fail. Many callers of rtnl_register_module() do not handle the returned error, and we need to add many error handlings. To handle that easily, let's add wrapper functions for bulk registration of rtnetlink message handlers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 15:39:35 +02:00
Christian Marangi	16aef66643	net: phy: Validate PHY LED OPs presence before registering Validate PHY LED OPs presence before registering and parsing them. Defining LED nodes for a PHY driver that actually doesn't supports them is redundant and useless. It's also the case with Generic PHY driver used and a DT having LEDs node for the specific PHY. Skip it and report the error with debug print enabled. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241008194718.9682-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 13:55:15 +02:00
Paolo Abeni	9a3cd877dc	Merge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict xtables extensions to families that are safe, syzbot found a way to combine ebtables with extensions that are never used by userspace tools. From Florian Westphal. 2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup mismatch, also from Florian. netfilter pull request 24-10-09 * tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: conntrack_vrf.sh: add fib test case netfilter: fib: check correct rtable in vrf setups netfilter: xtables: avoid NFPROTO_UNSPEC where needed ==================== Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 13:50:55 +02:00
Paolo Abeni	88dc9aebd0	Merge branch 'net-mlx5-qos-refactor-esw-qos-to-support-new-features' Tariq Toukan says: ==================== net/mlx5: qos: Refactor esw qos to support new features This patch series by Cosmin and Carolina prepares the mlx5 qos infra for the upcoming feature of cross E-Switch scheduling. Noop cleanups: net/mlx5: qos: Flesh out element_attributes in mlx5_ifc.h net/mlx5: qos: Rename vport 'tsar' into 'sched_elem'. net/mlx5: qos: Consistently name vport vars as 'vport' net/mlx5: qos: Refactor and document bw_share calculation net/mlx5: qos: Rename rate group 'list' as 'parent_entry' Refactor the code with the goal of moving groups out of E-Switches: net/mlx5: qos: Maintain rate group vport members in a list net/mlx5: qos: Always create group0 net/mlx5: qos: Drop 'esw' param from vport qos functions net/mlx5: qos: Store the eswitch in a mlx5_esw_rate_group Move groups from an E-Switch into an mlx5_qos_domain: net/mlx5: qos: Store rate groups in a qos domain Refactor locking to use a new mutex in the qos domain: net/mlx5: qos: Refactor locking to a qos domain mutex In follow-up patchsets, we'll allow qos domains to be shared between E-Switches of the same NIC. The two top patches are simple enhancements. ==================== Link: https://patch.msgid.link/20241008183222.137702-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 13:12:03 +02:00
Carolina Jubran	e1013c7929	net/mlx5: Add support check for TSAR types in QoS scheduling Introduce a new function, mlx5_qos_tsar_type_supported(), to handle the validation of TSAR types within QoS scheduling contexts. Refactor the existing code to use this new function, replacing direct checks for TSAR type support in the NIC scheduling hierarchy. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 13:12:00 +02:00
Carolina Jubran	f91c69f43c	net/mlx5: Unify QoS element type checks across NIC and E-Switch Refactor the QoS element type support check by introducing a new function, mlx5_qos_element_type_supported(), which handles element type validation for both NIC and E-Switch schedulers. This change removes the redundant esw_qos_element_type_supported() function and unifies the element type checks into a single implementation. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-10 13:12:00 +02:00

1 2 3 4 5 ...

1309647 Commits