linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-02 03:57:34 -04:00

Author	SHA1	Message	Date
Aaron Conole	05398aa409	selftests: openvswitch: add a test for ipv4 forwarding This is a simple ipv4 bidirectional connectivity test. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 15:05:41 +02:00
Adrian Moreno	9f1179fbbd	selftests: openvswitch: support key masks The default value for the mask actually depends on the value (e.g: if the value is non-null, the default is full-mask), so change the convert functions to accept the full, possibly masked string and let them figure out how to parse the different values. Also, implement size-aware int parsing. With this patch we can now express flows such as the following: "eth(src=0a:ca:fe:ca:fe:0a/ff:ff:00:00:ff:00)" "eth(src=0a:ca:fe:ca:fe:0a)" -> mask = ff:ff:ff:ff:ff:ff "ipv4(src=192.168.1.1)" -> mask = 255.255.255.255 "ipv4(src=192.168.1.1/24)" "ipv4(src=192.168.1.1/255.255.255.0)" "tcp(src=8080)" -> mask = 0xffff "tcp(src=8080/0xf0f0)" Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 15:05:41 +02:00
Aaron Conole	918423fda9	selftests: openvswitch: add an initial flow programming case The openvswitch self-tests can test much of the control side of the module (ie: what a vswitchd implementation would process), but the actual packet forwarding cases aren't supported, making the testing of limited value. Add some flow parsing and an initial ARP based test case using arping utility. This lets us display flows, add some basic output flows with simple matches, and test against a known good forwarding case. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 15:05:41 +02:00
David Howells	ce650a1663	udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES __ip6_append_data() can has a similar problem to __ip_append_data()[1] when asked to splice into a partially-built UDP message that has more than the frag-limit data and up to the MTU limit, but in the ipv6 case, it errors out with EINVAL. This can be triggered with something like: pipe(pfd); sfd = socket(AF_INET6, SOCK_DGRAM, 0); connect(sfd, ...); send(sfd, buffer, 8137, MSG_CONFIRM\|MSG_MORE); write(pfd[1], buffer, 8); splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0); where the amount of data given to send() is dependent on the MTU size (in this instance an interface with an MTU of 8192). The problem is that the calculation of the amount to copy in __ip6_append_data() goes negative in two places, but a check has been put in to give an error in this case. This happens because when pagedlen > 0 (which happens for MSG_ZEROCOPY and MSG_SPLICE_PAGES), the terms in: copy = datalen - transhdrlen - fraggap - pagedlen; then mostly cancel when pagedlen is substituted for, leaving just -fraggap. Fix this by: (1) Insert a note about the dodgy calculation of 'copy'. (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above equation, so that 'offset' isn't regressed and 'length' isn't increased, which will mean that length and thus copy should match the amount left in the iterator. (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if we're asked to splice more than is in the iterator. It might be better to not give the warning or even just give a 'short' write. (4) If MSG_SPLICE_PAGES, override the copy<0 check. [!] Note that this should also affect MSG_ZEROCOPY, but that will return -EINVAL for the range of send sizes that requires the skbuff to be split. Signed-off-by: David Howells <dhowells@redhat.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/ [1] Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/1580952.1690961810@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 14:56:19 +02:00
Ruan Jinjie	6abce66ba9	net: gemini: Do not check for 0 return after calling platform_get_irq() It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20230802085216.659238-1-ruanjinjie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 14:50:28 +02:00
Ruan Jinjie	c1e9e5e0b9	drivers: net: xgene: Do not check for 0 return after calling platform_get_irq() It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20230802090657.969923-1-ruanjinjie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 14:50:15 +02:00
Yue Haibing	c956910d5a	tipc: Remove unused function declarations Commit `d50ccc2d39` ("tipc: add 128-bit node identifier") declared but never implemented tipc_node_id2hash(). Also commit `5c216e1d28` ("tipc: Allow run-time alteration of default link settings") never implemented tipc_media_set_priority() and tipc_media_set_window(), commit `cad2929dc4` ("tipc: update a binding service via broadcast") only declared tipc_named_bcast(). Since commit `be07f05639` ("tipc: simplify the finalize work queue") tipc_sched_net_finalize() is removed and declaration is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230802034659.39840-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 12:51:45 +02:00
Daniel Golle	571e9c4968	net: ethernet: mtk_eth_soc: support per-flow accounting on MT7988 NETSYS_V3 uses 64 bits for each counters while older SoCs are using 48/40 bits for each counter. Support reading per-flow byte and package counters on NETSYS_V3. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/37a0928fa8c1253b197884c68ce1f54239421ac5.1690946442.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 11:28:37 +02:00
Mateusz Kowalski	f11e5bd159	bonding: support balance-alb with openvswitch Commit `d5410ac7b0` ("net:bonding:support balance-alb interface with vlan to bridge") introduced a support for balance-alb mode for interfaces connected to the linux bridge by fixing missing matching of MAC entry in FDB. In our testing we discovered that it still does not work when the bond is connected to the OVS bridge as show in diagram below: eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac) \| bond0.150(mac:eth0_mac) \| ovs_bridge(ip:bridge_ip,mac:eth0_mac) This patch fixes it by checking not only if the device is a bridge but also if it is an openvswitch. Signed-off-by: Mateusz Kowalski <mko@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/9fe7297c-609e-208b-c77b-3ceef6eb51a4@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-08-03 10:25:42 +02:00
Jakub Kicinski	b23ec2bd7b	Merge branch 'introduce-ndo_hwtstamp_get-and-ndo_hwtstamp_set' Vladimir Oltean says: ==================== Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() Based on previous RFCs from Maxim Georgiev: https://lore.kernel.org/netdev/20230502043150.17097-1-glipus@gmail.com/ this series attempts to introduce new API for the hardware timestamping control path (SIOCGHWTSTAMP and SIOCSHWTSTAMP handling). I don't have any board with phylib hardware timestamping, so I would appreciate testing (especially on lan966x, the most intricate conversion). I was, however, able to test netdev level timestamping, because I also have some more unsubmitted conversions in progress: https://github.com/vladimiroltean/linux/commits/ndo-hwtstamp-v9 I hope that the concerns expressed in the comments of previous series were addressed, and that Köry Maincent's series: https://lore.kernel.org/netdev/20230406173308.401924-1-kory.maincent@bootlin.com/ can make progress in parallel with the conversion of the rest of drivers. ==================== Link: https://lore.kernel.org/r/20230801142824.1772134-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:09 -07:00
Vladimir Oltean	fd770e856e	net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers It is desirable that the new .ndo_hwtstamp_set() API gives more uniformity, less overhead and future flexibility w.r.t. the PHY timestamping behavior. Currently there are some drivers which allow PHY timestamping through the procedure mentioned in Documentation/networking/timestamping.rst. They don't do anything locally if phy_has_hwtstamp() is set, except for lan966x which installs PTP packet traps. Centralize that behavior in a new dev_set_hwtstamp_phylib() code function, which calls either phy_mii_ioctl() for the phylib PHY, or .ndo_hwtstamp_set() of the netdev, based on a single policy (currently simplistic: phy_has_hwtstamp()). Any driver converted to .ndo_hwtstamp_set() will automatically opt into the centralized phylib timestamping policy. Unconverted drivers still get to choose whether they let the PHY handle timestamping or not. Netdev drivers with integrated PHY drivers that don't use phylib presumably don't set dev->phydev, and those will always see HWTSTAMP_SOURCE_NETDEV requests even when converted. The timestamping policy will remain 100% up to them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-13-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	60495b6622	net: phy: provide phylib stubs for hardware timestamping operations net/core/dev_ioctl.c (built-in code) will want to call phy_mii_ioctl() for hardware timestamping purposes. This is not directly possible, because phy_mii_ioctl() is a symbol provided under CONFIG_PHYLIB. Do something similar to what was done in DSA in commit `5a17818682` ("net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub"), and arrange some indirect calls to phy_mii_ioctl() through a stub structure containing function pointers, that's provided by phylib as built-in even when CONFIG_PHYLIB=m, and which phy_init() populates at runtime (module insertion). Note: maybe the ownership of the ethtool_phy_ops singleton is backwards, and the methods exposed by that should be later merged into phylib_stubs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-12-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	70ef7d87f6	net: transfer rtnl_lock() requirement from ethtool_set_ethtool_phy_ops() to caller phy_init() and phy_exit() will have to do more stuff under rtnl_lock() in a future change. Since rtnl_unlock() -> netdev_run_todo() does a lot of stuff under the hood, it's a pity to lock and unlock the rtnetlink mutex twice in a row. Change the calling convention such that the only caller of ethtool_set_ethtool_phy_ops(), phy_device.c, provides a context where the rtnl_mutex is already acquired. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-11-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	54e1ed69c4	net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() The hardware timestamping through ndo_eth_ioctl() is going away. Convert the lan966x driver to the new API before that can be removed. After removing the timestamping logic from lan966x_port_ioctl(), the rest is equivalent to phy_do_ioctl(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-10-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	7bdde44463	net: sparx5: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() The hardware timestamping through ndo_eth_ioctl() is going away. Convert the sparx5 driver to the new API before that can be removed. After removing the timestamping logic from sparx5_port_ioctl(), the rest is equivalent to phy_do_ioctl(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-9-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	547b006d19	net: fec: delete fec_ptp_disable_hwts() Commit `340746398b` ("net: fec: fix hardware time stamping by external devices") was overly cautious with calling fec_ptp_disable_hwts() when cmd == SIOCSHWTSTAMP and use_fec_hwts == false, because use_fec_hwts is based on a runtime invariant (phy_has_hwtstamp()). Thus, if use_fec_hwts is false, then fep->hwts_tx_en and fep->hwts_rx_en cannot be changed at runtime; their values depend on the initial memory allocation, which already sets them to zeroes. If the core will ever gain support for switching timestamping layers, it will arrange for a more organized calling convention and disable timestamping in the previous layer as a first step. This means that the code in the FEC driver is not necessary in any case. The purpose of this change is to arrange the phy_has_hwtstamp() code in a way in which it can be refactored away into generic logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-8-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Vladimir Oltean	ef5eb9c5ce	net: fec: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() The hardware timestamping through ndo_eth_ioctl() is going away. Convert the FEC driver to the new API before that can be removed. After removing the timestamping logic from fec_enet_ioctl(), the rest is equivalent to phy_do_ioctl_running(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Maxim Georgiev	c0dabeb4c6	net: bonding: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() bonding is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in bonding by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Maxim Georgiev	0bca3f7f9a	net: macvlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() macvlan is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in macvlan by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. macvlan only implements ndo_eth_ioctl() for these 2 operations, so delete that method. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:06 -07:00
Maxim Georgiev	65c9fde15a	net: vlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() 8021q is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in the vlan driver by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:05 -07:00
Maxim Georgiev	e47d01fea6	net: add hwtstamping helpers for stackable net devices The stackable net devices with hwtstamping support (vlan, macvlan, bonding) only pass the hwtstamping ops to the lower (real) device. These drivers are the first that need to be converted to the new timestamping API, because if they aren't prepared to handle that, then no real device driver cannot be converted to the new API either. After studying what vlan_dev_ioctl(), macvlan_eth_ioctl() and bond_eth_ioctl() have in common, here we propose two generic implementations of ndo_hwtstamp_get() and ndo_hwtstamp_set() which can be called by those 3 drivers, with "dev" being their lower device. These helpers cover both cases, when the lower driver is converted to the new API or unconverted. We need some hacks in case of an unconverted driver, namely to stuff some pointers in struct kernel_hwtstamp_config which shouldn't have been there (since the new API isn't supposed to need it). These will be removed when all drivers will have been converted to the new API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:05 -07:00
Maxim Georgiev	66f7223039	net: add NDOs for configuring hardware timestamping Current hardware timestamping API for NICs requires implementing .ndo_eth_ioctl() for SIOCGHWTSTAMP and SIOCSHWTSTAMP. That API has some boilerplate such as request parameter translation between user and kernel address spaces, handling possible translation failures correctly, etc. Since it is the same all across the board, it would be desirable to handle it through generic code. Here we introduce .ndo_hwtstamp_get() and .ndo_hwtstamp_set(), which implement that boilerplate and allow drivers to just act upon requests. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 19:11:05 -07:00
Jakub Kicinski	72c1a28473	Merge branch 'net-extend-alloc_skb_with_frags-max-size' Eric Dumazet says: ==================== net: extend alloc_skb_with_frags() max size alloc_skb_with_frags(), while being able to use high order allocations, limits the payload size to PAGE_SIZE * MAX_SKB_FRAGS Reviewing Tahsin Erdogan patch [1], it was clear to me we need to remove this limitation. [1] https://lore.kernel.org/netdev/20230731230736.109216-1-trdgn@amazon.com/ v2: Addressed Willem feedback on 1st patch. ==================== Link: https://lore.kernel.org/r/20230801205254.400094-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:44:59 -07:00
Eric Dumazet	37dfe5b8dd	net: tap: change tap_alloc_skb() to allow bigger paged allocations tap_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:44:56 -07:00
Eric Dumazet	ae6db08f8b	net/packet: change packet_alloc_skb() to allow bigger paged allocations packet_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:44:55 -07:00
Eric Dumazet	ce7c7fef14	net: tun: change tun_alloc_skb() to allow bigger paged allocations tun_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max allocation size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:44:55 -07:00
Eric Dumazet	09c2c90705	net: allow alloc_skb_with_frags() to allocate bigger packets Refactor alloc_skb_with_frags() to allow bigger packets allocations. Instead of assuming that only order-0 allocations will be attempted, use the caller supplied max order. v2: try harder to use high-order pages, per Willem feedback. Link: https://lore.kernel.org/netdev/CANn89iJQfmc_KeUr3TeXvsLQwo3ZymyoCr7Y6AnHrkWSuz0yAg@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:44:55 -07:00
Yue Haibing	49c467dca3	sctp: Remove unused function declarations These declarations are never implemented since beginning of git history. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230731141030.32772-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:42:03 -07:00
Jakub Kicinski	edd8b295f9	Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode' Leon Romanovsky says: ==================== mlx5 IPsec packet offload support in eswitch mode This series from Jianbo adds mlx5 IPsec packet offload support in eswitch offloaded mode. It works exactly like "regular" IPsec, nothing special, except now users can switch to switchdev before adding IPsec rules. devlink dev eswitch set pci/0000:06:00.0 mode switchdev Same configurations as here: https://lore.kernel.org/netdev/cover.1670005543.git.leonro@nvidia.com/ Packet offload mode: ip xfrm state offload packet dev <if-name> dir <in\|out> ip xfrm policy .... offload packet dev <if-name> Crypto offload mode: ip xfrm state offload crypto dev <if-name> dir <in\|out> or (backward compatibility) ip xfrm state offload dev <if-name> dir <in\|out> v0: https://lore.kernel.org/all/cover.1689064922.git.leonro@nvidia.com ==================== Link: https://lore.kernel.org/r/cover.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:39 -07:00
Jianbo Liu	c8e350e62f	net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev For IPsec packet offload mode, the order of TC offload and IPsec offload on the same netdevice is not aligned with the order in the non-offload software. For example, for RX, the software performs TC first and then IPsec transformation, but the implementation for offload does that in the opposite way. To resolve the difference for now, either IPsec offload or TC offload, not both, is allowed for a specific interface. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:30 -07:00
Jianbo Liu	6e56ab1c90	net/mlx5e: Add get IPsec offload stats for uplink representor As IPsec offload is supported in switchdev mode, HW stats can be can be obtained from uplink rep. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/b43c91c452f1db9c35c10639a029aa10fd8b7895.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:30 -07:00
Jianbo Liu	d1569537a8	net/mlx5e: Modify and restore TC rules for IPSec TX rules After IPsec policy/state TX rules are added, any TC flow rule, which forwards packets to uplink, is modified to forward to IPsec TX tables. As these tables are destroyed dynamically, whenever there is no reference to them, the destinations of this kind of rules must be restored to uplink. There is a special case for packet encapsulation, as the packet_reformat_id in the extended destination is used to reformat packets, but only for the VPORT destination. To forward packet to IPsec table and do encapsulation in one FTE, move the packet_reformat_id to flow context, instead of using the extended destination. As a limitation, multiple encapsulations with table forwarding, and one together with other VPORT destinations, are not allowed, so add a check when offloading TC rules. TC rules are not allowed before IPsec TX rule is added, so only need to restore TC rules after flush IPSec TX rules. As they are saved in the vport_rep rhashtables, we walk all the rules in the rhashtables, and find TC rules with destinations pointing to IPsec tables, and modify them one by one. To avoid concurrent issue, this handling is done under the protection of eswitch mode_lock. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/7bcb2c7e2ecf0e0d06b095c8dcc6a37ea7f02faf.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:30 -07:00
Jianbo Liu	366e46242b	net/mlx5e: Make IPsec offload work together with eswitch and TC The eswitch mode is not allowed to change if there are any IPsec rules. Besides, by using mlx5_esw_try_lock() to get eswitch mode lock, IPsec rules are not allowed to be offloaded if there are any TC rules. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/e442b512b21a931fbdfb87d57ae428c37badd58a.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:30 -07:00
Jianbo Liu	1632649d2d	net/mlx5: Compare with old_dest param to modify rule destination The rule destination must be compared with the old_dest passed in. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/24adc60d05d7492359ba343c6da1ebbe9fe284f6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:30 -07:00
Jianbo Liu	c6c2bf5db4	net/mlx5e: Support IPsec packet offload for TX in switchdev mode The IPsec encryption is done at the last, so add new prio for IPsec offload in FDB, and put it just lower than the slow path prio and higher than the per-vport prio. Three levels are added for TX. The first one is for ip xfrm policy. The sa table is created in the second level for ip xfrm state. The status table is created at the last to count the number of packets encrypted. The rules, which forward packets to uplink, are changed to forward them to IPsec TX tables first. These rules are restored after those tables are destroyed, which is done immediately when there is no reference to them, just as what does in legacy mode. The support for slow path is added here, by refreshing uplink's channels. But, the handling for TC fast path, which is more complicated, will be added later. Besides, reg c4 is used instead to match reqid. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	f46e92d664	net/mlx5e: Refactor IPsec TX tables creation Add attribute for IPsec TX creation, pass all needed parameters in it, so tx_create() can be used by eswitch. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/24d5ab988b0db2d39b7fde321b44ffe885d47828.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	91bafc638e	net/mlx5e: Handle IPsec offload for RX datapath in switchdev mode Reuse tun opts bits in reg c1, to pass IPsec obj id to datapath. As this is only for RX SA and there are only 11 bits, xarray is used to map IPsec obj id to an index, which is between 1 and 0x7ff, and replace obj id to write to reg c1. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/43d60fbcc9cd672a97d7e2a2f7fe6a3d9e9a776d.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	1762f132d5	net/mlx5e: Support IPsec packet offload for RX in switchdev mode As decryption must be done first, add new prio for IPsec offload in FDB, and put it just lower than BYPASS prio and higher than TC prio. Three levels are added for RX. The first one is for ip xfrm policy. SA table is created in the second level for ip xfrm state. The status table is created in the last to check the decryption result. If success, packets continue with the next process, or dropped otherwise. For now, the set of reg c1 is removed for swtichdev mode, and the datapath process will be added in the next patch. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/c91063554cf643fb50b99cf093e8a9bf11729de5.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	6e125265d5	net/mlx5e: Refactor IPsec RX tables creation and destruction Add attribute for IPsec RX creation, so rx_create() can be used by eswitch in later patch. And move the code for TTC dest connect/disconnect, which are needed only in NIC mode, to individual functions. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/87478d928479b6a4eee41901204546ea05741815.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	f5c5abc4c0	net/mlx5e: Prepare IPsec packet offload for switchdev mode As the uplink representor is created only in switchdev mode, add a local variable for IPsec to indicate the device is in this mode. In this mode, IPsec ROCE is disabled, and crypto offload is kept as it is. However, as the tables for packet offload are created in FDB, ipsec->rx_esw and ipsec->tx_esw are added. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ee242398f3b0a18007749fe79ff6ff19445a0280.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	33b18a0f75	net/mlx5e: Change the parameter of IPsec RX skb handle function Refactor the function to pass in reg B value only. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/3b3c53f64660d464893eaecc41298b1ce49c6baa.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Jianbo Liu	fbd517549c	net/mlx5e: Add function to get IPsec offload namespace Add function to get namespace in different directions. It will be extended for switchdev mode in later patch, but no functionality change for now. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ac2982c34f1ed3288d4670cacfd7e1b87a8c96d9.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 18:37:29 -07:00
Brett Creeley	30ff01ee99	pds_core: Fix documentation for pds_client_register The documentation above pds_client_register states that it returns 0 on success and negative on error. However, it actually returns a positive client ID on success and negative on error. Fix the documentation to state exactly that. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20230801165833.1622-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:28:32 -07:00
Yue Haibing	f85b1c7da7	net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t() Commit `29ab586c3d` ("net: switchdev: Remove bridge bypass support from switchdev") leave this unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230801144209.27512-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:28:26 -07:00
Yue Haibing	e12f2a6d1b	netlabel: Remove unused declaration netlbl_cipsov4_doi_free() Since commit `b1edeb1023` ("netlabel: Replace protocol/NetLabel linking with refrerence counts") this declaration is unused and can be removed. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20230801143453.24452-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:28:22 -07:00
Yue Haibing	2fca1b5ef8	ila: Remove unnecessary file net/ila.h Commit `642c2c9558` ("ila: xlat changes") removed ila_xlat_outgoing() and ila_xlat_incoming() functions, then this file became unnecessary. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230801143129.40652-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:28:16 -07:00
Yue Haibing	9e63a99c56	udp: Remove unused function declaration udp_bpf_get_proto() commit `8a59f9d1e3` ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801133902.3660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:10:46 -07:00
Ruan Jinjie	497c3a5fb3	cirrus: cs89x0: fix the return value handle and remove redundant dev_warn() for platform_get_irq() There is no possible for platform_get_irq() to return 0 and the return value of platform_get_irq() is more sensible to show the error reason. And there is no need to call the dev_warn() function directly to print a custom message when handling an error from platform_get_irq() function as it is going to display an appropriate error message in case of a failure. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20230801133121.416319-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:10:39 -07:00
Kurt Kanzenbach	ae3683a342	net: dsa: hellcreek: Replace bogus comment Replace bogus comment about matching the latched timestamp to one of the received frames. That comment is probably copied from mv88e6xxx and true for these switches. However, the hellcreek switch is configured to insert the timestamp directly into the PTP packets. While here, remove the other comments regarding the list splicing and locking as well, because it doesn't add any value. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230801131647.84697-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:10:35 -07:00
Ruan Jinjie	ae336f30d5	bnx2x: Remove unnecessary ternary operators There are a little ternary operators, the true or false judgement of which is unnecessary in C language semantics. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230801111928.300231-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-02 12:09:13 -07:00

1 2 3 4 5 ...

1201292 Commits