Vladimir Oltean says:
====================
taprio automatic queueMaxSDU and new TXQ selection procedure
This patch set addresses 2 design limitations in the taprio software scheduler:
1. Software scheduling fundamentally prioritizes traffic incorrectly,
in a way that was inspired by the Intel igb/igc drivers and does not
follow the inputs user space gives (traffic classes and TC to TXQ
mapping). Patch 05/15 handles this; patches 01/15 - 04/15 are
preparations for this work.
2. Software scheduling assumes that the gate for a traffic class closes
as soon as the next interval begins. But this isn't true.
If consecutive schedule entries have that traffic class gate open,
there is no "gate close" event and taprio should keep dequeuing from
that TC without interruptions. Patches 06/15 - 15/15 handle this.
Patch 10/15 is a generic Qdisc change required for this to work.
Future development directions which depend on this patch set are:
- Propagating the automatic queueMaxSDU calculation down to offloading
device drivers, instead of letting them calculate this, as
vsc9959_tas_guard_bands_update() does today.
- A software data path for tc-taprio with preemptible traffic and
Hold/Release events.
v1 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20230128010719.2182346-1-vladimir.oltean@nxp.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve commit 497cc00224 ("taprio: Handle short intervals and large
packets") to only perform segmentation when skb->len exceeds the
maximum frame length that taprio_dequeue() is willing to send.
In practice, this will make the biggest difference when a traffic class
gate is always open in the schedule. This is because the max_frm_len
will be U32_MAX, and such large skb->len values as Kurt reported will be
sent just fine unsegmented.
What I don't seem to know how to handle is how to make sure that the
segmented skbs themselves are smaller than the maximum frame size given
by the current queueMaxSDU[tc]. Nonetheless, we still need to drop
those, otherwise the Qdisc will hang.
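For reference, the gist of the new check is the following hedged sketch
(the max_frm_len lookup and helper names are illustrative, not the
exact code):

/* Only resort to segmentation when the skb wouldn't be accepted as-is;
 * GSO skbs that already fit through the gate go out unsegmented.
 */
if (skb_is_gso(skb) && skb->len > sched->max_frm_len[tc])
	return taprio_enqueue_segmented(skb, sch, child, to_free);

return taprio_enqueue_one(skb, sch, child, to_free);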
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The majority of taprio_enqueue() is spent doing TCP segmentation, which
doesn't look right to me. Compilers shouldn't have a problem inlining
code no matter how we write it, so move the segmentation logic to a
separate function.
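The resulting shape, roughly (function names as a sketch):

static int taprio_enqueue_segmented(struct sk_buff *skb, struct Qdisc *sch,
				    struct Qdisc *child,
				    struct sk_buff **to_free)
{
	/* the skb_gso_segment() loop from taprio_enqueue() moves here */
}

static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			  struct sk_buff **to_free)
{
	...
	if (unlikely(skb_is_gso(skb)))
		return taprio_enqueue_segmented(skb, sch, child, to_free);

	return taprio_enqueue_one(skb, sch, child, to_free);
}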
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
taprio today has a huge problem with small TC gate durations, because it
might accept packets in taprio_enqueue() which will never be sent by
taprio_dequeue().
Since not much infrastructure was available, a kludge was added in
commit 497cc00224 ("taprio: Handle short intervals and large
packets"), which segmented large TCP segments, but the fact of the
matter is that the issue isn't specific to large TCP segments (and even
worse, the performance penalty in segmenting those is absolutely huge).
In commit a54fc09e4c ("net/sched: taprio: allow user input of per-tc
max SDU"), taprio gained support for queueMaxSDU, which is precisely the
mechanism through which packets should be dropped at qdisc_enqueue() if
they cannot be sent.
After that patch, it was necessary for the user to manually limit the
maximum MTU per TC. This change adds the necessary logic for taprio to
further limit the values specified (or not specified) by the user to
some minimum values which never allow oversized packets to be sent.
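The gist of the auto-limiting, as a hedged sketch (names illustrative;
the real code must also account for the L2 header length and any
user-supplied max-sdu):

/* Longest contiguous open-gate window for this TC, converted to a
 * byte count at the current link speed (picos_per_byte = ps/byte,
 * 1000 ps per ns):
 */
max_frm_len = div64_u64(max_open_gate_duration[tc] * 1000,
			atomic64_read(&q->picos_per_byte));
max_sdu = max_frm_len - dev->hard_header_len;

/* never exceed what the user asked for, if anything */
if (user_max_sdu[tc])
	max_sdu = min(max_sdu, user_max_sdu[tc]);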
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have one practical reason for doing this and one concerning correctness.
The practical reason has to do with a follow-up patch, which aims to mix
2 sources of max_sdu (one coming from the user and the other
automatically calculated based on TC gate durations at the current link
speed). Among those 2 sources of input, we must always select the
smaller max_sdu value, but this can change at various link speeds. So
the max_sdu coming from the user must be kept separate from the value
that is operationally used (the minimum of the 2), because otherwise we
overwrite it and forget what the user asked us to do.
To solve that, this patch proposes that struct sched_gate_list contains
the operationally active max_frm_len, and q->max_sdu contains just what
was requested by the user.
The reason having to do with correctness is based on the following
observation: the admin sched_gate_list becomes operational at a given
base_time in the future. Until then, it is inactive and applies no
shaping: all gates are open, etc. So the queueMaxSDU dropping shouldn't
apply either (this is a mechanism to ensure that packets which cannot
fit within the largest gate duration for their TC are dropped rather
than hanging the port; clearly it makes little sense if the gates are
always open).
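Concretely, the split looks roughly like this (field placement per the
description above):

struct sched_gate_list {
	...
	/* operationally active limit (min of user and calculated
	 * values), expressed as a frame length
	 */
	u32 max_frm_len[TC_MAX_QUEUE];
	...
};

struct taprio_sched {
	...
	u32 max_sdu[TC_MAX_QUEUE];	/* save info from the user */
	...
};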
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius intended taprio to take the L1 overhead into account when
estimating packet transmission time through user input, specifically
through the qdisc size table (man tc-stab).
Something like this:
tc qdisc replace dev $eth root stab overhead 24 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7e 9000000 \
sched-entry S 0x82 1000000 \
max-sdu 0 0 0 0 0 0 0 200 \
flags 0x0 clockid CLOCK_TAI
Without the overhead being specified, transmission times will be
underestimated and will cause late transmissions. For an offloading
driver, it might even cause TX hangs if there is no open gate large
enough to send the maximum sized packets for that TC (including L1
overhead). Properly knowing the L1 overhead will ensure that we are able
to auto-calculate the queueMaxSDU per traffic class just right, and
avoid these hangs due to head-of-line blocking.
We can't make the stab mandatory due to existing setups, but we can warn
the user that it's important with a warning netlink extack.
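For reference (sketch): once a stab is configured, the length that
taprio costs against the schedule already includes that overhead, since
the core adjusts the skb's qdisc length on enqueue:

/* qdisc_pkt_len() reflects skb->len plus the stab overhead, so a
 * 24-byte overhead (7 preamble + 1 SFD + 4 FCS + 12 IPG) is part
 * of every transmission-time estimate.
 */
unsigned int len = qdisc_pkt_len(skb);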
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220505160357.298794-1-vladimir.oltean@nxp.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some qdiscs, like taprio, turn out to be pretty reliant on a
well-configured stab in order to not underestimate the skb transmission
time (by properly accounting for L1 overhead).
In a future change, taprio will need the stab, if configured by the
user, to be available at ops->init() time. It will become even more
important in upcoming work, when the overhead will be used for the
queueMaxSDU calculation that is passed to an offloading driver.
However, rcu_assign_pointer(sch->stab, stab) is called right after
ops->init(), making it unavailable, and I don't really see a good reason
for that.
Move it earlier, which nicely seems to simplify the error handling path
as well.
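In qdisc_create(), the ordering roughly becomes:

	stab = qdisc_get_stab(tca[TCA_STAB], extack);
	if (IS_ERR(stab)) {
		err = PTR_ERR(stab);
		goto err_out;
	}
	rcu_assign_pointer(sch->stab, stab);

	/* ops->init() can now look at the stab */
	err = ops->init(sch, tca[TCA_OPTIONS], extack);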
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
taprio_dequeue_from_txq() looks at the entry->end_time to determine
whether the skb will overrun its traffic class gate, as if at the end of
the schedule entry there surely is a "gate close" event for it. Hint:
maybe there isn't.
For each schedule entry, introduce an array of kernel times which
tracks when in the future there will be an *actual* gate close event
for that traffic class, and use that in the guard band overrun
calculation.
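Sketch of the data structure and its use in the guard band check
(simplified; taprio_get_time() and length_to_duration() are existing
helpers in sch_taprio.c):

struct sched_entry {
	...
	/* Absolute time when each TC's gate actually closes, looking
	 * across consecutive entries that keep the gate open.
	 */
	ktime_t gate_close_time[TC_MAX_QUEUE];
	...
};

/* in taprio_dequeue_from_txq(), roughly: */
guard = ktime_add_ns(taprio_get_time(q), length_to_duration(q, len));
if (ktime_after(guard, entry->gate_close_time[tc]))
	return NULL; /* transmission would overrun the gate */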
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently taprio assumes that the budget for a traffic class expires at
the end of the current interval as if the next interval contains a "gate
close" event for this traffic class.
This is, however, an unfounded assumption. Allow schedule entry
intervals to be fused together for a particular traffic class by
calculating the budget until the gate *actually* closes.
This means we need to keep budgets per traffic class, and we also need
to update the budget consumption procedure.
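Sketch of the resulting budget handling (names per the description;
details simplified):

struct sched_entry {
	...
	/* bytes that may still be sent until each TC's gate actually
	 * closes, not merely until this entry ends
	 */
	atomic_t budget[TC_MAX_QUEUE];
	...
};

/* refill from the per-TC open-gate duration (1000 ps per ns): */
atomic_set(&entry->budget[tc],
	   div64_u64(entry->tc_gate_duration[tc] * 1000,
		     atomic64_read(&q->picos_per_byte)));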
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a confusion of terms in taprio: what is called "close_time" is
actually used for 2 things:
1. determining when an entry "closes" such that transmitted skbs are
never allowed to overrun that time (?!)
2. an aid for determining when to advance and/or restart the schedule
using the hrtimer
It makes more sense to call this so-called "close_time" "end_time",
because it's not clear at all to me what "closes". Future patches will
hopefully make better use of the term "to close".
This is an absolutely mechanical change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current taprio code operates on a very simplistic (and incorrect)
assumption: that egress scheduling for a traffic class can only take
place for the duration of the current interval; in other words, it assumes
that at the end of each schedule entry, there is a "gate close" event
for all traffic classes.
As an example, traffic sent with the schedule below will be jumpy, even
though all 8 TC gates are open, so there is absolutely no "gate close"
event (effectively a transition from BIT(tc)==1 to BIT(tc)==0 in
consecutive schedule entries):
tc qdisc replace dev veth0 parent root taprio \
num_tc 2 \
map 0 1 \
queues 1@0 1@1 \
base-time 0 \
sched-entry S 0xff 4000000000 \
clockid CLOCK_TAI \
flags 0x0
This qdisc simply does not have what it takes in terms of logic to
*actually* compute the durations of traffic classes. Also, it does not
recognize the need to use this information on a per-traffic-class basis:
it always looks at entry->interval and entry->close_time.
This change proposes that each schedule entry has an array called
tc_gate_duration[tc]. This holds the information: "for how long will
this traffic class gate remain open, starting from *this* schedule
entry". If the traffic class gate is always open, that value is equal to
the cycle time of the schedule.
We'll also need to keep track, for the purpose of the queueMaxSDU[tc]
calculation, of the maximum duration for which a traffic class has an
open gate. This directly gives us the maximum sized packet that this
traffic class will have to accept; everything else it has to
qdisc_drop() in qdisc_enqueue().
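A hedged sketch of the computation (next_entry_cyclic() is a made-up
helper that wraps from the last schedule entry back to the first;
assumes cycle_time equals the sum of all intervals):

list_for_each_entry(entry, &sched->entries, list) {
	for (tc = 0; tc < num_tc; tc++) {
		struct sched_entry *n = entry;
		u64 d = 0;

		/* Sum intervals for as long as the gate stays open;
		 * an always-open gate saturates at the cycle time.
		 */
		while (n->gate_mask & BIT(tc)) {
			d += n->interval;
			if (d >= sched->cycle_time) {
				d = sched->cycle_time;
				break;
			}
			n = next_entry_cyclic(sched, n);
		}

		entry->tc_gate_duration[tc] = d;
		sched->max_open_gate_duration[tc] =
			max(sched->max_open_gate_duration[tc], d);
	}
}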
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current taprio software implementation is haunted by the shadow of the
igb/igc hardware model. It iterates over child qdiscs in increasing
order of TXQ index, therefore giving higher xmit priority to TXQ 0 and
lower to TXQ N. According to discussions with Vinicius, that is the
default (perhaps even unchangeable) prioritization scheme used for the
NICs that taprio was first written for (igb, igc), and we have a case of
two bugs canceling out, resulting in a functional setup on igb/igc, but
a less sane one on other NICs.
To the best of my understanding, taprio should prioritize based on the
traffic class, so it should really dequeue starting with the highest
traffic class and going down from there. We get to the TXQ using the
tc_to_txq[] netdev property.
TXQs within the same TC have the same (strict) priority, so we should
pick from them as fairly as we can. We can achieve that by implementing
something very similar to q->curband from multiq_dequeue().
Since igb/igc really do have TXQ 0 of higher hardware priority than
TXQ 1 etc, we need to preserve the behavior for them as well. We really
have no choice, because in txtime-assist mode, taprio is essentially a
software scheduler towards offloaded child tc-etf qdiscs, so the TXQ
selection really does matter (not all igb TXQs support ETF/SO_TXTIME,
says Kurt Kanzenbach).
To preserve the behavior, we need a capability bit so that taprio can
determine if it's running on igb/igc, or on something else. Because igb
doesn't offload taprio at all, we can't piggyback on the
qdisc_offload_query_caps() call from taprio_enable_offload(), but
instead we need a separate call which is also made for software
scheduling.
Introduce two static keys to minimize the performance penalty on systems
which only have igb/igc NICs, and on systems which only have other NICs.
For mixed systems, taprio will have to dynamically check whether to
dequeue using one prioritization algorithm or using the other.
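The resulting dispatch, roughly (key and helper names as a sketch):

DEFINE_STATIC_KEY_FALSE(taprio_have_broken_mqprio);
DEFINE_STATIC_KEY_FALSE(taprio_have_working_mqprio);

static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
{
	...
	if (static_branch_unlikely(&taprio_have_broken_mqprio) &&
	    !static_branch_likely(&taprio_have_working_mqprio))
		return taprio_dequeue_txq_priority(sch, entry, gate_mask);

	if (static_branch_likely(&taprio_have_working_mqprio) &&
	    !static_branch_unlikely(&taprio_have_broken_mqprio))
		return taprio_dequeue_tc_priority(sch, entry, gate_mask);

	/* mixed system: both NIC kinds present, check per qdisc */
	if (q->broken_mqprio)
		return taprio_dequeue_txq_priority(sch, entry, gate_mask);
	else
		return taprio_dequeue_tc_priority(sch, entry, gate_mask);
}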
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify taprio_dequeue_from_txq() by noticing that we can goto one
call earlier than the previous skb_found label. This is possible
because we've unified the treatment of the child->ops->dequeue(child)
return value: we always try other TXQs now, instead of abandoning the
root dequeue completely if we failed in the peek() case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Future changes will refactor the TXQ selection procedure; a lot of
stuff would become messy and the indentation of the bulk of the dequeue
procedure would increase.
Break out the bulk of the function into a new one, which knows the TXQ
(child qdisc) we should perform a dequeue from.
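The new helper's shape is roughly:

static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
					       struct sched_entry *entry,
					       u32 gate_mask);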
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes the handling of an unlikely condition to not stop dequeuing
if taprio failed to dequeue the peeked skb in taprio_dequeue().
I've no idea when this can happen, but the only side effect seems to be
that the atomic_sub_return() call right above will have consumed some
budget. This isn't a big deal, since either that made us remain without
any budget (and therefore, we'd exit on the next peeked skb anyway), or
we could send some packets from other TXQs.
I'm making this change because in a future patch I'll be refactoring the
dequeue procedure to simplify it, and this corner case will have to go
away.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
There isn't any code in the network stack which calls taprio_peek().
We only see qdisc->ops->peek() being called on child qdiscs of other
classful qdiscs, never from the generic qdisc code. Whereas taprio is
never a child qdisc, it is always root.
This snippet of a comment from qdisc_peek_dequeued() seems to confirm:
/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
Since I've been known to be wrong many times though, I'm not completely
removing it, but leaving a stub function in place which emits a warning.
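The stub looks roughly like this:

static struct sk_buff *taprio_peek(struct Qdisc *sch)
{
	WARN_ONCE(1, "taprio only supports operating as root qdisc, peek() not implemented");
	return NULL;
}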
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur says:
====================
net: micrel: Add support for lan8841 PHY
Add support for lan8841 PHY.
The first patch adds support for the lan8841 PHY, which can run at
10/100/1000 Mbit. It also has support for other features, but they are
not added in this series.
The second patch updates the dt-bindings documentation, which is
similar to the ksz9131's.
v3->v4:
- add space between defines and function names
- inside lan8841_config_init use only ret variable
v2->v3:
- reuse ksz9131_config_init
- allow only open-drain configuration
- change from single patch to a patch series
v1->v2:
- Remove hardcoded values
- Fix typo in commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The LAN8841 is a completely integrated triple-speed (10BASE-T/
100BASE-TX/1000BASE-T) Ethernet physical layer transceiver for
transmission and reception of data on standard CAT-5, as well as
CAT-5e and CAT-6, unshielded twisted pair (UTP) cables.
The LAN8841 offers the industry-standard GMII/MII as well as the RGMII.
Some of the features of the PHY are:
- Wake on LAN
- Auto-MDIX
- IEEE 1588-2008 (V2)
- LinkMD cable diagnostics
Currently the patch offers support only for link configuration.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flower filter packet statistics. This just reads the TCAM counter
of the rule, which indicates how many packets hit this rule.
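Roughly, the stats callback reads the hit counter and reports it
through the flow_offload core (sketch; tcam_rule_read_hits() is a
made-up stand-in for the driver's TCAM counter read):

static int tc_flower_stats(struct net_device *dev,
			   struct flow_cls_offload *f)
{
	u64 hits;
	int err;

	/* read (and clear) the TCAM hit counter for this rule */
	err = tcam_rule_read_hits(dev, f->cookie, &hits);
	if (err)
		return err;

	flow_stats_update(&f->stats, 0, hits, 0, jiffies,
			  FLOW_ACTION_HW_STATS_IMMEDIATE);
	return 0;
}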
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
ice: various virtualization cleanups
Jacob Keller says:
This series contains a variety of refactors and cleanups in the VF code for
the ice driver. Its primary focus is cleanup and simplification of the VF
operations and addition of a few new operations that will be required by
Scalable IOV, as well as some other refactors needed for the handling of VF
subfunctions.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: remove unnecessary virtchnl_ether_addr struct use
ice: introduce .irq_close VF operation
ice: introduce clear_reset_state operation
ice: convert vf_ops .vsi_rebuild to .create_vsi
ice: introduce ice_vf_init_host_cfg function
ice: add a function to initialize vf entry
ice: Pull common tasks into ice_vf_post_vsi_rebuild
ice: move ice_vf_vsi_release into ice_vf_lib.c
ice: move vsi_type assignment from ice_vsi_alloc to ice_vsi_cfg
ice: refactor VSI setup to use parameter structure
ice: drop unnecessary VF parameter from several VSI functions
ice: fix function comment referring to ice_vsi_alloc
ice: Add more usage of existing function ice_get_vf_vsi(vf)
====================
Link: https://lore.kernel.org/r/20230206214813.20107-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the nfp_net_get_port_mac_by_hwinfo() check earlier in the
get/set_eeprom() functions in order to check for a VF netdev, which
these functions do not support.
It is debatable if this is a fix or an enhancement, and we have chosen
to go for the latter. It does address a problem introduced by
commit 74b4f1739d ("nfp: flower: change get/set_eeprom logic and enable for flower reps").
However, the ethtool->len == 0 check avoids the problem manifesting as a
run-time bug (NULL pointer dereference of app).
Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230206154836.2803995-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
mlxsw: Misc devlink changes
This patchset adjusts mlxsw to recent devlink changes in net-next.
Patch #1 removes a devl_param_driverinit_value_set() call that was
unnecessary, but now additionally triggers a WARN_ON.
Patches #2-#4 are non-functional preparations for the following patches.
Patch #5 fixes a use-after-free that is triggered while changing network
namespaces.
Patch #6 makes mlxsw consistent with netdevsim by having mlxsw register
its devlink instance before its sub-objects. It helps us avoid a warning
described in the commit message.
====================
Link: https://lore.kernel.org/r/cover.1675692666.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent changes made it possible to register the devlink instance before
its sub-objects and under the instance lock. Among other things, it
allows us to avoid warnings such as this one [1]. The warning is
generated because a buggy firmware is generating a health event during
driver initialization, before the devlink instance is registered.
Move the registration of the devlink instance to the beginning of the
initialization flow to avoid such problems.
A similar change was implemented in netdevsim in commit 82a3aef2e6
("netdevsim: move devlink registration under the instance lock").
[1]
WARNING: CPU: 3 PID: 49 at net/devlink/leftover.c:7509 devlink_recover_notify.constprop.0+0xaf/0xc0
[...]
Call Trace:
<TASK>
devlink_health_report+0x45/0x1d0
mlxsw_core_health_event_work+0x24/0x30 [mlxsw_core]
process_one_work+0x1db/0x390
worker_thread+0x49/0x3b0
kthread+0xe5/0x110
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the
network namespace of the devlink instance is changed. Specifically, the
notifications are generated after calling reload_down(), but before
calling reload_up(). At this stage, the data structures accessed while
reading the value of the "acl_region_rehash_interval" devlink parameter
are uninitialized, resulting in a use-after-free [1].
Fix by moving the registration and unregistration of the devlink
parameter to the TCAM code where it is actually used. This means that
the parameter is unregistered during reload_down() and then
re-registered during reload_up(), avoiding the use-after-free between
these two operations.
Reproducer:
# ip netns add test123
# devlink dev reload pci/0000:06:00.0 netns test123
[1]
BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
Read of size 4 at addr ffff888162fd37d8 by task devlink/1323
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x95/0xbd
print_report+0x181/0x4a1
kasan_report+0xdb/0x200
mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80
devlink_nl_param_fill.constprop.0+0x29a/0x11e0
devlink_param_notify.constprop.0+0xb9/0x250
devlink_notify_unregister+0xbc/0x470
devlink_reload+0x1aa/0x440
devlink_nl_cmd_reload+0x559/0x11b0
genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0
genl_rcv_msg+0x558/0x7f0
netlink_rcv_skb+0x170/0x440
genl_rcv+0x2d/0x40
netlink_unicast+0x53f/0x810
netlink_sendmsg+0x961/0xe80
__sys_sendto+0x2a4/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 7d7e9169a3 ("devlink: move devlink reload notifications back in between _down() and _up() calls")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the initialization and de-initialization code further below in
order to avoid forward declarations in the next patch. No functional
changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move mutex_destroy() to the end to make the function symmetric with
mlxsw_sp_acl_tcam_init(). No functional changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "acl_region_rehash_interval" devlink parameter is a "runtime"
parameter, making the call to devl_param_driverinit_value_set()
pointless. Before the cited commit, the function simply returned an
error (that was not checked), but now it emits a WARNING [1].
Fix by removing the function call.
[1]
WARNING: CPU: 0 PID: 7 at net/devlink/leftover.c:10974
devl_param_driverinit_value_set+0x8c/0x90
[...]
Call Trace:
<TASK>
mlxsw_sp2_params_register+0x83/0xb0 [mlxsw_spectrum]
__mlxsw_core_bus_device_register+0x5e5/0x990 [mlxsw_core]
mlxsw_core_bus_device_register+0x42/0x60 [mlxsw_core]
mlxsw_pci_probe+0x1f0/0x230 [mlxsw_pci]
local_pci_probe+0x1a/0x40
work_for_cpu_fn+0xf/0x20
process_one_work+0x1db/0x390
worker_thread+0x1d5/0x3b0
kthread+0xe5/0x110
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 85fe0b324c ("devlink: make devlink_param_driverinit_value_set() return void")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yury Norov says:
====================
sched: cpumask: improve on cpumask_local_spread() locality
cpumask_local_spread() currently checks the local node for the presence
of the i'th CPU, and then, if it finds nothing, makes a flat search
among all non-local CPUs. We can do better by checking CPUs per NUMA
hop.
This has significant performance implications on NUMA machines, for example
when using NUMA-aware allocated memory together with NUMA-aware IRQ
affinity hints.
Performance tests from patch 8 of this series for mellanox network
driver show:
TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
+-------------------------+-----------+------------------+------------------+
| | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline | 52.3 | 6.4 % | 17.9 % |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides | 95.1 | 8.4 % | 27.3 % |
+-------------------------+-----------+------------------+------------------+
The bottleneck on the RX side is released, reaching line rate (~1.8x
speedup), with ~30% less CPU utilization on the TX side.
====================
Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have an iterator-based alternative for the very common case
of using cpumask_local_spread() for all CPUs in a row, it's worth
mentioning it in the comment to cpumask_local_spread().
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
reachable within a given distance budget. Wrap the logic for iterating
over all (distance, mask) values inside an iterator macro.
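Usage would look roughly like this (hedged sketch: spreading work over
CPUs from closest to farthest; for_each_cpu_andnot() skips CPUs already
visited at a smaller distance):

const struct cpumask *mask, *prev = cpu_none_mask;
unsigned int cpu;

rcu_read_lock();
for_each_numa_hop_mask(mask, node) {
	for_each_cpu_andnot(cpu, mask, prev)
		; /* @cpu is reachable at this hop and no earlier one */
	prev = mask;
}
rcu_read_unlock();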
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq has pointed out that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness - cpumask_local_spread() only knows
about the local node and everything outside is in the same bucket.
sched_domains_numa_masks is pretty much what we want to hand out (a
cpumask of CPUs reachable within a given distance budget), so introduce
sched_numa_hop_mask() to export those cpumasks.
Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now, after moving all NUMA logic into sched_numa_find_nth_cpu(), the
else-branch of cpumask_local_spread() is just a function call, and we
can simplify the logic by using a ternary operator.
While here, replace BUG() with WARN_ON().
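The function then reduces to, roughly:

unsigned int cpumask_local_spread(unsigned int i, int node)
{
	unsigned int cpu;

	/* Wrap: we always want a cpu. */
	i %= num_online_cpus();

	cpu = (node == NUMA_NO_NODE) ?
		cpumask_nth(i, cpu_online_mask) :
		sched_numa_find_nth_cpu(cpu_online_mask, i, node);

	WARN_ON(cpu >= nr_cpu_ids);
	return cpu;
}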
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function finds the Nth set CPU in a given cpumask, starting from a
given node.
Leveraging the fact that each hop in sched_domains_numa_masks includes
the same or a greater number of CPUs than the previous one, we can use
binary search on hops instead of a linear walk, which makes the overall
complexity O(log n) in terms of the number of cpumask_weight() calls.
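A hedged sketch of the search (hop_mask() is a made-up accessor for the
node's cpumask at a given hop; @cpu is the requested index N):

/* Find the smallest hop whose mask already contains more than @cpu
 * CPUs of @cpus; hops are nested, so the predicate is monotonic.
 */
while (lo < hi) {
	mid = lo + (hi - lo) / 2;
	if (cpumask_weight_and(cpus, hop_mask(mid, node)) > cpu)
		hi = mid;
	else
		lo = mid + 1;
}

/* The answer is in hop @lo but not in hop @lo - 1. */
w = lo ? cpumask_weight_and(cpus, hop_mask(lo - 1, node)) : 0;
return cpumask_nth_and_andnot(cpu - w, cpus, hop_mask(lo, node),
			      lo ? hop_mask(lo - 1, node) : cpu_none_mask);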
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce cpumask_nth_and_andnot(), based on find_nth_and_andnot_bit().
It's used in the following patch to traverse cpumasks without storing
an intermediate result in a temporary cpumask.
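Semantically (sketch), it is equivalent to the following, minus the
temporary mask:

/* Nth set CPU of (*srcp1 & *srcp2 & ~*srcp3) */
cpumask_and(tmp, srcp1, srcp2);
cpumask_andnot(tmp, tmp, srcp3);
cpu = cpumask_nth(n, tmp);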
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the following patches, the function is used to implement in-place
bitmap traversal without storing intermediate results in temporary
bitmaps.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case the FW publishes a string which isn't found in the driver's
string DBs, keep the string as raw data.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case a new string DB is added to the FW, the FW publishes an event
notifying that the string DBs have been updated.
Add support in the driver for handling this event.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The device can expose a string DB of size 0, which means this string DB
is currently not in use. Therefore, allow for 0-size string DBs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The debug message specifies tdsn, but takes the tmsn as an argument.
The correct argument is tmsn; hence, fix the print.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The size variable is reassigned on the following line, so remove the
redundant assignment.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Local variable "err" is not used so it is safe to remove.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>