diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index 7627b1da01f2..643f5903d1d8 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -191,13 +191,44 @@ API allows to configure following rate object's parameters: ``tx_max`` Maximum TX rate value. +``tx_priority`` + Allows for usage of strict priority arbiter among siblings. This + arbitration scheme attempts to schedule nodes based on their priority + as long as the nodes remain within their bandwidth limit. The higher the + priority the higher the probability that the node will get selected for + scheduling. + +``tx_weight`` + Allows for usage of Weighted Fair Queuing arbitration scheme among + siblings. This arbitration scheme can be used simultaneously with the + strict priority. As a node is configured with a higher rate it gets more + BW relative to it's siblings. Values are relative like a percentage + points, they basically tell how much BW should node take relative to + it's siblings. + ``parent`` Parent node name. Parent node rate limits are considered as additional limits to all node children limits. ``tx_max`` is an upper limit for children. ``tx_share`` is a total bandwidth distributed among children. +``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case +nodes with the same priority form a WFQ subgroup in the sibling group +and arbitration among them is based on assigned weights. + +Arbitration flow from the high level: +#. Choose a node, or group of nodes with the highest priority that stays + within the BW limit and are not blocked. Use ``tx_priority`` as a + parameter for this arbitration. +#. If group of nodes have the same priority perform WFQ arbitration on + that subgroup. Use ``tx_weight`` as a parameter for this arbitration. +#. Select the winner node, and continue arbitration flow among it's children, + until leaf node is reached, and the winner is established. +#. If all the nodes from the highest priority sub-group are satisfied, or + overused their assigned BW, move to the lower priority nodes. + Driver implementations are allowed to support both or either rate object types -and setting methods of their parameters. +and setting methods of their parameters. Additionally driver implementation +may export nodes/leafs and their child-parent relationships. Terms and Definitions ===================== diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst index 0c89ceb8986d..890062da7820 100644 --- a/Documentation/networking/devlink/ice.rst +++ b/Documentation/networking/devlink/ice.rst @@ -254,3 +254,118 @@ Users can request an immediate capture of a snapshot via the 0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1 + +Devlink Rate +============ + +The ``ice`` driver implements devlink-rate API. It allows for offload of +the Hierarchical QoS to the hardware. It enables user to group Virtual +Functions in a tree structure and assign supported parameters: tx_share, +tx_max, tx_priority and tx_weight to each node in a tree. So effectively +user gains an ability to control how much bandwidth is allocated for each +VF group. This is later enforced by the HW. + +It is assumed that this feature is mutually exclusive with DCB performed +in FW and ADQ, or any driver feature that would trigger changes in QoS, +for example creation of the new traffic class. The driver will prevent DCB +or ADQ configuration if user started making any changes to the nodes using +devlink-rate API. To configure those features a driver reload is necessary. +Correspondingly if ADQ or DCB will get configured the driver won't export +hierarchy at all, or will remove the untouched hierarchy if those +features are enabled after the hierarchy is exported, but before any +changes are made. + +This feature is also dependent on switchdev being enabled in the system. +It's required bacause devlink-rate requires devlink-port objects to be +present, and those objects are only created in switchdev mode. + +If the driver is set to the switchdev mode, it will export internal +hierarchy the moment VF's are created. Root of the tree is always +represented by the node_0. This node can't be deleted by the user. Leaf +nodes and nodes with children also can't be deleted. + +.. list-table:: Attributes supported + :widths: 15 85 + + * - Name + - Description + * - ``tx_max`` + - maximum bandwidth to be consumed by the tree Node. Rate Limit is + an absolute number specifying a maximum amount of bytes a Node may + consume during the course of one second. Rate limit guarantees + that a link will not oversaturate the receiver on the remote end + and also enforces an SLA between the subscriber and network + provider. + * - ``tx_share`` + - minimum bandwidth allocated to a tree node when it is not blocked. + It specifies an absolute BW. While tx_max defines the maximum + bandwidth the node may consume, the tx_share marks committed BW + for the Node. + * - ``tx_priority`` + - allows for usage of strict priority arbiter among siblings. This + arbitration scheme attempts to schedule nodes based on their + priority as long as the nodes remain within their bandwidth limit. + Range 0-7. Nodes with priority 7 have the highest priority and are + selected first, while nodes with priority 0 have the lowest + priority. Nodes that have the same priority are treated equally. + * - ``tx_weight`` + - allows for usage of Weighted Fair Queuing arbitration scheme among + siblings. This arbitration scheme can be used simultaneously with + the strict priority. Range 1-200. Only relative values mater for + arbitration. + +``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case +nodes with the same priority form a WFQ subgroup in the sibling group +and arbitration among them is based on assigned weights. + +.. code:: shell + + # enable switchdev + $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev + + # at this point driver should export internal hierarchy + $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs + + $ devlink port function rate show + pci/0000:4b:00.0/node_25: type node parent node_24 + pci/0000:4b:00.0/node_24: type node parent node_0 + pci/0000:4b:00.0/node_32: type node parent node_31 + pci/0000:4b:00.0/node_31: type node parent node_30 + pci/0000:4b:00.0/node_30: type node parent node_16 + pci/0000:4b:00.0/node_19: type node parent node_18 + pci/0000:4b:00.0/node_18: type node parent node_17 + pci/0000:4b:00.0/node_17: type node parent node_16 + pci/0000:4b:00.0/node_14: type node parent node_5 + pci/0000:4b:00.0/node_5: type node parent node_3 + pci/0000:4b:00.0/node_13: type node parent node_4 + pci/0000:4b:00.0/node_12: type node parent node_4 + pci/0000:4b:00.0/node_11: type node parent node_4 + pci/0000:4b:00.0/node_10: type node parent node_4 + pci/0000:4b:00.0/node_9: type node parent node_4 + pci/0000:4b:00.0/node_8: type node parent node_4 + pci/0000:4b:00.0/node_7: type node parent node_4 + pci/0000:4b:00.0/node_6: type node parent node_4 + pci/0000:4b:00.0/node_4: type node parent node_3 + pci/0000:4b:00.0/node_3: type node parent node_16 + pci/0000:4b:00.0/node_16: type node parent node_15 + pci/0000:4b:00.0/node_15: type node parent node_0 + pci/0000:4b:00.0/node_2: type node parent node_1 + pci/0000:4b:00.0/node_1: type node parent node_0 + pci/0000:4b:00.0/node_0: type node + pci/0000:4b:00.0/1: type leaf parent node_25 + pci/0000:4b:00.0/2: type leaf parent node_25 + + # let's create some custom node + $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0 + + # second custom node + $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom + + # reassign second VF to newly created branch + $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1 + + # assign tx_weight to the VF + $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5 + + # assign tx_share to the VF + $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h index 1bdc70aa979d..958c1e435232 100644 --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h @@ -848,9 +848,9 @@ struct ice_aqc_txsched_elem { u8 generic; #define ICE_AQC_ELEM_GENERIC_MODE_M 0x1 #define ICE_AQC_ELEM_GENERIC_PRIO_S 0x1 -#define ICE_AQC_ELEM_GENERIC_PRIO_M (0x7 << ICE_AQC_ELEM_GENERIC_PRIO_S) +#define ICE_AQC_ELEM_GENERIC_PRIO_M GENMASK(3, 1) #define ICE_AQC_ELEM_GENERIC_SP_S 0x4 -#define ICE_AQC_ELEM_GENERIC_SP_M (0x1 << ICE_AQC_ELEM_GENERIC_SP_S) +#define ICE_AQC_ELEM_GENERIC_SP_M GENMASK(4, 4) #define ICE_AQC_ELEM_GENERIC_ADJUST_VAL_S 0x5 #define ICE_AQC_ELEM_GENERIC_ADJUST_VAL_M \ (0x3 << ICE_AQC_ELEM_GENERIC_ADJUST_VAL_S) diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index 039342a0ed15..216370ec60d4 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -1105,6 +1105,9 @@ int ice_init_hw(struct ice_hw *hw) hw->evb_veb = true; + /* init xarray for identifying scheduling nodes uniquely */ + xa_init_flags(&hw->port_info->sched_node_ids, XA_FLAGS_ALLOC); + /* Query the allocated resources for Tx scheduler */ status = ice_sched_query_res_alloc(hw); if (status) { @@ -4600,7 +4603,7 @@ ice_ena_vsi_txq(struct ice_port_info *pi, u16 vsi_handle, u8 tc, u16 q_handle, q_ctx->q_teid = le32_to_cpu(node.node_teid); /* add a leaf node into scheduler tree queue layer */ - status = ice_sched_add_node(pi, hw->num_tx_sched_layers - 1, &node); + status = ice_sched_add_node(pi, hw->num_tx_sched_layers - 1, &node, NULL); if (!status) status = ice_sched_replay_q_bw(pi, q_ctx); @@ -4835,7 +4838,7 @@ ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc, for (i = 0; i < num_qsets; i++) { node.node_teid = buf->rdma_qsets[i].qset_teid; ret = ice_sched_add_node(pi, hw->num_tx_sched_layers - 1, - &node); + &node, NULL); if (ret) break; qset_teid[i] = le32_to_cpu(node.node_teid); diff --git a/drivers/net/ethernet/intel/ice/ice_dcb.c b/drivers/net/ethernet/intel/ice/ice_dcb.c index 0b146a0d4205..6be02f9b0b8c 100644 --- a/drivers/net/ethernet/intel/ice/ice_dcb.c +++ b/drivers/net/ethernet/intel/ice/ice_dcb.c @@ -1580,7 +1580,7 @@ ice_update_port_tc_tree_cfg(struct ice_port_info *pi, /* new TC */ status = ice_sched_query_elem(pi->hw, teid2, &elem); if (!status) - status = ice_sched_add_node(pi, 1, &elem); + status = ice_sched_add_node(pi, 1, &elem, NULL); if (status) break; /* update the TC number */ diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c index add90e75f05c..9defb9d0fe88 100644 --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c @@ -3,6 +3,7 @@ #include "ice_dcb_lib.h" #include "ice_dcb_nl.h" +#include "ice_devlink.h" /** * ice_dcb_get_ena_tc - return bitmap of enabled TCs @@ -364,6 +365,12 @@ int ice_pf_dcb_cfg(struct ice_pf *pf, struct ice_dcbx_cfg *new_cfg, bool locked) /* Enable DCB tagging only when more than one TC */ if (ice_dcb_get_num_tc(new_cfg) > 1) { dev_dbg(dev, "DCB tagging enabled (num TC > 1)\n"); + if (pf->hw.port_info->is_custom_tx_enabled) { + dev_err(dev, "Custom Tx scheduler feature enabled, can't configure DCB\n"); + return -EBUSY; + } + ice_tear_down_devlink_rate_tree(pf); + set_bit(ICE_FLAG_DCB_ENA, pf->flags); } else { dev_dbg(dev, "DCB tagging disabled (num TC = 1)\n"); diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c index 455489e9457d..1d638216484d 100644 --- a/drivers/net/ethernet/intel/ice/ice_devlink.c +++ b/drivers/net/ethernet/intel/ice/ice_devlink.c @@ -8,6 +8,7 @@ #include "ice_devlink.h" #include "ice_eswitch.h" #include "ice_fw_update.h" +#include "ice_dcb_lib.h" static int ice_active_port_option = -1; @@ -713,6 +714,490 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port, return ice_devlink_port_split(devlink, port, 1, extack); } +/** + * ice_tear_down_devlink_rate_tree - removes devlink-rate exported tree + * @pf: pf struct + * + * This function tears down tree exported during VF's creation. + */ +void ice_tear_down_devlink_rate_tree(struct ice_pf *pf) +{ + struct devlink *devlink; + struct ice_vf *vf; + unsigned int bkt; + + devlink = priv_to_devlink(pf); + + devl_lock(devlink); + mutex_lock(&pf->vfs.table_lock); + ice_for_each_vf(pf, bkt, vf) { + if (vf->devlink_port.devlink_rate) + devl_rate_leaf_destroy(&vf->devlink_port); + } + mutex_unlock(&pf->vfs.table_lock); + + devl_rate_nodes_destroy(devlink); + devl_unlock(devlink); +} + +/** + * ice_enable_custom_tx - try to enable custom Tx feature + * @pf: pf struct + * + * This function tries to enable custom Tx feature, + * it's not possible to enable it, if DCB or ADQ is active. + */ +static bool ice_enable_custom_tx(struct ice_pf *pf) +{ + struct ice_port_info *pi = ice_get_main_vsi(pf)->port_info; + struct device *dev = ice_pf_to_dev(pf); + + if (pi->is_custom_tx_enabled) + /* already enabled, return true */ + return true; + + if (ice_is_adq_active(pf)) { + dev_err(dev, "ADQ active, can't modify Tx scheduler tree\n"); + return false; + } + + if (ice_is_dcb_active(pf)) { + dev_err(dev, "DCB active, can't modify Tx scheduler tree\n"); + return false; + } + + pi->is_custom_tx_enabled = true; + + return true; +} + +/** + * ice_traverse_tx_tree - traverse Tx scheduler tree + * @devlink: devlink struct + * @node: current node, used for recursion + * @tc_node: tc_node struct, that is treated as a root + * @pf: pf struct + * + * This function traverses Tx scheduler tree and exports + * entire structure to the devlink-rate. + */ +static void ice_traverse_tx_tree(struct devlink *devlink, struct ice_sched_node *node, + struct ice_sched_node *tc_node, struct ice_pf *pf) +{ + struct devlink_rate *rate_node = NULL; + struct ice_vf *vf; + int i; + + if (node->parent == tc_node) { + /* create root node */ + rate_node = devl_rate_node_create(devlink, node, node->name, NULL); + } else if (node->vsi_handle && + pf->vsi[node->vsi_handle]->vf) { + vf = pf->vsi[node->vsi_handle]->vf; + if (!vf->devlink_port.devlink_rate) + /* leaf nodes doesn't have children + * so we don't set rate_node + */ + devl_rate_leaf_create(&vf->devlink_port, node, + node->parent->rate_node); + } else if (node->info.data.elem_type != ICE_AQC_ELEM_TYPE_LEAF && + node->parent->rate_node) { + rate_node = devl_rate_node_create(devlink, node, node->name, + node->parent->rate_node); + } + + if (rate_node && !IS_ERR(rate_node)) + node->rate_node = rate_node; + + for (i = 0; i < node->num_children; i++) + ice_traverse_tx_tree(devlink, node->children[i], tc_node, pf); +} + +/** + * ice_devlink_rate_init_tx_topology - export Tx scheduler tree to devlink rate + * @devlink: devlink struct + * @vsi: main vsi struct + * + * This function finds a root node, then calls ice_traverse_tx tree, which + * traverses the tree and exports it's contents to devlink rate. + */ +int ice_devlink_rate_init_tx_topology(struct devlink *devlink, struct ice_vsi *vsi) +{ + struct ice_port_info *pi = vsi->port_info; + struct ice_sched_node *tc_node; + struct ice_pf *pf = vsi->back; + int i; + + tc_node = pi->root->children[0]; + mutex_lock(&pi->sched_lock); + devl_lock(devlink); + for (i = 0; i < tc_node->num_children; i++) + ice_traverse_tx_tree(devlink, tc_node->children[i], tc_node, pf); + devl_unlock(devlink); + mutex_unlock(&pi->sched_lock); + + return 0; +} + +/** + * ice_set_object_tx_share - sets node scheduling parameter + * @pi: devlink struct instance + * @node: node struct instance + * @bw: bandwidth in bytes per second + * @extack: extended netdev ack structure + * + * This function sets ICE_MIN_BW scheduling BW limit. + */ +static int ice_set_object_tx_share(struct ice_port_info *pi, struct ice_sched_node *node, + u64 bw, struct netlink_ext_ack *extack) +{ + int status; + + mutex_lock(&pi->sched_lock); + /* converts bytes per second to kilo bits per second */ + node->tx_share = div_u64(bw, 125); + status = ice_sched_set_node_bw_lmt(pi, node, ICE_MIN_BW, node->tx_share); + mutex_unlock(&pi->sched_lock); + + if (status) + NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_share"); + + return status; +} + +/** + * ice_set_object_tx_max - sets node scheduling parameter + * @pi: devlink struct instance + * @node: node struct instance + * @bw: bandwidth in bytes per second + * @extack: extended netdev ack structure + * + * This function sets ICE_MAX_BW scheduling BW limit. + */ +static int ice_set_object_tx_max(struct ice_port_info *pi, struct ice_sched_node *node, + u64 bw, struct netlink_ext_ack *extack) +{ + int status; + + mutex_lock(&pi->sched_lock); + /* converts bytes per second value to kilo bits per second */ + node->tx_max = div_u64(bw, 125); + status = ice_sched_set_node_bw_lmt(pi, node, ICE_MAX_BW, node->tx_max); + mutex_unlock(&pi->sched_lock); + + if (status) + NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_max"); + + return status; +} + +/** + * ice_set_object_tx_priority - sets node scheduling parameter + * @pi: devlink struct instance + * @node: node struct instance + * @priority: value representing priority for strict priority arbitration + * @extack: extended netdev ack structure + * + * This function sets priority of node among siblings. + */ +static int ice_set_object_tx_priority(struct ice_port_info *pi, struct ice_sched_node *node, + u32 priority, struct netlink_ext_ack *extack) +{ + int status; + + if (node->tx_priority >= 8) { + NL_SET_ERR_MSG_MOD(extack, "Priority should be less than 8"); + return -EINVAL; + } + + mutex_lock(&pi->sched_lock); + node->tx_priority = priority; + status = ice_sched_set_node_priority(pi, node, node->tx_priority); + mutex_unlock(&pi->sched_lock); + + if (status) + NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_priority"); + + return status; +} + +/** + * ice_set_object_tx_weight - sets node scheduling parameter + * @pi: devlink struct instance + * @node: node struct instance + * @weight: value represeting relative weight for WFQ arbitration + * @extack: extended netdev ack structure + * + * This function sets node weight for WFQ algorithm. + */ +static int ice_set_object_tx_weight(struct ice_port_info *pi, struct ice_sched_node *node, + u32 weight, struct netlink_ext_ack *extack) +{ + int status; + + if (node->tx_weight > 200 || node->tx_weight < 1) { + NL_SET_ERR_MSG_MOD(extack, "Weight must be between 1 and 200"); + return -EINVAL; + } + + mutex_lock(&pi->sched_lock); + node->tx_weight = weight; + status = ice_sched_set_node_weight(pi, node, node->tx_weight); + mutex_unlock(&pi->sched_lock); + + if (status) + NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_weight"); + + return status; +} + +/** + * ice_get_pi_from_dev_rate - get port info from devlink_rate + * @rate_node: devlink struct instance + * + * This function returns corresponding port_info struct of devlink_rate + */ +static struct ice_port_info *ice_get_pi_from_dev_rate(struct devlink_rate *rate_node) +{ + struct ice_pf *pf = devlink_priv(rate_node->devlink); + + return ice_get_main_vsi(pf)->port_info; +} + +static int ice_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv, + struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node; + struct ice_port_info *pi; + + pi = ice_get_pi_from_dev_rate(rate_node); + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + /* preallocate memory for ice_sched_node */ + node = devm_kzalloc(ice_hw_to_dev(pi->hw), sizeof(*node), GFP_KERNEL); + *priv = node; + + return 0; +} + +static int ice_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv, + struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node, *tc_node; + struct ice_port_info *pi; + + pi = ice_get_pi_from_dev_rate(rate_node); + tc_node = pi->root->children[0]; + node = priv; + + if (!rate_node->parent || !node || tc_node == node || !extack) + return 0; + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + /* can't allow to delete a node with children */ + if (node->num_children) + return -EINVAL; + + mutex_lock(&pi->sched_lock); + ice_free_sched_node(pi, node); + mutex_unlock(&pi->sched_lock); + + return 0; +} + +static int ice_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv, + u64 tx_max, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_max(ice_get_pi_from_dev_rate(rate_leaf), + node, tx_max, extack); +} + +static int ice_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void *priv, + u64 tx_share, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_share(ice_get_pi_from_dev_rate(rate_leaf), node, + tx_share, extack); +} + +static int ice_devlink_rate_leaf_tx_priority_set(struct devlink_rate *rate_leaf, void *priv, + u32 tx_priority, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_priority(ice_get_pi_from_dev_rate(rate_leaf), node, + tx_priority, extack); +} + +static int ice_devlink_rate_leaf_tx_weight_set(struct devlink_rate *rate_leaf, void *priv, + u32 tx_weight, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_weight(ice_get_pi_from_dev_rate(rate_leaf), node, + tx_weight, extack); +} + +static int ice_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv, + u64 tx_max, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_max(ice_get_pi_from_dev_rate(rate_node), + node, tx_max, extack); +} + +static int ice_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv, + u64 tx_share, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_share(ice_get_pi_from_dev_rate(rate_node), + node, tx_share, extack); +} + +static int ice_devlink_rate_node_tx_priority_set(struct devlink_rate *rate_node, void *priv, + u32 tx_priority, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_priority(ice_get_pi_from_dev_rate(rate_node), + node, tx_priority, extack); +} + +static int ice_devlink_rate_node_tx_weight_set(struct devlink_rate *rate_node, void *priv, + u32 tx_weight, struct netlink_ext_ack *extack) +{ + struct ice_sched_node *node = priv; + + if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink))) + return -EBUSY; + + if (!node) + return 0; + + return ice_set_object_tx_weight(ice_get_pi_from_dev_rate(rate_node), + node, tx_weight, extack); +} + +static int ice_devlink_set_parent(struct devlink_rate *devlink_rate, + struct devlink_rate *parent, + void *priv, void *parent_priv, + struct netlink_ext_ack *extack) +{ + struct ice_port_info *pi = ice_get_pi_from_dev_rate(devlink_rate); + struct ice_sched_node *tc_node, *node, *parent_node; + u16 num_nodes_added; + u32 first_node_teid; + u32 node_teid; + int status; + + tc_node = pi->root->children[0]; + node = priv; + + if (!extack) + return 0; + + if (!ice_enable_custom_tx(devlink_priv(devlink_rate->devlink))) + return -EBUSY; + + if (!parent) { + if (!node || tc_node == node || node->num_children) + return -EINVAL; + + mutex_lock(&pi->sched_lock); + ice_free_sched_node(pi, node); + mutex_unlock(&pi->sched_lock); + + return 0; + } + + parent_node = parent_priv; + + /* if the node doesn't exist, create it */ + if (!node->parent) { + mutex_lock(&pi->sched_lock); + status = ice_sched_add_elems(pi, tc_node, parent_node, + parent_node->tx_sched_layer + 1, + 1, &num_nodes_added, &first_node_teid, + &node); + mutex_unlock(&pi->sched_lock); + + if (status) { + NL_SET_ERR_MSG_MOD(extack, "Can't add a new node"); + return status; + } + + if (devlink_rate->tx_share) + ice_set_object_tx_share(pi, node, devlink_rate->tx_share, extack); + if (devlink_rate->tx_max) + ice_set_object_tx_max(pi, node, devlink_rate->tx_max, extack); + if (devlink_rate->tx_priority) + ice_set_object_tx_priority(pi, node, devlink_rate->tx_priority, extack); + if (devlink_rate->tx_weight) + ice_set_object_tx_weight(pi, node, devlink_rate->tx_weight, extack); + } else { + node_teid = le32_to_cpu(node->info.node_teid); + mutex_lock(&pi->sched_lock); + status = ice_sched_move_nodes(pi, parent_node, 1, &node_teid); + mutex_unlock(&pi->sched_lock); + + if (status) + NL_SET_ERR_MSG_MOD(extack, "Can't move existing node to a new parent"); + } + + return status; +} + static const struct devlink_ops ice_devlink_ops = { .supported_flash_update_params = DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK, .reload_actions = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE), @@ -725,6 +1210,22 @@ static const struct devlink_ops ice_devlink_ops = { .eswitch_mode_set = ice_eswitch_mode_set, .info_get = ice_devlink_info_get, .flash_update = ice_devlink_flash_update, + + .rate_node_new = ice_devlink_rate_node_new, + .rate_node_del = ice_devlink_rate_node_del, + + .rate_leaf_tx_max_set = ice_devlink_rate_leaf_tx_max_set, + .rate_leaf_tx_share_set = ice_devlink_rate_leaf_tx_share_set, + .rate_leaf_tx_priority_set = ice_devlink_rate_leaf_tx_priority_set, + .rate_leaf_tx_weight_set = ice_devlink_rate_leaf_tx_weight_set, + + .rate_node_tx_max_set = ice_devlink_rate_node_tx_max_set, + .rate_node_tx_share_set = ice_devlink_rate_node_tx_share_set, + .rate_node_tx_priority_set = ice_devlink_rate_node_tx_priority_set, + .rate_node_tx_weight_set = ice_devlink_rate_node_tx_weight_set, + + .rate_leaf_parent_set = ice_devlink_set_parent, + .rate_node_parent_set = ice_devlink_set_parent, }; static int @@ -1089,6 +1590,7 @@ int ice_devlink_create_vf_port(struct ice_vf *vf) */ void ice_devlink_destroy_vf_port(struct ice_vf *vf) { + devl_rate_leaf_destroy(&vf->devlink_port); devlink_port_unregister(&vf->devlink_port); } diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h index fe006d9946f8..6ec96779f52e 100644 --- a/drivers/net/ethernet/intel/ice/ice_devlink.h +++ b/drivers/net/ethernet/intel/ice/ice_devlink.h @@ -18,4 +18,7 @@ void ice_devlink_destroy_vf_port(struct ice_vf *vf); void ice_devlink_init_regions(struct ice_pf *pf); void ice_devlink_destroy_regions(struct ice_pf *pf); +int ice_devlink_rate_init_tx_topology(struct devlink *devlink, struct ice_vsi *vsi); +void ice_tear_down_devlink_rate_tree(struct ice_pf *pf); + #endif /* _ICE_DEVLINK_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index a9fc89aebebe..d6f460ff1b72 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -8580,6 +8580,12 @@ static int ice_setup_tc_mqprio_qdisc(struct net_device *netdev, void *type_data) switch (mode) { case TC_MQPRIO_MODE_CHANNEL: + if (pf->hw.port_info->is_custom_tx_enabled) { + dev_err(dev, "Custom Tx scheduler feature enabled, can't configure ADQ\n"); + return -EBUSY; + } + ice_tear_down_devlink_rate_tree(pf); + ret = ice_validate_mqprio_qopt(vsi, mqprio_qopt); if (ret) { netdev_err(netdev, "failed to validate_mqprio_qopt(), ret %d\n", diff --git a/drivers/net/ethernet/intel/ice/ice_repr.c b/drivers/net/ethernet/intel/ice/ice_repr.c index 0483eb14c288..109761c8c858 100644 --- a/drivers/net/ethernet/intel/ice/ice_repr.c +++ b/drivers/net/ethernet/intel/ice/ice_repr.c @@ -6,6 +6,7 @@ #include "ice_devlink.h" #include "ice_sriov.h" #include "ice_tc_lib.h" +#include "ice_dcb_lib.h" /** * ice_repr_get_sw_port_id - get port ID associated with representor @@ -389,6 +390,7 @@ static void ice_repr_rem(struct ice_vf *vf) */ void ice_repr_rem_from_all_vfs(struct ice_pf *pf) { + struct devlink *devlink; struct ice_vf *vf; unsigned int bkt; @@ -396,6 +398,14 @@ void ice_repr_rem_from_all_vfs(struct ice_pf *pf) ice_for_each_vf(pf, bkt, vf) ice_repr_rem(vf); + + /* since all port representors are destroyed, there is + * no point in keeping the nodes + */ + devlink = priv_to_devlink(pf); + devl_lock(devlink); + devl_rate_nodes_destroy(devlink); + devl_unlock(devlink); } /** @@ -404,6 +414,7 @@ void ice_repr_rem_from_all_vfs(struct ice_pf *pf) */ int ice_repr_add_for_all_vfs(struct ice_pf *pf) { + struct devlink *devlink; struct ice_vf *vf; unsigned int bkt; int err; @@ -416,6 +427,13 @@ int ice_repr_add_for_all_vfs(struct ice_pf *pf) goto err; } + /* only export if ADQ and DCB disabled */ + if (ice_is_adq_active(pf) || ice_is_dcb_active(pf)) + return 0; + + devlink = priv_to_devlink(pf); + ice_devlink_rate_init_tx_topology(devlink, ice_get_main_vsi(pf)); + return 0; err: diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c index 118595763bba..6d08b397df2a 100644 --- a/drivers/net/ethernet/intel/ice/ice_sched.c +++ b/drivers/net/ethernet/intel/ice/ice_sched.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2018, Intel Corporation. */ +#include #include "ice_sched.h" /** @@ -142,12 +143,14 @@ ice_aq_query_sched_elems(struct ice_hw *hw, u16 elems_req, * @pi: port information structure * @layer: Scheduler layer of the node * @info: Scheduler element information from firmware + * @prealloc_node: preallocated ice_sched_node struct for SW DB * * This function inserts a scheduler node to the SW DB. */ int ice_sched_add_node(struct ice_port_info *pi, u8 layer, - struct ice_aqc_txsched_elem_data *info) + struct ice_aqc_txsched_elem_data *info, + struct ice_sched_node *prealloc_node) { struct ice_aqc_txsched_elem_data elem; struct ice_sched_node *parent; @@ -176,7 +179,10 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer, if (status) return status; - node = devm_kzalloc(ice_hw_to_dev(hw), sizeof(*node), GFP_KERNEL); + if (prealloc_node) + node = prealloc_node; + else + node = devm_kzalloc(ice_hw_to_dev(hw), sizeof(*node), GFP_KERNEL); if (!node) return -ENOMEM; if (hw->max_children[layer]) { @@ -355,6 +361,9 @@ void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node) /* leaf nodes have no children */ if (node->children) devm_kfree(ice_hw_to_dev(hw), node->children); + + kfree(node->name); + xa_erase(&pi->sched_node_ids, node->id); devm_kfree(ice_hw_to_dev(hw), node); } @@ -872,13 +881,15 @@ void ice_sched_cleanup_all(struct ice_hw *hw) * @num_nodes: number of nodes * @num_nodes_added: pointer to num nodes added * @first_node_teid: if new nodes are added then return the TEID of first node + * @prealloc_nodes: preallocated nodes struct for software DB * * This function add nodes to HW as well as to SW DB for a given layer */ -static int +int ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node, struct ice_sched_node *parent, u8 layer, u16 num_nodes, - u16 *num_nodes_added, u32 *first_node_teid) + u16 *num_nodes_added, u32 *first_node_teid, + struct ice_sched_node **prealloc_nodes) { struct ice_sched_node *prev, *new_node; struct ice_aqc_add_elem *buf; @@ -924,7 +935,11 @@ ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node, *num_nodes_added = num_nodes; /* add nodes to the SW DB */ for (i = 0; i < num_nodes; i++) { - status = ice_sched_add_node(pi, layer, &buf->generic[i]); + if (prealloc_nodes) + status = ice_sched_add_node(pi, layer, &buf->generic[i], prealloc_nodes[i]); + else + status = ice_sched_add_node(pi, layer, &buf->generic[i], NULL); + if (status) { ice_debug(hw, ICE_DBG_SCHED, "add nodes in SW DB failed status =%d\n", status); @@ -940,6 +955,22 @@ ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node, new_node->sibling = NULL; new_node->tc_num = tc_node->tc_num; + new_node->tx_weight = ICE_SCHED_DFLT_BW_WT; + new_node->tx_share = ICE_SCHED_DFLT_BW; + new_node->tx_max = ICE_SCHED_DFLT_BW; + new_node->name = kzalloc(SCHED_NODE_NAME_MAX_LEN, GFP_KERNEL); + if (!new_node->name) + return -ENOMEM; + + status = xa_alloc(&pi->sched_node_ids, &new_node->id, NULL, XA_LIMIT(0, UINT_MAX), + GFP_KERNEL); + if (status) { + ice_debug(hw, ICE_DBG_SCHED, "xa_alloc failed for sched node status =%d\n", + status); + break; + } + + snprintf(new_node->name, SCHED_NODE_NAME_MAX_LEN, "node_%u", new_node->id); /* add it to previous node sibling pointer */ /* Note: siblings are not linked across branches */ @@ -1003,7 +1034,7 @@ ice_sched_add_nodes_to_hw_layer(struct ice_port_info *pi, } return ice_sched_add_elems(pi, tc_node, parent, layer, num_nodes, - num_nodes_added, first_node_teid); + num_nodes_added, first_node_teid, NULL); } /** @@ -1268,7 +1299,7 @@ int ice_sched_init_port(struct ice_port_info *pi) ICE_AQC_ELEM_TYPE_ENTRY_POINT) hw->sw_entry_point_layer = j; - status = ice_sched_add_node(pi, j, &buf[i].generic[j]); + status = ice_sched_add_node(pi, j, &buf[i].generic[j], NULL); if (status) goto err_init_port; } @@ -2154,7 +2185,7 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct ice_sched_node *node, * This function removes the child from the old parent and adds it to a new * parent */ -static void +void ice_sched_update_parent(struct ice_sched_node *new_parent, struct ice_sched_node *node) { @@ -2188,7 +2219,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent, * * This function move the child nodes to a given parent. */ -static int +int ice_sched_move_nodes(struct ice_port_info *pi, struct ice_sched_node *parent, u16 num_items, u32 *list) { @@ -3560,7 +3591,7 @@ ice_sched_set_eir_srl_excl(struct ice_port_info *pi, * node's RL profile ID of type CIR, EIR, or SRL, and removes old profile * ID from local database. The caller needs to hold scheduler lock. */ -static int +int ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node, enum ice_rl_type rl_type, u32 bw, u8 layer_num) { @@ -3596,6 +3627,57 @@ ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node, ICE_AQC_RL_PROFILE_TYPE_M, old_id); } +/** + * ice_sched_set_node_priority - set node's priority + * @pi: port information structure + * @node: tree node + * @priority: number 0-7 representing priority among siblings + * + * This function sets priority of a node among it's siblings. + */ +int +ice_sched_set_node_priority(struct ice_port_info *pi, struct ice_sched_node *node, + u16 priority) +{ + struct ice_aqc_txsched_elem_data buf; + struct ice_aqc_txsched_elem *data; + + buf = node->info; + data = &buf.data; + + data->valid_sections |= ICE_AQC_ELEM_VALID_GENERIC; + data->generic |= FIELD_PREP(ICE_AQC_ELEM_GENERIC_PRIO_M, priority); + + return ice_sched_update_elem(pi->hw, node, &buf); +} + +/** + * ice_sched_set_node_weight - set node's weight + * @pi: port information structure + * @node: tree node + * @weight: number 1-200 representing weight for WFQ + * + * This function sets weight of the node for WFQ algorithm. + */ +int +ice_sched_set_node_weight(struct ice_port_info *pi, struct ice_sched_node *node, u16 weight) +{ + struct ice_aqc_txsched_elem_data buf; + struct ice_aqc_txsched_elem *data; + + buf = node->info; + data = &buf.data; + + data->valid_sections = ICE_AQC_ELEM_VALID_CIR | ICE_AQC_ELEM_VALID_EIR | + ICE_AQC_ELEM_VALID_GENERIC; + data->cir_bw.bw_alloc = cpu_to_le16(weight); + data->eir_bw.bw_alloc = cpu_to_le16(weight); + + data->generic |= FIELD_PREP(ICE_AQC_ELEM_GENERIC_SP_M, 0x0); + + return ice_sched_update_elem(pi->hw, node, &buf); +} + /** * ice_sched_set_node_bw_lmt - set node's BW limit * @pi: port information structure @@ -3606,7 +3688,7 @@ ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node, * It updates node's BW limit parameters like BW RL profile ID of type CIR, * EIR, or SRL. The caller needs to hold scheduler lock. */ -static int +int ice_sched_set_node_bw_lmt(struct ice_port_info *pi, struct ice_sched_node *node, enum ice_rl_type rl_type, u32 bw) { diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h index 4f91577fed56..9c100747445a 100644 --- a/drivers/net/ethernet/intel/ice/ice_sched.h +++ b/drivers/net/ethernet/intel/ice/ice_sched.h @@ -6,6 +6,8 @@ #include "ice_common.h" +#define SCHED_NODE_NAME_MAX_LEN 32 + #define ICE_QGRP_LAYER_OFFSET 2 #define ICE_VSI_LAYER_OFFSET 4 #define ICE_AGG_LAYER_OFFSET 6 @@ -69,6 +71,29 @@ int ice_aq_query_sched_elems(struct ice_hw *hw, u16 elems_req, struct ice_aqc_txsched_elem_data *buf, u16 buf_size, u16 *elems_ret, struct ice_sq_cd *cd); + +int +ice_sched_set_node_bw_lmt(struct ice_port_info *pi, struct ice_sched_node *node, + enum ice_rl_type rl_type, u32 bw); + +int +ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node, + enum ice_rl_type rl_type, u32 bw, u8 layer_num); + +int +ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node, + struct ice_sched_node *parent, u8 layer, u16 num_nodes, + u16 *num_nodes_added, u32 *first_node_teid, + struct ice_sched_node **prealloc_node); + +int +ice_sched_move_nodes(struct ice_port_info *pi, struct ice_sched_node *parent, + u16 num_items, u32 *list); + +int ice_sched_set_node_priority(struct ice_port_info *pi, struct ice_sched_node *node, + u16 priority); +int ice_sched_set_node_weight(struct ice_port_info *pi, struct ice_sched_node *node, u16 weight); + int ice_sched_init_port(struct ice_port_info *pi); int ice_sched_query_res_alloc(struct ice_hw *hw); void ice_sched_get_psm_clk_freq(struct ice_hw *hw); @@ -81,7 +106,11 @@ struct ice_sched_node * ice_sched_find_node_by_teid(struct ice_sched_node *start_node, u32 teid); int ice_sched_add_node(struct ice_port_info *pi, u8 layer, - struct ice_aqc_txsched_elem_data *info); + struct ice_aqc_txsched_elem_data *info, + struct ice_sched_node *prealloc_node); +void +ice_sched_update_parent(struct ice_sched_node *new_parent, + struct ice_sched_node *node); void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node); struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc); struct ice_sched_node * diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index e1abfcee96dc..e3f622cad425 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -524,7 +524,14 @@ struct ice_sched_node { struct ice_sched_node *sibling; /* next sibling in the same layer */ struct ice_sched_node **children; struct ice_aqc_txsched_elem_data info; + char *name; + struct devlink_rate *rate_node; + u64 tx_max; + u64 tx_share; u32 agg_id; /* aggregator group ID */ + u32 id; + u32 tx_priority; + u32 tx_weight; u16 vsi_handle; u8 in_use; /* suspended or in use */ u8 tx_sched_layer; /* Logical Layer (1-9) */ @@ -706,7 +713,9 @@ struct ice_port_info { /* List contain profile ID(s) and other params per layer */ struct list_head rl_prof_list[ICE_AQC_TOPO_MAX_LEVEL_NUM]; struct ice_qos_cfg qos_cfg; + struct xarray sched_node_ids; u8 is_vf:1; + u8 is_custom_tx_enabled:1; }; struct ice_switch_info { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c index 9bc7be95db54..084a910bb4e7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c @@ -91,7 +91,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_ if (err) goto reg_err; - err = devl_rate_leaf_create(dl_port, vport); + err = devl_rate_leaf_create(dl_port, vport, NULL); if (err) goto rate_err; @@ -160,7 +160,7 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p if (err) return err; - err = devl_rate_leaf_create(dl_port, vport); + err = devl_rate_leaf_create(dl_port, vport, NULL); if (err) goto rate_err; diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index 705872eb7564..e14686594a71 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -1401,7 +1401,7 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev, enum nsim_dev_port_typ if (nsim_dev_port_is_vf(nsim_dev_port)) { err = devl_rate_leaf_create(&nsim_dev_port->devlink_port, - nsim_dev_port); + nsim_dev_port, NULL); if (err) goto err_nsim_destroy; } diff --git a/include/net/devlink.h b/include/net/devlink.h index 611a23a3deb2..074a79b8933f 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -114,6 +114,9 @@ struct devlink_rate { refcount_t refcnt; }; }; + + u32 tx_priority; + u32 tx_weight; }; struct devlink_port { @@ -1511,10 +1514,18 @@ struct devlink_ops { u64 tx_share, struct netlink_ext_ack *extack); int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv, u64 tx_max, struct netlink_ext_ack *extack); + int (*rate_leaf_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv, + u32 tx_priority, struct netlink_ext_ack *extack); + int (*rate_leaf_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv, + u32 tx_weight, struct netlink_ext_ack *extack); int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv, u64 tx_share, struct netlink_ext_ack *extack); int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv, u64 tx_max, struct netlink_ext_ack *extack); + int (*rate_node_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv, + u32 tx_priority, struct netlink_ext_ack *extack); + int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv, + u32 tx_weight, struct netlink_ext_ack *extack); int (*rate_node_new)(struct devlink_rate *rate_node, void **priv, struct netlink_ext_ack *extack); int (*rate_node_del)(struct devlink_rate *rate_node, void *priv, @@ -1606,7 +1617,12 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, u32 sf, bool external); -int devl_rate_leaf_create(struct devlink_port *port, void *priv); +struct devlink_rate * +devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, + struct devlink_rate *parent); +int +devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, + struct devlink_rate *parent); void devl_rate_leaf_destroy(struct devlink_port *devlink_port); void devl_rate_nodes_destroy(struct devlink *devlink); void devlink_port_linecard_set(struct devlink_port *devlink_port, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 2f24b53a87a5..498d0d5d0957 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -607,6 +607,9 @@ enum devlink_attr { DEVLINK_ATTR_SELFTESTS, /* nested */ + DEVLINK_ATTR_RATE_TX_PRIORITY, /* u32 */ + DEVLINK_ATTR_RATE_TX_WEIGHT, /* u32 */ + /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index 7f789bbcbbd7..d93bc95cd7cb 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -1203,6 +1203,14 @@ static int devlink_nl_rate_fill(struct sk_buff *msg, devlink_rate->tx_max, DEVLINK_ATTR_PAD)) goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_PRIORITY, + devlink_rate->tx_priority)) + goto nla_put_failure; + + if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_WEIGHT, + devlink_rate->tx_weight)) + goto nla_put_failure; + if (devlink_rate->parent) if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME, devlink_rate->parent->name)) @@ -1879,10 +1887,8 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, int err = -EOPNOTSUPP; parent = devlink_rate->parent; - if (parent && len) { - NL_SET_ERR_MSG_MOD(info->extack, "Rate object already has parent."); - return -EBUSY; - } else if (parent && !len) { + + if (parent && !len) { if (devlink_rate_is_leaf(devlink_rate)) err = ops->rate_leaf_parent_set(devlink_rate, NULL, devlink_rate->priv, NULL, @@ -1896,7 +1902,7 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, refcount_dec(&parent->refcnt); devlink_rate->parent = NULL; - } else if (!parent && len) { + } else if (len) { parent = devlink_rate_node_get_by_name(devlink, parent_name); if (IS_ERR(parent)) return -ENODEV; @@ -1923,6 +1929,10 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, if (err) return err; + if (devlink_rate->parent) + /* we're reassigning to other parent in this case */ + refcount_dec(&devlink_rate->parent->refcnt); + refcount_inc(&parent->refcnt); devlink_rate->parent = parent; } @@ -1936,6 +1946,8 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate, { struct nlattr *nla_parent, **attrs = info->attrs; int err = -EOPNOTSUPP; + u32 priority; + u32 weight; u64 rate; if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) { @@ -1964,6 +1976,34 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate, devlink_rate->tx_max = rate; } + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) { + priority = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv, + priority, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv, + priority, info->extack); + + if (err) + return err; + devlink_rate->tx_priority = priority; + } + + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]) { + weight = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_weight_set(devlink_rate, devlink_rate->priv, + weight, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_weight_set(devlink_rate, devlink_rate->priv, + weight, info->extack); + + if (err) + return err; + devlink_rate->tx_weight = weight; + } + nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]; if (nla_parent) { err = devlink_nl_rate_parent_node_set(devlink_rate, info, @@ -1995,6 +2035,18 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs"); return false; } + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], + "TX priority set isn't supported for the leafs"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_leaf_tx_weight_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], + "TX weight set isn't supported for the leafs"); + return false; + } } else if (type == DEVLINK_RATE_TYPE_NODE) { if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) { NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes"); @@ -2009,6 +2061,18 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes"); return false; } + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], + "TX priority set isn't supported for the nodes"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_node_tx_weight_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], + "TX weight set isn't supported for the nodes"); + return false; + } } else { WARN(1, "Unknown type of rate object"); return false; @@ -9187,6 +9251,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 }, [DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING }, [DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 }, + [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 }, }; static const struct genl_small_ops devlink_nl_ops[] = { @@ -10320,14 +10386,61 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 contro } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); +/** + * devl_rate_node_create - create devlink rate node + * @devlink: devlink instance + * @priv: driver private data + * @node_name: name of the resulting node + * @parent: parent devlink_rate struct + * + * Create devlink rate object of type node + */ +struct devlink_rate * +devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, + struct devlink_rate *parent) +{ + struct devlink_rate *rate_node; + + rate_node = devlink_rate_node_get_by_name(devlink, node_name); + if (!IS_ERR(rate_node)) + return ERR_PTR(-EEXIST); + + rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL); + if (!rate_node) + return ERR_PTR(-ENOMEM); + + if (parent) { + rate_node->parent = parent; + refcount_inc(&rate_node->parent->refcnt); + } + + rate_node->type = DEVLINK_RATE_TYPE_NODE; + rate_node->devlink = devlink; + rate_node->priv = priv; + + rate_node->name = kstrdup(node_name, GFP_KERNEL); + if (!rate_node->name) { + kfree(rate_node); + return ERR_PTR(-ENOMEM); + } + + refcount_set(&rate_node->refcnt, 1); + list_add(&rate_node->list, &devlink->rate_list); + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); + return rate_node; +} +EXPORT_SYMBOL_GPL(devl_rate_node_create); + /** * devl_rate_leaf_create - create devlink rate leaf * @devlink_port: devlink port object to create rate object on * @priv: driver private data + * @parent: parent devlink_rate struct * * Create devlink rate object of type leaf on provided @devlink_port. */ -int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv) +int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, + struct devlink_rate *parent) { struct devlink *devlink = devlink_port->devlink; struct devlink_rate *devlink_rate; @@ -10341,6 +10454,11 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv) if (!devlink_rate) return -ENOMEM; + if (parent) { + devlink_rate->parent = parent; + refcount_inc(&devlink_rate->parent->refcnt); + } + devlink_rate->type = DEVLINK_RATE_TYPE_LEAF; devlink_rate->devlink = devlink; devlink_rate->devlink_port = devlink_port;