net: remove cpu stall in txq_trans_update()

txq_trans_update() currently uses txq->xmit_lock_owner
to conditionally update txq->trans_start.

For regular devices, txq->xmit_lock_owner is updated
from HARD_TX_LOCK() and HARD_TX_UNLOCK(), and this apparently
causes cpu stalls.

Using dev->lltx, which sits in a read-mostly cache line and is
already used in HARD_TX_LOCK() and HARD_TX_UNLOCK(),
helps cpu prediction.
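
For context, a rough sketch of the helpers involved (paraphrased from
include/linux/netdevice.h; simplified, the lltx else-branch and lockdep
annotations are omitted, exact details may vary by kernel version):

	static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
	{
		spin_lock(&txq->_xmit_lock);
		/* Dirtied on every locked transmit, then read back by the
		 * old txq_trans_update() test.
		 */
		WRITE_ONCE(txq->xmit_lock_owner, cpu);
	}

	static inline void __netif_tx_unlock(struct netdev_queue *txq)
	{
		WRITE_ONCE(txq->xmit_lock_owner, -1);
		spin_unlock(&txq->_xmit_lock);
	}

	#define HARD_TX_LOCK(dev, txq, cpu) {		\
		if (!(dev)->lltx)			\
			__netif_tx_lock(txq, cpu);	\
	}

	#define HARD_TX_UNLOCK(dev, txq) {		\
		if (!(dev)->lltx)			\
			__netif_tx_unlock(txq);		\
	}

xmit_lock_owner is thus written on every locked transmit, while dev->lltx is
only read, which is what makes it the cheaper field to test in the fast path.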

On an AMD EPYC 7B12 dual-socket server, tcp_rr with 128 threads
and 30,000 flows gets a 5% increase in throughput.

As explained in commit 95ecba62e2 ("net: fix races in
netdev_tx_sent_queue()/dev_watchdog()") I am planning
to no longer update txq->trans_start in the fast path
in a followup patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408202742.2145516-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 229671ac60 (parent 9c056ec6dd)
Author:    Eric Dumazet <edumazet@google.com>, 2025-04-08 20:27:42 +00:00
Committer: Jakub Kicinski
2 changed files with 5 additions and 4 deletions

--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c

@@ -427,7 +427,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
 	if (netif_tx_queue_stopped(netif_txq)) {
 		/* try recover if stopped by us */
-		txq_trans_update(netif_txq);
+		txq_trans_update(ndev, netif_txq);
 		netif_tx_wake_queue(netif_txq);
 	}
 }

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h

@@ -4693,9 +4693,10 @@ static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 /*
  * txq->trans_start can be read locklessly from dev_watchdog()
  */
-static inline void txq_trans_update(struct netdev_queue *txq)
+static inline void txq_trans_update(const struct net_device *dev,
+				    struct netdev_queue *txq)
 {
-	if (txq->xmit_lock_owner != -1)
+	if (!dev->lltx)
 		WRITE_ONCE(txq->trans_start, jiffies);
 }
@@ -5214,7 +5215,7 @@ static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_devi
 	rc = __netdev_start_xmit(ops, skb, dev, more);
 	if (rc == NETDEV_TX_OK)
-		txq_trans_update(txq);
+		txq_trans_update(dev, txq);
 	return rc;
 }
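
For illustration only (hypothetical driver, invented names): a device that
opts into lockless transmit sets dev->lltx at setup time, so after this change
its fast path neither takes the xmit lock nor writes trans_start, while
conventional drivers keep both.

	/* Hypothetical setup callback for a virtual netdev; sketch only. */
	static void foo_setup(struct net_device *dev)
	{
		/* Lockless transmit: HARD_TX_LOCK() skips the xmit lock and,
		 * with this patch, txq_trans_update() also skips the
		 * trans_start write.
		 */
		dev->lltx = true;
	}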