55160 Commits

Author SHA1 Message Date
Linus Torvalds
7b8e9264f5 Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and CAN.

  Current release - regressions:

   - netfilter: nf_conncount: fix leaked ct in error paths

   - sched: act_mirred: fix loop detection

   - sctp: fix potential deadlock in sctp_clone_sock()

   - can: fix build dependency

   - eth: mlx5e: do not update BQL of old txqs during channel
     reconfiguration

  Previous releases - regressions:

   - sched: ets: always remove class from active list before deleting it

   - inet: frags: flush pending skbs in fqdir_pre_exit()

   - netfilter: nf_nat: remove bogus direction check

   - mptcp:
      - schedule rtx timer only after pushing data
      - avoid deadlock on fallback while reinjecting

   - can: gs_usb: fix error handling

   - eth:
      - mlx5e:
         - avoid unregistering PSP twice
         - fix double unregister of HCA_PORTS component
      - bnxt_en: fix XDP_TX path
      - mlxsw: fix use-after-free when updating multicast route stats

  Previous releases - always broken:

   - ethtool: avoid overflowing userspace buffer on stats query

   - openvswitch: fix middle attribute validation in push_nsh() action

   - eth:
      - mlx5: fw_tracer, validate format string parameters
      - mlxsw: spectrum_router: fix neighbour use-after-free
      - ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()

  Misc:

   - Jozsef Kadlecsik retires from maintaining netfilter

   - tools: ynl: fix build on systems with old kernel headers"

* tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: hns3: add VLAN id validation before using
  net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
  net: hns3: using the num_tqps in the vf driver to apply for resources
  net: enetc: do not transmit redirected XDP frames when the link is down
  selftests/tc-testing: Test case exercising potential mirred redirect deadlock
  net/sched: act_mirred: fix loop detection
  sctp: Clear inet_opt in sctp_v6_copy_ip_options().
  sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().
  net/handshake: duplicate handshake cancellations leak socket
  net/mlx5e: Don't include PSP in the hard MTU calculations
  net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
  net/mlx5e: Trigger neighbor resolution for unresolved destinations
  net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
  net/mlx5: Serialize firmware reset with devlink
  net/mlx5: fw_tracer, Handle escaped percent properly
  net/mlx5: fw_tracer, Validate format string parameters
  net/mlx5: Drain firmware reset in shutdown callback
  net/mlx5: fw reset, clear reset requested on drain_fw_reset
  net: dsa: mxl-gsw1xx: manually clear RANEG bit
  net: dsa: mxl-gsw1xx: fix .shutdown driver operation
  ...
2025-12-19 07:55:35 +12:00
Jian Shen
6ef935e659 net: hns3: add VLAN id validation before using
Currently, the VLAN id may be used without validation when
receiving a VLAN configuration mailbox message from a VF. The length
of vlan_del_fail_bmap is BITS_TO_LONGS(VLAN_N_VID), so a VLAN id that
is bigger than or equal to VLAN_N_VID may cause an out-of-bounds
memory access.

Therefore, VLAN id needs to be checked to ensure it is within
the range of VLAN_N_VID.

Fixes: fe4144d47e ("net: hns3: sync VLAN filter entries when kill VLAN ID failed")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-4-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 16:58:28 +01:00
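
The bounds check described in 6ef935e659 can be illustrated with a small model. This is only a sketch, not the driver code: the class and method names are hypothetical, a 64-bit long is assumed, and VLAN_N_VID is taken to be 4096 as in the kernel's VLAN headers.

```python
VLAN_N_VID = 4096      # number of possible VLAN ids (0..4095)
BITS_PER_LONG = 64     # assumption: 64-bit longs
EINVAL = 22


def bits_to_longs(nbits):
    """Number of longs needed to hold nbits bits, like BITS_TO_LONGS()."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


class VportModel:
    """Hypothetical stand-in for the per-vport state holding the bitmap."""

    def __init__(self):
        self.vlan_del_fail_bmap = [0] * bits_to_longs(VLAN_N_VID)

    def mark_vlan_del_failed(self, vlan_id):
        # Reject ids outside the bitmap before indexing into it.
        if not 0 <= vlan_id < VLAN_N_VID:
            return -EINVAL
        word, bit = divmod(vlan_id, BITS_PER_LONG)
        self.vlan_del_fail_bmap[word] |= 1 << bit
        return 0
```

Without the range check, an id of 4096 would index one word past the end of the bitmap. In C that is a silent out-of-bounds write rather than an exception.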
Jian Shen
d180c11aa8 net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
Currently, rss_size = num_tqps / tc_num. If tc_num is 1, then num_tqps
equals rss_size. However, if the tc_num is greater than 1, then rss_size
will be less than num_tqps, causing the tqp_index check for subsequent TCs
using rss_size to always fail.

This patch uses the num_tqps to check whether tqp_index is out of range,
instead of rss_size.

Fixes: 326334aad0 ("net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 16:58:28 +01:00
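
The arithmetic behind d180c11aa8 can be worked through with concrete numbers. The helper names below are hypothetical and the values are illustrative; only the relation rss_size = num_tqps / tc_num comes from the commit message.

```python
def rss_size(num_tqps, tc_num):
    """rss_size as described in the commit message (integer division)."""
    return num_tqps // tc_num


def tqp_index_ok_old(tqp_index, num_tqps, tc_num):
    # Old check: compares against rss_size, which shrinks as tc_num grows.
    return tqp_index < rss_size(num_tqps, tc_num)


def tqp_index_ok_new(tqp_index, num_tqps):
    # New check: compares against the total number of queue pairs.
    return tqp_index < num_tqps


# With 16 queue pairs split over 4 TCs, rss_size is 4. Index 5 is a
# valid queue pair, but the old check rejects it.
example_old = tqp_index_ok_old(5, num_tqps=16, tc_num=4)
example_new = tqp_index_ok_new(5, num_tqps=16)
```

With a single TC both checks agree, which is consistent with the message's note that num_tqps equals rss_size when tc_num is 1.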
Jian Shen
c2a1626974 net: hns3: using the num_tqps in the vf driver to apply for resources
Currently, hdev->htqp is allocated using hdev->num_tqps, and kinfo->tqp
is allocated using kinfo->num_tqps. However, kinfo->num_tqps is set to
min(new_tqps, hdev->num_tqps). Therefore, kinfo->num_tqps may be smaller
than hdev->num_tqps, which causes some hdev->htqp[i] to remain
uninitialized in hclgevf_knic_setup().

Thus, this patch allocates hdev->htqp and kinfo->tqp using hdev->num_tqps,
ensuring that the lengths of hdev->htqp and kinfo->tqp are consistent
and that all elements are properly initialized.

Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 16:58:28 +01:00
Wei Fang
2939203ffe net: enetc: do not transmit redirected XDP frames when the link is down
In the current implementation, the enetc_xdp_xmit() always transmits
redirected XDP frames even if the link is down, but the frames cannot
be transmitted from TX BD rings when the link is down, so the frames
are still kept in the TX BD rings. If the XDP program is uninstalled,
users will see the following warning logs.

fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear

Worse, the TX BD ring can no longer work properly, because the
HW PIR and CIR are not equal after the re-initialization of the TX
BD ring. At this point, the BDs between CIR and PIR are invalid,
which will cause a hardware malfunction.

Another reason is that there is internal context in the ring prefetch
logic that will retain the state from the first incarnation of the ring
and continue prefetching from the stale location when we re-initialize
the ring. The internal context is only reset by an FLR. That is to say,
for LS1028A ENETC, software cannot set the HW CIR and PIR when
initializing the TX BD ring.

It does not make sense to transmit redirected XDP frames when the link is
down. Add a link status check to prevent transmission in this condition.
This fixes part of the issue, but more complex cases remain. For example,
the TX BD ring may still contain unsent frames when the link goes down.
Those situations require additional patches, which will build on this
one.

Fixes: 9d2b68cc10 ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251211020919.121113-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 16:45:13 +01:00
Cosmin Ratiu
4198a14c8c net/mlx5e: Don't include PSP in the hard MTU calculations
Commit [1] added the 40 bytes required by the PSP header+trailer and the
UDP header to MLX5E_ETH_HARD_MTU, which limits the device-wide max
software MTU that could be set. This is not okay, because most packets
are not PSP packets and it doesn't make sense to always reserve space
for headers which won't get added in most cases.

As it turns out, for TCP connections, PSP overhead is already taken into
account in the TCP MSS calculations via inet_csk(sk)->icsk_ext_hdr_len.
This was added in commit [2]. This means that the extra space reserved
in the hard MTU for mlx5 ends up unused and wasted.

Remove the unnecessary 40 byte reservation from hard MTU.

[1] commit e5a1861a29 ("net/mlx5e: Implement PSP Tx data path")
[2] commit e97269257f ("net: psp: update the TCP MSS to reflect PSP
packet overhead")

Fixes: e5a1861a29 ("net/mlx5e: Implement PSP Tx data path")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-10-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:30 +01:00
Tariq Toukan
c8591decd9 net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
During channel reconfiguration (e.g., ethtool private flags changes),
the driver can trigger a kernel BUG_ON in dql_completed() with the error
"kernel BUG at lib/dynamic_queue_limits.c:99".

The issue occurs in the following sequence:

During mlx5e_safe_switch_params(), old channels are deactivated via
mlx5e_deactivate_txqsq(). New channels are created and activated, taking
ownership of the netdev_queues and their BQL state.

When old channels are closed via mlx5e_close_txqsq(), there may be
pending TX descriptors (sq->cc != sq->pc) that were in-flight during the
deactivation.

mlx5e_free_txqsq_descs() frees these pending descriptors and attempts to
complete them via netdev_tx_completed_queue().

However, the BQL state (dql->num_queued and dql->num_completed) has
been reset in mlx5e_activate_txqsq() and belongs to the new queue owner,
leading to dql->num_queued - dql->num_completed < nbytes.

This triggers BUG_ON(count > num_queued - num_completed) in
dql_completed().

Fixes: 3b88a535a8 ("net/mlx5e: Defer channels closure to reduce interface down time")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-9-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
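
The sequence described in c8591decd9 can be replayed on a toy model of the two BQL counters. This is a sketch under stated assumptions: DqlModel and reconfigure() are hypothetical names, the byte count is arbitrary, and the boolean parameter merely stands in for whatever mechanism the patch uses to skip BQL updates for old txqs.

```python
class DqlModel:
    """Toy model of the two BQL counters named in the commit message."""

    def __init__(self):
        self.num_queued = 0
        self.num_completed = 0

    def queued(self, nbytes):
        self.num_queued += nbytes

    def completed(self, nbytes):
        # Mirrors the BUG_ON(count > num_queued - num_completed) condition.
        if nbytes > self.num_queued - self.num_completed:
            raise RuntimeError("BUG: completed more bytes than are in flight")
        self.num_completed += nbytes

    def reset(self):
        self.num_queued = 0
        self.num_completed = 0


def reconfigure(update_bql_for_old_sq):
    dql = DqlModel()
    dql.queued(3000)   # old SQ still has 3000 bytes in flight
    dql.reset()        # new SQ is activated and now owns the BQL state
    pending = 3000     # old SQ is closed later and frees its descriptors
    if update_bql_for_old_sq:
        dql.completed(pending)   # completes against the new owner's counters
    return dql
```

Calling reconfigure(True) raises, which corresponds to the BUG_ON in dql_completed(); reconfigure(False) leaves the new owner's counters untouched.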
Jianbo Liu
9ab89bde13 net/mlx5e: Trigger neighbor resolution for unresolved destinations
When initializing the MAC addresses for an outbound IPsec packet offload
rule in mlx5e_ipsec_init_macs, the call to dst_neigh_lookup is used to
find the next-hop neighbor (typically the gateway in tunnel mode).
This call might create a new neighbor entry if one doesn't already
exist. This newly created entry starts in the INCOMPLETE state, as the
kernel hasn't yet sent an ARP or NDISC probe to resolve the MAC
address. In this case, neigh_ha_snapshot will correctly return an
all-zero MAC address.

IPsec packet offload requires the actual next-hop MAC address to
program the rule correctly. If the neighbor state is INCOMPLETE when
the rule is created, the hardware rule is programmed with an all-zero
destination MAC address. Packets sent using this rule will be
subsequently dropped by the receiving network infrastructure or host.

This patch adds a check specifically for the outbound offload path. If
neigh_ha_snapshot returns an all-zero MAC address, it proactively
calls neigh_event_send(n, NULL). This ensures the kernel immediately
sends the initial ARP or NDISC probe if one isn't already pending,
accelerating the resolution process. This helps prevent the hardware
rule from being programmed with an invalid MAC address and avoids
packet drops due to unresolved neighbors.

Fixes: 71670f766b ("net/mlx5e: Support routed networks during IPsec MACs initialization")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
Jianbo Liu
e35d7da8dd net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
Replace ipv6_stub->ipv6_dst_lookup_flow() with ip6_dst_lookup() in
mlx5e_ipsec_init_macs() since IPsec transformations are not needed
during Security Association setup - only basic routing information is
required for nexthop MAC address resolution.

This resolves an issue where XfrmOutNoStates error counter would be
incremented when xfrm policy is configured before xfrm state, as the
IPsec-aware routing function would attempt policy checks during SA
initialization.

Fixes: 71670f766b ("net/mlx5e: Support routed networks during IPsec MACs initialization")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
Shay Drory
367e501f8b net/mlx5: Serialize firmware reset with devlink
The firmware reset mechanism can be triggered by asynchronous events,
which may race with other devlink operations like devlink reload or
devlink dev eswitch set, potentially leading to inconsistent states.

This patch addresses the race by using the devl_lock to serialize the
firmware reset against other devlink operations. When a reset is
requested, the driver attempts to acquire the lock. If successful, it
sets a flag to block devlink reload or eswitch changes, ACKs the reset
to firmware and then releases the lock. If the lock is already held by
another operation, the driver NACKs the firmware reset request,
indicating that the reset cannot proceed.

Firmware reset does not keep the devl_lock and instead uses an internal
firmware reset bit. This is because firmware resets can be triggered by
asynchronous events, and processed in different threads. It is illegal
and unsafe to acquire a lock in one thread and attempt to release it in
another, as lock ownership is intrinsically thread-specific.

This change ensures that firmware resets and other devlink operations
are mutually exclusive during the critical reset request phase,
preventing race conditions.

Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mateusz Berezecki <mberezecki@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
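
The request-phase handshake described in 367e501f8b can be modelled with a non-blocking lock attempt. This is a minimal sketch, not the mlx5 implementation: FwResetModel, its method names, and the -EBUSY return for a blocked reload are assumptions made for illustration.

```python
import threading

EBUSY = 16


class FwResetModel:
    """Hypothetical model of the reset request vs. devlink operations."""

    def __init__(self):
        self.devl_lock = threading.Lock()   # stand-in for devl_lock
        self.reset_in_progress = False      # stand-in for the internal reset bit

    def handle_reset_request(self):
        # Try to take the lock without blocking; NACK if devlink is busy.
        if not self.devl_lock.acquire(blocking=False):
            return "NACK"
        try:
            # Set the flag that blocks reload/eswitch changes, then ACK.
            self.reset_in_progress = True
            return "ACK"
        finally:
            # The lock is not held for the duration of the reset itself.
            self.devl_lock.release()

    def devlink_reload(self):
        with self.devl_lock:
            if self.reset_in_progress:
                return -EBUSY
            return 0
```

The flag, rather than the lock, carries the exclusion across the reset because the reset may finish in a different thread than the one that accepted the request.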
Shay Drory
c0289f67f7 net/mlx5: fw_tracer, Handle escaped percent properly
The firmware tracer's format string validation and parameter counting
did not properly handle escaped percent signs (%%). This caused
fw_tracer to count more parameters when trace format strings contained
literal percent characters.

To fix it, allow %% to pass string validation and skip %% sequences when
counting parameters since they represent literal percent signs rather
than format specifiers.

Fixes: 70dd6fdb89 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
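
The parameter-counting rule described in c0289f67f7 can be sketched as follows. The function name is hypothetical and the code only illustrates the idea of skipping %% sequences; it is not the fw_tracer code.

```python
def count_params(fmt):
    """Count format specifiers, treating '%%' as a literal percent sign."""
    count = 0
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            if i + 1 < len(fmt) and fmt[i + 1] == "%":
                i += 2          # skip the escaped percent entirely
                continue
            count += 1          # a real specifier starts here
        i += 1
    return count


# A naive count of '%' characters overcounts when '%%' is present.
sample = "usage %d%% of %x"
naive = sample.count("%")
fixed = count_params(sample)
```

For the sample string the naive count is 4 while only 2 parameters are actually consumed, which is the kind of mismatch the commit message describes.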
Shay Drory
b35966042d net/mlx5: fw_tracer, Validate format string parameters
Add validation for format string parameters in the firmware tracer to
prevent potential security vulnerabilities and crashes from malformed
format strings received from firmware.

The firmware tracer receives format strings from the device firmware and
uses them to format trace messages. Without proper validation, bad
firmware could provide format strings with invalid format specifiers
(e.g., %s, %p, %n) that could lead to crashes, or other undefined
behavior.

Add mlx5_tracer_validate_params() to validate that all format specifiers
in trace strings are limited to safe integer/hex formats (%x, %d, %i,
%u, %llx, %lx, etc.). Reject strings containing other format types that
could be used to access arbitrary memory or cause crashes.
Invalid format strings are added to the trace output for visibility with
"BAD_FORMAT: " prefix.

Fixes: 70dd6fdb89 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
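
The allow-list validation described in b35966042d can be approximated with a regular expression. This is a sketch under assumptions: the function names are hypothetical, the accepted set is limited to the integer and hex conversions the message lists plus an optional numeric width, and %% is accepted as a literal percent sign. The exact set accepted by mlx5_tracer_validate_params() may differ.

```python
import re

# '%%', or '%' + optional width + optional l/ll + one of x, X, d, i, u.
_SAFE_SPEC = re.compile(r"%(?:%|[0-9]*(?:ll|l)?[xXdiu])")


def validate_params(fmt):
    """Return True only if every '%' starts an allowed specifier."""
    i = 0
    while True:
        i = fmt.find("%", i)
        if i < 0:
            return True
        m = _SAFE_SPEC.match(fmt, i)
        if m is None:
            return False        # e.g. %s, %p, %n, or a trailing '%'
        i = m.end()


def render(fmt):
    """Pass valid strings through; flag invalid ones for visibility."""
    return fmt if validate_params(fmt) else "BAD_FORMAT: " + fmt
```

An allow-list is used instead of a deny-list so that any conversion not explicitly known to be safe is rejected by default.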
Moshe Shemesh
5846a365fc net/mlx5: Drain firmware reset in shutdown callback
Invoke drain_fw_reset() in the shutdown callback to ensure all
firmware reset handling is completed before shutdown proceeds.

Fixes: 16d42d3133 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
Moshe Shemesh
89a898d63f net/mlx5: fw reset, clear reset requested on drain_fw_reset
drain_fw_reset() waits for ongoing firmware reset events and blocks new
event handling, but does not clear the reset requested flag, and may
keep sync reset polling.

To fix it, call mlx5_sync_reset_clear_reset_requested() to clear the
flag, stop sync reset polling, and resume health polling, ensuring
health issues are still detected after the firmware reset drain.

Fixes: 16d42d3133 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 13:39:29 +01:00
Alexey Simakov
50b3db3e11 broadcom: b44: prevent uninitialized value usage
On the execution path where B44_FLAG_EXTERNAL_PHY is set, b44_readphy()
leaves the bmcr value uninitialized, and that value is used later in
the code.

Add a check for this flag at the beginning of b44_nway_reset() and
return early, restarting autonegotiation, if an external PHY is used.

Fixes: 753f492093 ("[B44]: port to native ssb support")
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexey Simakov <bigalex934@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251205155815.4348-1-bigalex934@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 01:02:20 -08:00
Arnd Bergmann
9e7477a427 net: ti: icssg-prueth: add PTP_1588_CLOCK_OPTIONAL dependency
The new icssg-prueth driver needs the same dependency as the other parts
that use the ptp-1588:

WARNING: unmet direct dependencies detected for TI_ICSS_IEP
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y]
  Selected by [y]:
  - TI_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && NET_SWITCHDEV [=y]

Add the correct dependency on the two drivers missing it, and remove
the pointless 'imply' in the process.

Fixes: e654b85a69 ("net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms")
Fixes: 511f6c1ae0 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251204100138.1034175-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 00:49:56 -08:00
Linus Torvalds
55aa394a5e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "This has another new RDMA driver 'bng_en' for latest generation
  Broadcom NICs. There might be one more new driver still to come.

  Otherwise it is a fairly quiet cycle. Summary:

   - Minor driver bug fixes and updates to cxgb4, rxe, rdmavt, bnxt_re,
     mlx5

   - Many bug fix patches for irdma

   - WQ_PERCPU annotations and system_dfl_wq changes

   - Improved mlx5 support for "other eswitches" and multiple PFs

   - 1600Gbps link speed reporting support. Four Digits Now!

   - New driver bng_en for latest generation Broadcom NICs

   - Bonding support for hns

   - Adjust mlx5's hmm based ODP to work with the very large address
     space created by the new 5 level paging default on x86

   - Lockdep fixups in rxe and siw"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
  RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep
  RDMA/siw: reclassify sockets in order to avoid false positives from lockdep
  RDMA/bng_re: Remove prefetch instruction
  RDMA/core: Reduce cond_resched() frequency in __ib_umem_release
  RDMA/irdma: Fix SRQ shadow area address initialization
  RDMA/irdma: Remove doorbell elision logic
  RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+
  RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEY
  RDMA/irdma: Add missing mutex destroy
  RDMA/irdma: Fix SIGBUS in AEQ destroy
  RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2
  RDMA/irdma: Fix data race in irdma_free_pble
  RDMA/irdma: Fix data race in irdma_sc_ccq_arm
  RDMA/mlx5: Add support for 1600_8x lane speed
  RDMA/core: Add new IB rate for XDR (8x) support
  IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled
  RDMA/bnxt_re: Pass correct flag for dma mr creation
  RDMA/bnxt_re: Fix the inline size for GenP7 devices
  RDMA/hns: Support reset recovery for bond
  RDMA/hns: Support link state reporting for bond
  ...
2025-12-04 18:54:37 -08:00
Michael Chan
0373d5c387 bnxt_en: Fix XDP_TX path
For XDP_TX action in bnxt_rx_xdp(), clearing of the event flags is not
correct.  __bnxt_poll_work() -> bnxt_rx_pkt() -> bnxt_rx_xdp() may be
looping within NAPI and some event flags may be set in earlier
iterations.  In particular, if BNXT_TX_EVENT is set earlier indicating
some XDP_TX packets are ready and pending, it will be cleared if it is
XDP_TX action again.  Normally, we will set BNXT_TX_EVENT again when we
successfully call __bnxt_xmit_xdp().  But if the TX ring has no more
room, the flag will not be set.  This will cause the TX producer to be
ahead but the driver will not hit the TX doorbell.

For multi-buf XDP_TX, there is no need to clear the event flags and set
BNXT_AGG_EVENT.  The BNXT_AGG_EVENT flag should have been set earlier in
bnxt_rx_pkt().

The visible symptom is that the RX ring associated with the TX XDP
ring eventually becomes empty and all packets are dropped, because the
driver stops refilling the RX ring while it sees XDP_TX packets
permanently pending on the TX ring.

The fix is to only clear BNXT_RX_EVENT when we have successfully
called __bnxt_xmit_xdp().

Fixes: 7f0a168b04 ("bnxt_en: Add completion ring pointer in TX and RX ring structures")
Reported-by: Pavel Dubovitsky <pdubovitsky@meta.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251203003024.2246699-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:54:25 -08:00
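
The flag handling described in 0373d5c387 can be illustrated with a toy bitmask model. This is only a sketch of the behaviour as the message describes it: the function names and signatures are hypothetical, the flag values are illustrative, and the real bnxt_rx_xdp() does considerably more.

```python
BNXT_RX_EVENT = 1    # illustrative bit values
BNXT_AGG_EVENT = 2
BNXT_TX_EVENT = 4


def xdp_tx_old(event, tx_ring_has_room):
    """Old behaviour: flags from earlier NAPI iterations are cleared."""
    event = 0
    if not tx_ring_has_room:
        return event                 # an earlier BNXT_TX_EVENT is lost
    return event | BNXT_TX_EVENT     # set again only after a successful xmit


def xdp_tx_fixed(event, tx_ring_has_room):
    """Fixed behaviour: clear only BNXT_RX_EVENT, and only on success."""
    if not tx_ring_has_room:
        return event                 # earlier flags stay untouched
    event &= ~BNXT_RX_EVENT
    return event | BNXT_TX_EVENT


# An earlier iteration queued an XDP_TX packet; now the TX ring is full.
earlier = BNXT_RX_EVENT | BNXT_TX_EVENT
old_result = xdp_tx_old(earlier, tx_ring_has_room=False)
new_result = xdp_tx_fixed(earlier, tx_ring_has_room=False)
```

In the old path the pending BNXT_TX_EVENT disappears, so nothing signals that the TX doorbell must be hit; in the fixed path it survives.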
Tim Hostetler
a479a27f4d gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call
commit 46e7860ef9 ("gve: Move ptp_schedule_worker to gve_init_clock")
moved the first invocation of the AQ command REPORT_NIC_TIMESTAMP to
gve_probe(). However, gve_init_clock() invoking REPORT_NIC_TIMESTAMP is
not valid until after gve_probe() invokes the AQ command
CONFIGURE_DEVICE_RESOURCES.

Failure to do so results in the following error:

gve 0000:00:07.0: failed to read NIC clock -11

This was missed earlier because the driver under test was loaded at
runtime instead of boot-time. The boot-time driver had already
initialized the device, so the runtime-loaded driver's call to
gve_init_clock() succeeded even though it was made too early.

Fixes: 46e7860ef9 ("gve: Move ptp_schedule_worker to gve_init_clock")
Reviewed-by: Ankit Garg <nktgrg@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251202200207.1434749-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:54:16 -08:00
René Rebe
dd75c723ef r8169: fix RTL8117 Wake-on-Lan in DASH mode
Wake-on-LAN currently does not work for r8169 in DASH mode, e.g. on the
ASUS Pro WS X570-ACE with RTL8168fp/RTL8117.

Fix by not returning early in rtl_prepare_power_down when dash_enabled.
While this fixes WoL, it still kills the OOB RTL8117 remote management
BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled.

Fixes: 065c27c184 ("r8169: phy power ops")
Signed-off-by: René Rebe <rene@exactco.de>
Cc: stable@vger.kernel.org
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:54:07 -08:00
Ido Schimmel
8ac1dacec4 mlxsw: spectrum_mr: Fix use-after-free when updating multicast route stats
Cited commit added a dedicated mutex (instead of RTNL) to protect the
multicast route list, so that it will not change while the driver
periodically traverses it in order to update the kernel about multicast
route stats that were queried from the device.

One instance of list entry deletion (during route replace) was missed
and it can result in a use-after-free [1].

Fix by acquiring the mutex before deleting the entry from the list and
releasing it afterwards.

[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum]
Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043

CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full)
Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum]
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 print_report+0x174/0x4f5
 kasan_report+0xdf/0x110
 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum]
 process_one_work+0x9cc/0x18e0
 worker_thread+0x5df/0xe40
 kthread+0x3b8/0x730
 ret_from_fork+0x3e9/0x560
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 29933:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum]
 mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum]
 process_one_work+0x9cc/0x18e0
 worker_thread+0x5df/0xe40
 kthread+0x3b8/0x730
 ret_from_fork+0x3e9/0x560
 ret_from_fork_asm+0x1a/0x30

Freed by task 29933:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_save_free_info+0x3b/0x70
 __kasan_slab_free+0x43/0x70
 kfree+0x14e/0x700
 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum]
 mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum]
 process_one_work+0x9cc/0x18e0
 worker_thread+0x5df/0xe40
 kthread+0x3b8/0x730
 ret_from_fork+0x3e9/0x560
 ret_from_fork_asm+0x1a/0x30

Fixes: f38656d067 ("mlxsw: spectrum_mr: Protect multicast route list with a lock")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f996feecfd59fde297964bfc85040b6d83ec6089.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:53:49 -08:00
Ido Schimmel
8b0e69763e mlxsw: spectrum_router: Fix neighbour use-after-free
We sometimes observe use-after-free when dereferencing a neighbour [1].
The problem seems to be that the driver stores a pointer to the
neighbour, but without holding a reference on it. A reference is only
taken when the neighbour is used by a nexthop.

Fix by simplifying the reference counting scheme. Always take a
reference when storing a neighbour pointer in a neighbour entry. Avoid
taking a reference when the neighbour is used by a nexthop as the
neighbour entry associated with the nexthop already holds a reference.

Tested by running the test that uncovered the problem over 300 times.
Without this patch the problem was reproduced after a handful of
iterations.

[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x2d4/0x310
Read of size 8 at addr ffff88817f8e3420 by task ip/3929

CPU: 3 UID: 0 PID: 3929 Comm: ip Not tainted 6.18.0-rc4-virtme-g36b21a067510 #3 PREEMPT(full)
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
Call Trace:
 <TASK>
 dump_stack_lvl+0x6f/0xa0
 print_address_description.constprop.0+0x6e/0x300
 print_report+0xfc/0x1fb
 kasan_report+0xe4/0x110
 mlxsw_sp_neigh_entry_update+0x2d4/0x310
 mlxsw_sp_router_rif_gone_sync+0x35f/0x510
 mlxsw_sp_rif_destroy+0x1ea/0x730
 mlxsw_sp_inetaddr_port_vlan_event+0xa1/0x1b0
 __mlxsw_sp_inetaddr_lag_event+0xcc/0x130
 __mlxsw_sp_inetaddr_event+0xf5/0x3c0
 mlxsw_sp_router_netdevice_event+0x1015/0x1580
 notifier_call_chain+0xcc/0x150
 call_netdevice_notifiers_info+0x7e/0x100
 __netdev_upper_dev_unlink+0x10b/0x210
 netdev_upper_dev_unlink+0x79/0xa0
 vrf_del_slave+0x18/0x50
 do_set_master+0x146/0x7d0
 do_setlink.isra.0+0x9a0/0x2880
 rtnl_newlink+0x637/0xb20
 rtnetlink_rcv_msg+0x6fe/0xb90
 netlink_rcv_skb+0x123/0x380
 netlink_unicast+0x4a3/0x770
 netlink_sendmsg+0x75b/0xc90
 __sock_sendmsg+0xbe/0x160
 ____sys_sendmsg+0x5b2/0x7d0
 ___sys_sendmsg+0xfd/0x180
 __sys_sendmsg+0x124/0x1c0
 do_syscall_64+0xbb/0xfd0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
[...]

Allocated by task 109:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7b/0x90
 __kmalloc_noprof+0x2c1/0x790
 neigh_alloc+0x6af/0x8f0
 ___neigh_create+0x63/0xe90
 mlxsw_sp_nexthop_neigh_init+0x430/0x7e0
 mlxsw_sp_nexthop_type_init+0x212/0x960
 mlxsw_sp_nexthop6_group_info_init.constprop.0+0x81f/0x1280
 mlxsw_sp_nexthop6_group_get+0x392/0x6a0
 mlxsw_sp_fib6_entry_create+0x46a/0xfd0
 mlxsw_sp_router_fib6_replace+0x1ed/0x5f0
 mlxsw_sp_router_fib6_event_work+0x10a/0x2a0
 process_one_work+0xd57/0x1390
 worker_thread+0x4d6/0xd40
 kthread+0x355/0x5b0
 ret_from_fork+0x1d4/0x270
 ret_from_fork_asm+0x11/0x20

Freed by task 154:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x43/0x70
 kmem_cache_free_bulk.part.0+0x1eb/0x5e0
 kvfree_rcu_bulk+0x1f2/0x260
 kfree_rcu_work+0x130/0x1b0
 process_one_work+0xd57/0x1390
 worker_thread+0x4d6/0xd40
 kthread+0x355/0x5b0
 ret_from_fork+0x1d4/0x270
 ret_from_fork_asm+0x11/0x20

Last potentially related work creation:
 kasan_save_stack+0x30/0x50
 kasan_record_aux_stack+0x8c/0xa0
 kvfree_call_rcu+0x93/0x5b0
 mlxsw_sp_router_neigh_event_work+0x67d/0x860
 process_one_work+0xd57/0x1390
 worker_thread+0x4d6/0xd40
 kthread+0x355/0x5b0
 ret_from_fork+0x1d4/0x270
 ret_from_fork_asm+0x11/0x20

Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/92d75e21d95d163a41b5cea67a15cd33f547cba6.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:53:49 -08:00
Ido Schimmel
b6b638bda2 mlxsw: spectrum_router: Fix possible neighbour reference count leak
mlxsw_sp_router_schedule_work() takes a reference on a neighbour,
expecting a work item to release it later on. However, we might fail to
schedule the work item, in which case the neighbour reference count will
be leaked.

Fix by taking the reference just before scheduling the work item. Note
that mlxsw_sp_router_schedule_work() can receive a NULL neighbour
pointer, but neigh_clone() handles that correctly.

Spotted during code review, did not actually observe the reference count
leak.
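
The ordering fix can be illustrated with a minimal userspace sketch. All
names below (neigh_sketch, schedule_work_sketch, setup_fails) are
hypothetical stand-ins, not the driver's code; the sketch only assumes
that scheduling can fail before the work item is queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct neighbour: only the reference count. */
struct neigh_sketch {
	int refcnt;
};

/* Mirrors the NULL tolerance the commit attributes to neigh_clone(). */
static struct neigh_sketch *neigh_clone_sketch(struct neigh_sketch *n)
{
	if (n)
		n->refcnt++;
	return n;
}

/* Drop a reference; the work item would call this when it finishes. */
static void neigh_release_sketch(struct neigh_sketch *n)
{
	if (!n)
		return;
	assert(n->refcnt > 0);
	n->refcnt--;
}

/*
 * Hypothetical scheduler. setup_fails models any failure that happens
 * before the work item is queued.
 */
static bool schedule_work_sketch(struct neigh_sketch *n, bool setup_fails)
{
	if (setup_fails)
		return false;	/* no reference taken yet, so nothing leaks */

	neigh_clone_sketch(n);	/* taken just before scheduling */
	/* ... queue the work item here ... */
	return true;
}
```

With the reference taken last, a failed call leaves the count unchanged,
and a NULL neighbour passes through without special casing.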

Fixes: 151b89f602 ("mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/ec2934ae4aca187a8d8c9329a08ce93cca411378.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-04 17:53:48 -08:00
Linus Torvalds
43dfc13ca9 Merge tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan
     Williams)

   - Switch vmd from custom domain number allocator to the common
     allocator to prevent a potential race with new non-VMD buses (Dan
     Williams)

   - Enable Precision Time Measurement (PTM) only if device advertises
     support for a relevant role, to prevent invalid PTM Requests that
     cause ACS violations that are reported as AER Uncorrectable
     Non-Fatal errors (Mika Westerberg)

  Resource management:

   - Prevent resource tree corruption when BAR resize fails (Ilpo
     Järvinen)

   - Restore BARs to the original size if a BAR resize fails (Ilpo
     Järvinen)

   - Remove BAR release from BAR resize attempts by the xe, i915, and
     amdgpu drivers so the PCI core can restore BARs if the resize fails
     (Ilpo Järvinen)

   - Move Resizable BAR code to rebar.c (Ilpo Järvinen)

   - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo
     Järvinen)

   - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo
     Järvinen)

  Power management and error handling:

   - For drivers using PCI legacy suspend, save config state at suspend
     so that state (not any earlier state from enumeration, probe, or
     error recovery) will be restored when resuming (Lukas Wunner)

   - For devices with no driver or a driver that lacks power management,
     save config state at hibernate so that state (not any earlier state
     from enumeration, probe, or error recovery) will be restored when
     resuming (Lukas Wunner)

   - Save device config space on device addition, before driver binding,
     so error recovery works more reliably (Lukas Wunner)

   - Drop pci_save_state() from several drivers that no longer need it
     since the PCI core always does it and pci_restore_state() no longer
     invalidates the saved state (Lukas Wunner)

   - Document use of pci_save_state() by drivers to capture the state
     they want restored during error recovery (Lukas Wunner)

  Power control:

   - Add a struct pci_ops.assert_perst() function pointer to
     assert/deassert PCIe PERST# and implement it for the qcom driver
     (Krishna Chaitanya Chundru)

   - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe
     switch, which must be held in reset after poweron so the pwrctrl
     driver can configure the switch via I2C before bringing up the
     links (Krishna Chaitanya Chundru)

  Endpoint framework:

   - Convert the endpoint doorbell test to use a threaded IRQ to fix a
     'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri)

   - Add endpoint VNTB MSI doorbell support to reduce latency between
     host and endpoint (Frank Li)

  New native PCIe controller drivers:

   - Add CIX Sky1 host controller DT binding and driver (Hans Zhang)

   - Add NXP S32G host controller DT binding and driver (Vincent
     Guittot)

   - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu
     Beznea)

   - Add SpacemiT K1 host controller DT binding and driver (Alex Elder)

  Amlogic Meson PCIe controller driver:

   - Update DT binding to name DBI region 'dbi', not 'elbi', and update
     driver to support both (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Move struct pci_host_bridge allocation from pci_host_common_init()
     to callers, which significantly simplifies pcie-apple (Marc
     Zyngier)

  Broadcom STB PCIe controller driver:

   - Disable advertising ASPM L0s support correctly (Jim Quinlan)

   - Add a panic/die handler to print diagnostic info in case PCIe
     caused an unrecoverable abort (Jim Quinlan)

  Cadence PCIe controller driver:

   - Add module support for Cadence platform host and endpoint
     controller driver (Manikandan K Pillai)

   - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare
     for new CIX Sky1 driver (Manikandan K Pillai)

  MediaTek PCIe controller driver:

   - Convert DT binding to YAML schema (Christian Marangi)

   - Add Airoha AN7583 DT compatible and driver support (Christian
     Marangi)

  Qualcomm PCIe controller driver:

   - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu)

   - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280,
     sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT
     schemas (Krzysztof Kozlowski)

   - Look up OPP using both frequency and data rate (not just frequency)
     so RPMh votes can account for both (Krishna Chaitanya Chundru)

  Rockchip DesignWare PCIe controller driver:

   - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Fix a race between link training and endpoint register
     initialization (Christian Bruel)

   - Align endpoint allocations to match the ATU requirements (Christian
     Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Clear L1 PM Substate Capability 'Supported' bits unless glue driver
     says it's supported, which prevents users from enabling non-working
     L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas)

   - Remove now-superfluous L1SS disable code from tegra194 (Bjorn
     Helgaas)

   - Configure L1SS support in dw-rockchip when DT says
     'supports-clkreq' (Shawn Lin)

  TI Keystone PCIe controller driver:

   - Fail the probe instead of silently succeeding if ks_pcie_of_data
     didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli)

   - Make keystone buildable as a loadable module, except on ARM32 where
     hook_fault_code() is __init (Siddharth Vadapalli)"

* tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits)
  MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
  MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer
  PCI: sky1: Add PCIe host support for CIX Sky1
  dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings
  PCI: cadence: Add support for High Perf Architecture (HPA) controller
  MAINTAINERS: Add NXP S32G PCIe controller driver maintainer
  PCI: s32g: Add NXP S32G PCIe controller driver (RC)
  PCI: dwc: Add register and bitfield definitions
  dt-bindings: PCI: s32g: Add NXP S32G PCIe controller
  PCI: Add Renesas RZ/G3S host controller driver
  PCI: host-generic: Move bridge allocation outside of pci_host_common_init()
  dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding
  PCI: Validate pci_rebar_size_supported() input
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
  PCI: dw-rockchip: Configure L1SS support
  PCI: tegra194: Remove unnecessary L1SS disable code
  ...
2025-12-04 17:29:41 -08:00
Gerd Bayer
6a107cfe9c net/mlx5: Fix double unregister of HCA_PORTS component
Clear hca_devcom_comp in device's private data after unregistering it in
LAG teardown. Otherwise a slightly lagging second pass through
mlx5_unload_one() might try to unregister it again and trip over
use-after-free.
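
A minimal userspace sketch of the pattern, assuming the teardown path
can simply skip a component pointer that is already NULL. The types and
function names (comp_sketch, dev_sketch, lag_teardown_sketch) are
hypothetical and only model the idea of clearing the pointer after
unregistering.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real mlx5 structures are much larger. */
struct comp_sketch {
	int id;
};

struct dev_sketch {
	struct comp_sketch *hca_devcom_comp;
	int unregister_calls;
};

static void comp_unregister_sketch(struct dev_sketch *dev,
				   struct comp_sketch *comp)
{
	assert(comp != NULL);
	dev->unregister_calls++;
	free(comp);
}

static void lag_teardown_sketch(struct dev_sketch *dev)
{
	if (!dev->hca_devcom_comp)
		return;		/* a second pass finds nothing to do */

	comp_unregister_sketch(dev, dev->hca_devcom_comp);
	dev->hca_devcom_comp = NULL;	/* the step this fix adds */
}
```

Calling lag_teardown_sketch() twice on the same device unregisters the
component once; without the NULL assignment the second call would free
the same memory again.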

On s390 almost all PCI level recovery events trigger two passes through
mlx5_unload_one() - one through the poll_health() method and one through
mlx5_pci_err_detected() as callback from generic PCI error recovery.
While testing PCI error recovery paths with more kernel debug features
enabled, this issue reproducibly led to kernel panics with the following
call chain:

 Unable to handle kernel pointer dereference in virtual kernel address space
 Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
 Fault in home space mode while using kernel ASCE.
 AS:00000000705c4007 R3:0000000000000024
 Oops: 0038 ilc:3 [#1]SMP

 CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
      6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT

 Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
 Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
            0000000000000000 0000000000000000 0000000000000001 0000000000000000
            0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
            0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
 Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
            0000020fc86aa1d2: a7840211           brc     8,0000020fc86aa5f4
           *0000020fc86aa1d6: c09000df0b25       larl    %r9,0000020fca28b820
           >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
            0000020fc86aa1e2: a7840209           brc     8,0000020fc86aa5f4
            0000020fc86aa1e6: c0e001100401       larl    %r14,0000020fca8aa9e8
            0000020fc86aa1ec: c01000e25a00       larl    %r1,0000020fca2f55ec
            0000020fc86aa1f2: a7eb00e8           aghi    %r14,232

 Call Trace:
  __lock_acquire+0x5c/0x15f0
  lock_acquire.part.0+0xf8/0x270
  lock_acquire+0xb0/0x1b0
  down_write+0x5a/0x250
  mlx5_detach_device+0x42/0x110 [mlx5_core]
  mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
  mlx5_unload_one+0x42/0x60 [mlx5_core]
  mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
  zpci_event_attempt_error_recovery+0xcc/0x388

Fixes: 5a977b5833 ("net/mlx5: Lag, move devcom registration to LAG layer")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20251202-fix_lag-v1-1-59e8177ffce0@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-04 15:29:13 +01:00
Cosmin Ratiu
35e93736f6 net/mlx5e: Avoid unregistering PSP twice
PSP is unregistered twice in:
_mlx5e_remove -> mlx5e_psp_unregister
mlx5e_nic_cleanup -> mlx5e_psp_unregister

This leads to a refcount underflow in some conditions:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 1694 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
[...]
 mlx5e_psp_unregister+0x26/0x50 [mlx5_core]
 mlx5e_nic_cleanup+0x26/0x90 [mlx5_core]
 mlx5e_remove+0xe6/0x1f0 [mlx5_core]
 auxiliary_bus_remove+0x18/0x30
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x130
 device_del+0x159/0x3c0
 mlx5_rescan_drivers_locked+0xbc/0x2a0 [mlx5_core]
[...]

Do not directly remove PSP from the _mlx5e_remove path; the PSP cleanup
happens as part of profile cleanup.

Fixes: 89ee2d92f6 ("net/mlx5e: Support PSP offload functionality")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1764602008-1334866-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-04 11:55:20 +01:00
Moshe Shemesh
cd7671ef4c net/mlx5: make enable_mpesw idempotent
The enable_mpesw() function returns -EINVAL if ldev->mode is not
MLX5_LAG_MODE_NONE. This means attempting to enable MPESW mode when it's
already enabled will fail. In contrast, disable_mpesw() properly checks
if the mode is MLX5_LAG_MODE_MPESW before proceeding, making it
naturally idempotent and safe to call multiple times.

Fix enable_mpesw() to return success if mpesw is already enabled.
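
The intended behaviour can be sketched in userspace as follows. The enum
values and struct below are hypothetical simplifications of the LAG
state, and the exact order of checks in the driver may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simplification of the LAG modes. */
enum lag_mode_sketch {
	LAG_MODE_NONE_SKETCH,
	LAG_MODE_MPESW_SKETCH,
	LAG_MODE_OTHER_SKETCH,
};

struct lag_sketch {
	enum lag_mode_sketch mode;
};

static int enable_mpesw_sketch(struct lag_sketch *ldev)
{
	assert(ldev != NULL);

	if (ldev->mode == LAG_MODE_MPESW_SKETCH)
		return 0;	/* already enabled: report success */
	if (ldev->mode != LAG_MODE_NONE_SKETCH)
		return -EINVAL;	/* another LAG mode is active */

	ldev->mode = LAG_MODE_MPESW_SKETCH;
	return 0;
}

static void disable_mpesw_sketch(struct lag_sketch *ldev)
{
	assert(ldev != NULL);

	/* Only acts when MPESW is active, so repeated calls are harmless. */
	if (ldev->mode == LAG_MODE_MPESW_SKETCH)
		ldev->mode = LAG_MODE_NONE_SKETCH;
}
```

Both helpers can now be called repeatedly without changing the outcome,
while enabling on top of an unrelated mode still fails with -EINVAL.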

Fixes: a32327a3a0 ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1764602008-1334866-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-04 11:55:20 +01:00
Wei Fang
e8e032cd24 net: fec: ERR007885 Workaround for XDP TX path
The ERR007885 will lead to a TDAR race condition for multiQ when the
driver sets TDAR and the UDMA clears TDAR simultaneously or in a small
window (2-4 cycles). And it will cause the udma_tx and udma_tx_arbiter
state machines to hang. Therefore, the commit 53bb20d1fa ("net: fec:
add variable reg_desc_active to speed things up") and the commit
a179aad12b ("net: fec: ERR007885 Workaround for conventional TX") have
added the workaround to fix the potential issue for the conventional TX
path. Similarly, the XDP TX path should also have the potential hang
issue, so add the workaround for XDP TX path.

Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251128025915.2486943-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-04 09:54:13 +01:00
Linus Torvalds
8f7aa3d3c7 Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Replace busylock at the Tx queuing layer with a lockless list.

     This results in a 300% (4x) improvement on heavy TX workloads, sending
     twice the number of packets per second, for half the cpu cycles.

   - Allow constantly busy flows to migrate to a more suitable CPU/NIC
     queue.

     Normally we perform queue re-selection when flow comes out of idle,
     but under extreme circumstances the flows may be constantly busy.

     Add sysctl to allow periodic rehashing even if it'd risk packet
     reordering.

   - Optimize the NAPI skb cache, make it larger, use it in more paths.

   - Attempt returning Tx skbs to the originating CPU (like we already
     did for Rx skbs).

   - Various data structure layout and prefetch optimizations from Eric.

   - Remove ktime_get() from the recvmsg() fast path, ktime_get() is
     sadly quite expensive on recent AMD machines.

   - Extend threaded NAPI polling to allow the kthread busy poll for
     packets.

   - Make MPTCP use Rx backlog processing. This lowers the lock
     pressure, improving the Rx performance.

   - Support memcg accounting of MPTCP socket memory.

   - Allow admin to opt sockets out of global protocol memory accounting
     (using a sysctl or BPF-based policy). The global limits are a poor
     fit for modern container workloads, where limits are imposed using
     cgroups.

   - Improve heuristics for when to kick off AF_UNIX garbage collection.

   - Allow users to control TCP SACK compression, and default to 33% of
     RTT.

   - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
     unnecessarily aggressive rcvbuf growth and overshoot when the
     connection RTT is low.

   - Preserve skb metadata space across skb_push / skb_pull operations.

   - Support for IPIP encapsulation in the nftables flowtable offload.

   - Support appending IP interface information to ICMP messages (RFC
     5837).

   - Support setting max record size in TLS (RFC 8449).

   - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.

   - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.

   - Let users configure the number of write buffers in SMC.

   - Add new struct sockaddr_unsized for sockaddr of unknown length,
     from Kees.

   - Some conversions away from the crypto_ahash API, from Eric Biggers.

   - Some preparations for slimming down struct page.

   - YAML Netlink protocol spec for WireGuard.

   - Add a tool on top of YAML Netlink specs/lib for reporting commonly
     computed derived statistics and summarized system state.

  Driver API:

   - Add CAN XL support to the CAN Netlink interface.

   - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
     defined by the OPEN Alliance's "Advanced diagnostic features for
     100BASE-T1 automotive Ethernet PHYs" specification.

   - Add DPLL phase-adjust-gran pin attribute (and implement it in
     zl3073x).

   - Refactor xfrm_input lock to reduce contention when NIC offloads
     IPsec and performs RSS.

   - Add info to devlink params whether the current setting is the
     default or a user override. Allow resetting back to default.

   - Add standard device stats for PSP crypto offload.

   - Leverage DSA frame broadcast to implement simple HSR frame
     duplication for a lot of switches without dedicated HSR offload.

   - Add uAPI defines for 1.6Tbps link modes.

  Device drivers:

   - Add Motorcomm YT921x gigabit Ethernet switch support.

   - Add MUCSE driver for N500/N210 1GbE NIC series.

   - Convert drivers to support dedicated ops for timestamping control,
     and away from the direct IOCTL handling. While at it support GET
     operations for PHY timestamping.

   - Add (and convert most drivers to) a dedicated ethtool callback for
     reading the Rx ring count.

   - Significant refactoring efforts in the STMMAC driver, which
     supports Synopsys turn-key MAC IP integrated into a ton of SoCs.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support PPS in/out on all pins
      - Intel (100G, ice, idpf):
         - ice: implement standard ethtool and timestamping stats
         - i40e: support setting the max number of MAC addresses per VF
         - iavf: support RSS of GTP tunnels for 5G and LTE deployments
      - nVidia/Mellanox (mlx5):
         - reduce downtime on interface reconfiguration
         - disable being an XDP redirect target by default (same as
           other drivers) to avoid wasting resources if feature is
           unused
      - Meta (fbnic):
         - add support for Linux-managed PCS on 25G, 50G, and 100G links
      - Wangxun:
         - support Rx descriptor merge, and Tx head writeback
         - support Rx coalescing offload
         - support 25G SFP and 40G QSFP modules

   - Ethernet virtual:
      - Google (gve):
         - allow ethtool to configure rx_buf_len
         - implement XDP HW RX Timestamping support for DQ descriptor
           format
      - Microsoft vNIC (mana):
         - support HW link state events
         - handle hardware recovery events when probing the device

   - Ethernet NICs consumer, and embedded:
      - usbnet: add support for Byte Queue Limits (BQL)
      - AMD (amd-xgbe):
         - add device selftests
      - NXP (enetc):
         - add i.MX94 support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - bcmasp: add support for PHY-based Wake-on-LAN
      - Broadcom switches (b53):
         - support port isolation
         - support BCM5389/97/98 and BCM63XX ARL formats
      - Lantiq/MaxLinear switches:
         - support bridge FDB entries on the CPU port
         - use regmap for register access
         - allow user to enable/disable learning
         - support Energy Efficient Ethernet
         - support configuring RMII clock delays
         - add tagging driver for MaxLinear GSW1xx switches
      - Synopsys (stmmac):
         - support using the HW clock in free running mode
         - add Eswin EIC7700 support
         - add Rockchip RK3506 support
         - add Altera Agilex5 support
      - Cadence (macb):
         - cleanup and consolidate descriptor and DMA address handling
         - add EyeQ5 support
      - TI:
         - icssg-prueth: support AF_XDP
      - Airoha access points:
         - add missing Ethernet stats and link state callback
         - add AN7583 support
         - support out-of-order Tx completion processing
      - Power over Ethernet:
         - pd692x0: preserve PSE configuration across reboots
         - add support for TPS23881B devices

   - Ethernet PHYs:
      - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
      - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
      - micrel:
         - support for non PTP SKUs of lan8814
         - enable in-band auto-negotiation on lan8814
      - realtek:
         - cable testing support on RTL8224
         - interrupt support on RTL8221B
      - motorcomm: support for PHY LEDs on YT853
      - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
      - mscc: support for PHY LED control

   - CAN drivers:
      - m_can: add support for optional reset and system wake up
      - remove can_change_mtu() obsoleted by core handling
      - mcp251xfd: support GPIO controller functionality

   - Bluetooth:
      - add initial support for PASTa

   - WiFi:
      - split ieee80211.h file, it's way too big
      - improvements in VHT radiotap reporting, S1G, Channel Switch
        Announcement handling, rate tracking in mesh networks
      - improve multi-radio monitor mode support, and add a cfg80211
        debugfs interface for it
      - HT action frame handling on 6 GHz
      - initial chanctx work towards NAN
      - MU-MIMO sniffer improvements

   - WiFi drivers:
      - RealTek (rtw89):
         - support USB devices RTL8852AU and RTL8852CU
         - initial work for RTL8922DE
         - improved injection support
      - Intel:
         - iwlwifi: new sniffer API support
      - MediaTek (mt76):
         - WED support for >32-bit DMA
         - airoha NPU support
         - regdomain improvements
         - continued WiFi7/MLO work
      - Qualcomm/Atheros:
         - ath10k: factory test support
         - ath11k: TX power insertion support
         - ath12k: BSS color change support
         - ath12k: statistics improvements
      - brcmfmac: Acer A1 840 tablet quirk
      - rtl8xxxu: 40 MHz connection fixes/support"

* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
  net: page_pool: sanitise allocation order
  net: page pool: xa init with destroy on pp init
  net/mlx5e: Support XDP target xmit with dummy program
  net/mlx5e: Update XDP features in switch channels
  selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
  net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
  wireguard: netlink: generate netlink code
  wireguard: uapi: generate header with ynl-gen
  wireguard: uapi: move flag enums
  wireguard: uapi: move enum wg_cmd
  wireguard: netlink: add YNL specification
  selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
  selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
  selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
  selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
  selftests: drv-net: introduce Iperf3Runner for measurement use cases
  selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
  net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
  Documentation: net: dsa: mention simple HSR offload helpers
  Documentation: net: dsa: mention availability of RedBox
  ...
2025-12-03 17:24:33 -08:00
Linus Torvalds
4d38b88fd1 Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Allow creating nbcon console drivers with an unsafe write_atomic()
   callback that can only be called by the final nbcon_atomic_flush_unsafe().
   Otherwise, the driver would rely on the kthread.

   It is going to be used as the-best-effort approach for an
   experimental nbcon netconsole driver, see

     https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org

   Note that a safe .write_atomic() callback is supposed to work in NMI
   context. But some networking drivers are not safe even in IRQ
   context:

     https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35

   In an ideal world, all networking drivers would be fixed first and
   the atomic flush would be blocked only in NMI context. But it brings
   the question how reliable networking drivers are when the system is
   in a bad state. They might block flushing more reliable serial
   consoles which are more suitable for serious debugging anyway.

 - Allow to use the last 4 bytes of the printk ring buffer.

 - Prevent queuing IRQ work and block printk kthreads when consoles are
   suspended. Otherwise, they create non-necessary churn or even block
   the suspend.

 - Release console_lock() between each record in the kthread used for
   legacy consoles on RT. It might significantly speed up the boot.

 - Release nbcon context between each record in the atomic flush. It
   prevents stalls of the related printk kthread after it has lost the
   ownership in the middle of a record.

 - Add support for NBCON consoles into KDB

 - Add %ptSp modifier for printing struct timespec64 and use it where
   possible

 - Misc code clean up

* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
  printk: Use console_is_usable on console_unblank
  arch: um: kmsg_dump: Use console_is_usable
  drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
  lib/vsprintf: Unify FORMAT_STATE_NUM handlers
  printk: Avoid irq_work for printk_deferred() on suspend
  printk: Avoid scheduling irq_work on suspend
  printk: Allow printk_trigger_flush() to flush all types
  tracing: Switch to use %ptSp
  scsi: snic: Switch to use %ptSp
  scsi: fnic: Switch to use %ptSp
  s390/dasd: Switch to use %ptSp
  ptp: ocp: Switch to use %ptSp
  pps: Switch to use %ptSp
  PCI: epf-test: Switch to use %ptSp
  net: dsa: sja1105: Switch to use %ptSp
  mmc: mmc_test: Switch to use %ptSp
  media: av7110: Switch to use %ptSp
  ipmi: Switch to use %ptSp
  igb: Switch to use %ptSp
  e1000e: Switch to use %ptSp
  ...
2025-12-03 12:42:36 -08:00
Linus Torvalds
f617d24606 Merge tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers:
 "This is a core arm64 change. However, I was asked to take this because
  most uses of kernel-mode FPSIMD are in crypto or CRC code.

  In v6.8, the size of task_struct on arm64 increased by 528 bytes due
  to the new 'kernel_fpsimd_state' field. This field was added to allow
  kernel-mode FPSIMD code to be preempted.

  Unfortunately, 528 bytes is kind of a lot for task_struct. This
  regression in the task_struct size was noticed and reported.

  Recover that space by making this state be allocated on the stack at
  the beginning of each kernel-mode FPSIMD section.

  To make it easier for all the users of kernel-mode FPSIMD to do that
  correctly, introduce and use a 'scoped_ksimd' abstraction"
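
The idea of an on-stack buffer tied to a scoped section can be sketched
in userspace C. This is not the kernel's scoped_ksimd implementation:
every name here is hypothetical, and the sketch relies on the GCC/Clang
cleanup attribute to end the section when the scope exits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 528 bytes is the size the pull message quotes for the saved state. */
struct fpsimd_state_sketch {
	unsigned char regs[528];
};

static int sections_open;	/* hypothetical bookkeeping for the sketch */

static void ksimd_begin_sketch(struct fpsimd_state_sketch *st)
{
	memset(st, 0, sizeof(*st));
	sections_open++;
}

static void ksimd_end_sketch(struct fpsimd_state_sketch *st)
{
	(void)st;
	assert(sections_open > 0);
	sections_open--;
}

/*
 * Run the following statement once with an on-stack buffer. The cleanup
 * attribute (a GCC/Clang extension) calls ksimd_end_sketch() on scope exit.
 */
#define scoped_ksimd_sketch()						\
	for (struct fpsimd_state_sketch st_				\
		     __attribute__((cleanup(ksimd_end_sketch))),	\
	     *once_ = (ksimd_begin_sketch(&st_), &st_);			\
	     once_; once_ = NULL)

static int sum_in_section(const int *v, int n)
{
	int sum = 0;

	scoped_ksimd_sketch() {
		assert(sections_open == 1);	/* buffer lives on this stack frame */
		for (int i = 0; i < n; i++)
			sum += v[i];
	}
	return sum;
}
```

The buffer exists only for the duration of the braces, so nothing of
that size has to be carried permanently in a per-task structure.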

* tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits)
  lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
  lib/crypto: arm/blake2b: Move to scoped ksimd API
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
  arm64/fpu: Enforce task-context only for generic kernel mode FPU
  net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
  arm64/xorblocks:  Switch to 'ksimd' scoped guard API
  crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
  crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
  raid6: Move to more abstract 'ksimd' guard API
  crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  ...
2025-12-02 18:53:50 -08:00
Jakub Kicinski
4de4454299 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes in preparation for the net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-02 15:37:53 -08:00
Tariq Toukan
d4aa0cc9bd net/mlx5e: Support XDP target xmit with dummy program
Save per-channel resources by default, in both device and host memory.

As no better API exists, make the XDP-redirect-target SQ available by
loading a dummy XDP program.

This improves the latency of interface up/down operations when the
feature is disabled.

Perf numbers:
NIC: Connect-X7.
Setup: 248 channels, default mtu and rx/tx ring sizes.

Interface up + down:
Before: 2.246 secs
After:  1.798 secs (-0.448 sec)

Saves ~1.8 msec per channel.
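
The per-channel figure follows from the two timings above divided by the
channel count. A small sketch of that arithmetic (the helper name is
hypothetical):

```c
#include <assert.h>

/* Average saving per channel in milliseconds, from two timings in seconds. */
static double saved_msec_per_channel(double before_s, double after_s,
				     int channels)
{
	assert(channels > 0);
	return (before_s - after_s) * 1000.0 / channels;
}
```

For 2.246 s before, 1.798 s after and 248 channels this gives roughly
1.8 msec, matching the number quoted in the commit.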

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Link: https://patch.msgid.link/1764497617-1326331-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-02 15:20:41 +01:00
Tariq Toukan
96a8395061 net/mlx5e: Update XDP features in switch channels
The XDP features state might depend on the state of other features, like
HW-LRO / HW-GRO.

In general, move the re-evaluation announcement of the XDP features
(xdp_set_features_flag_locked) into the flow where configuration gets
changed. There's no point in updating them elsewhere.

This is a more appropriate place, as this modifies the announced
features while channels are inactive, which avoids the small interval
between channel activation and the proper setting of the XDP features.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Link: https://patch.msgid.link/1764497617-1326331-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-02 15:20:41 +01:00
Florian Fuchs
d8e08149a5 net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Use the napi functions napi_alloc_skb() and napi_gro_receive() instead
of netdev_alloc_skb() and netif_receive_skb() for more efficient packet
receiving. The switch to napi aware functions increases the RX
throughput, reduces the occurrence of retransmissions and improves the
resilience against SKB allocation failures.

Signed-off-by: Florian Fuchs <fuchsfl@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251130194155.1950980-1-fuchsfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:54:53 -08:00
Gal Pressman
87a5112bfc net/mlx5e: Use standard unit definitions for bandwidth conversion
The MLX5E_100MB and MLX5E_1GB defines are confusing: MLX5E_100MB is not
equal to 100 * MEGA, and MLX5E_1GB is not equal to one GIGA, because they
hide the Kbps rate conversion required for ieee_maxrate.

Replace hardcoded bandwidth conversion values with standard unit
definitions from linux/units.h. Rename MLX5E_100MB/MLX5E_1GB to
MLX5E_100MB_TO_KB/MLX5E_1GB_TO_KB to clarify these are conversion
factors to Kbps, not absolute bandwidth values.
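
To illustrate the distinction, here is a small sketch of the arithmetic.
KILO, MEGA and GIGA mirror the decimal constants in linux/units.h; the
exact expressions behind the renamed macros are an assumption made for
illustration and are not copied from the driver.

```python
# Decimal unit constants, mirroring KILO/MEGA/GIGA from linux/units.h.
KILO = 1_000
MEGA = 1_000_000
GIGA = 1_000_000_000

# Assumed shape of the conversion factors: how many Kbps make up one
# 100 Mbps unit and one 1 Gbps unit. The real macro bodies may differ.
MLX5E_100MB_TO_KB = 100 * MEGA // KILO
MLX5E_1GB_TO_KB = GIGA // KILO

print(MLX5E_100MB_TO_KB)  # Kbps per 100 Mbps unit
print(MLX5E_1GB_TO_KB)    # Kbps per 1 Gbps unit
```

Under this assumption the factors are smaller than 100 * MEGA and GIGA
by a factor of KILO, which is why the old names were misleading.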

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1764498334-1327918-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:56:28 -08:00
Gal Pressman
53f7a77128 net/mlx5e: Use U8_MAX instead of hard coded magic number
Replace the hard-coded magic number 255 with U8_MAX (the register field
is 8 bits wide).

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1764498334-1327918-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:56:27 -08:00
Gal Pressman
e1098bb02f net/mlx5e: Rename upper_limit_mbps to upper_limit_100mbps
Clarify that the limit represents the threshold for using 100 Mbps
units rather than a general Mbps limit.
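
A rough sketch of what such a threshold means in practice. This is a
simplified model, not the driver's code: the function name, the 8-bit
value field, the Kbps input and the floor rounding are all assumptions
chosen for illustration.

```python
U8_MAX = 255                    # assumed width of the rate value field
KBPS_PER_100MBPS = 100_000      # Kbps in one 100 Mbps unit
KBPS_PER_GBPS = 1_000_000       # Kbps in one 1 Gbps unit


def pick_rate_unit(rate_kbps):
    """Hypothetical helper: choose a unit and value for a non-zero rate in Kbps."""
    # Largest rate that still fits the value field in 100 Mbps units.
    upper_limit_100mbps = U8_MAX * KBPS_PER_100MBPS
    if rate_kbps <= upper_limit_100mbps:
        return ("100mbps", rate_kbps // KBPS_PER_100MBPS)
    # Above that threshold, fall back to the coarser 1 Gbps units.
    return ("gbps", rate_kbps // KBPS_PER_GBPS)


print(pick_rate_unit(2_500_000))   # 2.5 Gbps expressed in Kbps
print(pick_rate_unit(40_000_000))  # 40 Gbps expressed in Kbps
```

In this model the limit is not a rate in Mbps as such; it marks the
point past which 100 Mbps units can no longer encode the rate.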

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1764498334-1327918-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:56:27 -08:00
Gal Pressman
e1de33c377 net/mlx5e: Use u64 instead of __u64 in ieee_setmaxrate
Change upper_limit_mbps/gbps from __u64 to u64 to follow kernel coding
conventions.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1764498334-1327918-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:56:27 -08:00
Jakub Kicinski
596c696301 Revert "r8169: add DASH support for RTL8127AP"
This reverts commit 17e9f841dd.

Nathan reports error messages appearing in dmesg since the commit
under Fixes:

  [    3.844125] r8169 0000:01:00.0 (unnamed net_device) (uninitialized): rtl_eriar_cond == 0 (loop: 100, delay: 100).
  [    3.864844] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
  [    3.878825] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
  [    3.892632] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
  [    5.002551] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
  [    5.016286] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
  [    5.030027] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).

Let's drop the bad change and revisit in the next release cycle.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/20251201224238.GA604467@ax162
Fixes: 17e9f841dd ("r8169: add DASH support for RTL8127AP")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:53:55 -08:00
Raju Rangoju
ab96af7004 amd-xgbe: schedule NAPI on Rx Buffer Unavailable (RBU)
Under heavy load, Rx Buffer Unavailable (RBU) can occur if Rx processing
is slower than the network. When an RBU is signaled, try to schedule NAPI
to help recover from the situation (including cases where an IRQ may
have been missed).

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20251129175016.3034185-3-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:30:06 -08:00
Raju Rangoju
c3b744fd20 amd-xgbe: refactor the dma IRQ handling code path
Refactor the DMA interrupt bottom-half handling to improve readability
and maintainability without changing the intended behavior.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20251129175016.3034185-2-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 14:30:06 -08:00
Long Li
9bf66036d6 net: mana: Handle hardware recovery events when probing the device
When MANA is being probed, it's possible that the hardware is in recovery
mode and the device may get GDMA_EQE_HWC_RESET_REQUEST over HWC in the
middle of the probe. Detect this condition and go through the recovery
service procedure.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1764193552-9712-1-git-send-email-longli@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 13:53:53 -08:00
Yeounsu Moon
40d5ce4af2 net: dlink: fix several spelling mistakes in comments
This patch fixes multiple spelling mistakes in dl2k driver comments:

- "deivices" -> "devices"
- "Ttransmit" -> "Transmit"
- "catastronphic" -> "catastrophic"
- "Extened" -> "Extended"

Also fix incorrect unit description: `rx_timeout` uses 640ns increments,
not 64ns.
- "64ns" -> "640ns"

These are comment-only changes and do not affect runtime behavior.

Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251130220652.5425-2-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 11:58:55 -08:00
Breno Leitao
ca8df5b877 net: enetc: convert to use .get_rx_ring_count
Convert the enetc driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc for handling
ETHTOOL_GRXRINGS command. This simplifies the code in two ways:

1. For enetc_get_rxnfc(): Remove the ETHTOOL_GRXRINGS case from the
   switch statement while keeping other cases for classifier rules.

2. For enetc4_get_rxnfc(): Remove it completely and use
   enetc_get_rxnfc() instead.

From now on, enetc_get_rx_ring_count() is the callback that returns the
number of RX rings for the enetc driver.

Also, remove the documentation around enetc4_get_rxnfc(), which was not
matching what the function did(?!).

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251128-gxring_freescale-v1-3-22a978abf29e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 11:54:45 -08:00
Breno Leitao
b2d6339269 net: dpaa2: convert to use .get_rx_ring_count
Convert the dpaa2 driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
ETHTOOL_GRXRINGS case from the switch statement and replacing it with
a direct return of the queue count.

The driver still maintains .get_rxnfc for other commands including
ETHTOOL_GRXCLSRLCNT, ETHTOOL_GRXCLSRULE, and ETHTOOL_GRXCLSRLALL.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251128-gxring_freescale-v1-2-22a978abf29e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 11:54:45 -08:00
Breno Leitao
d3fbfb8b2c net: gianfar: convert to use .get_rx_ring_count
Convert the gianfar driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
ETHTOOL_GRXRINGS case from the switch statement and replacing it with
a direct return of the queue count.

The driver still maintains .get_rxnfc for other commands including
ETHTOOL_GRXCLSRLCNT, ETHTOOL_GRXCLSRULE, and ETHTOOL_GRXCLSRLALL.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251128-gxring_freescale-v1-1-22a978abf29e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 11:54:45 -08:00
Breno Leitao
335d78c616 net: bcmgenet: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns bcmgenet with the
new ethtool API for querying RX ring parameters.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251127-grxrings_broadcom-v1-2-b0b182864950@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:56:22 -08:00
Breno Leitao
bba18f3ba7 net: bnxt: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns bnxt with the new
ethtool API for querying RX ring parameters.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251127-grxrings_broadcom-v1-1-b0b182864950@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:56:22 -08:00
Rob Miller
72405298e9 bnxt_en: Add Virtual Admin Link State Support for VFs
The firmware can now cache the virtual link admin state (auto/on/off) of
all VFs and as such, the PF driver no longer has to intercept the VF
driver's port_phy_qcfg() call and then provide the link admin state.

If the FW does not have this capability, fall back to the existing
interception method.

The default link admin state (auto) is also set when the VFs are
created.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 18:59:29 -08:00