Factor out handling a single nv from fbnic_disable() to make
it reusable for queue ops. Use a __ prefix for the factored
out code. The real fbnic_nv_disable() which will include
fbnic_wrfl() will be added with the qops, to avoid unused
function warnings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-7-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use netmem_ref instead of struct page pointer in prep for
unreadable memory. fbnic has separate free buffer submission
queues for headers and for data. Refactor the helper which
returns page pointer for a submission buffer to take the
high level queue container, create a separate handler
for header and payload rings. This ties the "upcast" from
netmem to system page to use of sub0 which we know has
system pages.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
page pools are now at the ring level, move page pool alloc
to fbnic_alloc_rx_qt_resources(), and freeing to
fbnic_free_qt_resources().
This significantly simplifies fbnic_alloc_napi_vector() error
handling, by removing a late failure point.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move rxq_info and mem model registration from fbnic_alloc_napi_vector()
and fbnic_alloc_nv_resources() to fbnic_alloc_rx_qt_resources().
The rxq_info is now registered later in the process, but that
should not cause any issues.
rxq_info lives in the fbnic_q_triad (qt) struct so qt init is a more
natural place. Encapsulating the logic in the qt functions will also
allow simplifying the cleanup in the NAPI related alloc functions
in the next commit.
Rx does not have a dedicated fbnic_free_rx_qt_resources(),
but we can use xdp_rxq_info_is_reg() to tell whether given
rxq_info was in use (effectively - if it's a qt for an Rx queue).
Having to pass nv into fbnic_alloc_rx_qt_resources() is not
great in terms of layering, but that's temporary, pp will
move soon..
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In preparation for memory providers we need a closer association
between queues and page pools. We used to have a page pool at the
NAPI level to serve all associated queues but with MP the queues
under a NAPI may no longer be created equal.
The "ring" structure in fbnic is a descriptor ring. We have separate
"rings" for payload and header pages ("to device"), as well as a ring
for completions ("from device"). Technically we only need the page
pool pointers in the "to device" rings, so adding the pointer to
the ring struct is a bit wasteful. But it makes passing the structures
around much easier.
For now both "to device" rings store a pointer to the same
page pool. Using more than one queue per NAPI is extremely rare
so don't bother trying to share a single page pool between queues.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add hardware offloading for L2 switching on R-Car S4.
On S4 brdev is limited to one per-device (not per port). Reasoning
is that hw L2 forwarding support lacks any sort of source port based
filtering, which makes it unusable to offload more than one bridge
device. Either you allow hardware to forward destination MAC to a
port, or you have to send it to CPU. You can't make it forward only
if src and dst ports are in the same brdev.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Michael Dege <michael.dege@renesas.com>
Link: https://patch.msgid.link/20250901-add_l2_switching-v5-3-5f13e46860d5@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
It has the same PTP IP block as lan8814, only the number of GPIOs is
different, all the other functionality is the same. So reuse the same
functions as lan8814 for lan8842.
There is a revision of lan8842 called lan8832 which doesn't have the PTP
IP block. So make sure in that case the PTP is not initialized.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250902121832.3258544-3-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case the time arguments used for flexible PPS signal generation are in
the past, consider the arguments to be a time offset relative to the MAC
system time.
This way, past time use case is handled and it avoids the tedious work
of passing an absolute time value for the flexible PPS signal generation
while not breaking existing scripts that may rely on this behavior.
Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://patch.msgid.link/20250901-relative_flex_pps-v4-2-b874971dfe85@foss.st.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can safely ignore SerDes interface modes 1000Base-X, 2500Base-X and
SGMII in phylink_mac_config() as they are being taken care of by the PCS
and the SGMII port anyway doesn't have MII_CFG and MII_PCDU registers
and hence gswip_phylink_mac_config() is already a no-op apart from
outputing a misleading error message.
Return early in case of SerDes interface modes to avoid printing that
error message.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/dcb066d6a02e6340314b5ff4f73937757a4f8eb3.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In macb_taprio_setup_replace(), the value of start_time is being
compared against zero which would never be true since start_time
is an unsigned value. Due to this there is a chance that an
incorrect config base time value can be used for computation.
Fix by checking the value of conf->base_time directly.
This issue was reported by static coverity analyzer.
Fixes: 89934dbf16 ("net: macb: Add TAPRIO traffic scheduling support")
Signed-off-by: Chandra Mohan Sundar <chandramohan.explore@gmail.com>
Reviewed-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Link: https://patch.msgid.link/20250901162923.627765-1-chandramohan.explore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the implementation of use_carrier, the link monitoring
method that utilizes ethtool or ioctl to determine the link state of an
interface in a bond. Bonding will always behaves as if use_carrier=1,
which relies on netif_carrier_ok() to determine the link state of
interfaces.
To avoid acquiring RTNL many times per second, bonding inspects
link state under RCU, but not under RTNL. However, ethtool
implementations in drivers may sleep, and therefore this strategy is
unsuitable for use with calls into driver ethtool functions.
The use_carrier option was introduced in 2003, to provide
backwards compatibility for network device drivers that did not support
the then-new netif_carrier_ok/on/off system. Device drivers are now
expected to support netif_carrier_*, and the use_carrier backwards
compatibility logic is no longer necessary.
The option itself remains, but when queried always returns 1,
and may only be set to 1.
Link: https://lore.kernel.org/000000000000eb54bf061cfd666a@google.com
Link: https://lore.kernel.org/20240718122017.d2e33aaac43a.I10ab9c9ded97163aef4e4de10985cd8f7de60d28@changeid
Signed-off-by: Jay Vosburgh <jv@jvosburgh.net>
Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/2029487.1756512517@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adjacent vfs get their devlink port information from firmware,
use the information (pfnum, function id) from FW when populating the
devlink port attributes.
Before:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev eth0 flavour pcivf controller 0 pfnum 0 vfnum 49152 external false splittable false
function:
hw_addr 00:00:00:00:00:00
After:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev enp0s3npf0vf2 flavour pcivf controller 0 pfnum 0 vfnum 2 external false splittable false
function:
hw_addr 00:00:00:00:00:00
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-7-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Adding driver support to query adjacent functions vports, AKA
delegated vports.
Adjacent functions can delegate their sriov vfs to other sibling PF in
the system, to be managed by the eswitch capable sibling PF.
E.g, ECPF to Host PF, multi host PF between each other, etc.
Only supported in switchdev mode.
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-4-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move the loop that creates the vports ACLs root name spaces to eswitch,
since it is the eswitch responsibility to decide when and how many
vports ACLs root namespaces to create, in the next patches we will use
the fs_core vport ACL root namespace APIs to create/remove root ns
ACLs dynamically for dynamically created vports.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-3-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Before this patch it was a linear array and could only support a certain
number of vports, in the next patches, vport numbers are not bound to a
well known limit, thus convert acl root name space storage to xarray.
In addition create fs_core public API to add/remove vport acl namespaces
as it is the eswitch responsibility to create the vports and their
root name spaces for acls, in the next patch we will move
mlx5_fs_ingress_acls_{init,cleanup} to eswitch and will use
the individual mlx5_fs_vport_{egress,ingresS}_acl_ns_{add,remove}
APIs for dynamically create vports.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-2-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Regarding PTP, ENETC v4 has some changes compared to ENETC v1 (LS1028A),
mainly as follows.
1. ENETC v4 uses a different PTP driver, so the way to get phc_index is
different from LS1028A. Therefore, enetc_get_ts_info() has been modified
appropriately to be compatible with ENETC v1 and v4.
2. The PMa_SINGLE_STEP register has changed in ENETC v4, not only the
register offset, but also some register fields. Therefore, two helper
functions are added, enetc_set_one_step_ts() for ENETC v1 and
enetc4_set_one_step_ts() for ENETC v4.
3. Since the generic helper functions from ptp_clock are used to get
the PHC index of the PTP clock, so FSL_ENETC_CORE depends on Kconfig
symbol "PTP_1588_CLOCK_OPTIONAL". But FSL_ENETC_CORE can only be
selected, so add the dependency to FSL_ENETC, FSL_ENETC_VF and
NXP_ENETC4. Perhaps the best approach would be to change FSL_ENETC_CORE
to a visible menu entry. Then make FSL_ENETC, FSL_ENETC_VF, and
NXP_ENETC4 depend on it, but this is not the goal of this patch, so this
may be changed in the future.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-14-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move sync packet content modification before dma_map_single() to follow
correct DMA usage process, even though the previous sequence worked due
to hardware DMA-coherence support (LS1028A). But for the upcoming i.MX95,
its ENETC (v4) does not support "dma-coherent", so this step is very
necessary. Otherwise, the originTimestamp and correction fields of the
sent packets will still be the values before the modification.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-13-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move PTP Sync packet processing from enetc_map_tx_buffs() to a new helper
function enetc_update_ptp_sync_msg() to simplify the original function.
Prepare for upcoming ENETC v4 one-step support. There is no functional
change. It is worth mentioning that ENETC_TXBD_TSTAMP is added to replace
0x3fffffff.
Prepare for upcoming ENETC v4 one-step support.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-11-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, the Tx PTP packets are parsed twice in the enetc driver, once
in enetc_xmit() and once in enetc_map_tx_buffs(). The latter is duplicate
and is unnecessary, since the parsed information can be saved to skb->cb
so that enetc_map_tx_buffs() can get the previously parsed data from
skb->cb. Therefore, add struct enetc_skb_cb as the format of the data
in the skb->cb buffer to save the parsed information of PTP packet. Use
saved information in enetc_map_tx_buffs() to avoid parsing data again.
In addition, rename variables offset1 and offset2 in enetc_map_tx_buffs()
to corr_off and tstamp_off for better readability.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-10-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed)
sockets in tcp_orphan_count.
In some code that was shared with DCCP, tcp_orphan_count is referenced
via sk->sk_prot->orphan_count.
Let's reference tcp_orphan_count directly.
inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c
due to header dependency.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce support for standard MII ioctl operations in the LAN865x
Ethernet driver by implementing the .ndo_eth_ioctl callback. This allows
PHY-related ioctl commands to be handled via phy_do_ioctl_running() and
enables support for ethtool and other user-space tools that rely on ioctl
interface to perform PHY register access using commands like SIOCGMIIREG
and SIOCSMIIREG.
This feature enables improved diagnostics and PHY configuration
capabilities from userspace.
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250828114549.46116-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that PPPoE sockets are freed via RCU (SOCK_RCU_FREE), it is no longer
necessary to take a reference count when looking up sockets on the receive
path. Readers are protected by RCU, so the socket memory remains valid
until after a grace period.
Convert fast-path lookups to avoid refcounting:
- Replace get_item() and sk_receive_skb() in pppoe_rcv() with
__get_item() and __sk_receive_skb().
- Rework get_item_by_addr() into __get_item_by_addr() (no refcount and
move RCU lock into pppoe_ioctl)
- Remove unnecessary sock_put() calls.
This avoids cacheline bouncing from atomic reference counting and improves
performance on the receive fast path.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-2-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Like ppp_generic.c, convert the PPPoE socket hash table to use RCU for
lookups and a spinlock for updates. This removes rwlock usage and allows
lockless readers on the fast path.
- Mark hash table and list pointers as __rcu.
- Use spin_lock() to protect writers.
- Readers use rcu_dereference() under rcu_read_lock(). All known callers
of get_item() already hold the RCU read lock, so no additional locking
is needed.
- get_item() now uses refcount_inc_not_zero() instead of sock_hold() to
safely take a reference. This prevents crashes if a socket is already
in the process of being freed (sk_refcnt == 0).
- Set SOCK_RCU_FREE to defer socket freeing until after an RCU grace
period.
- Move skb_queue_purge() into sk_destruct callback to ensure purge
happens after an RCU grace period.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.17-rc4).
No conflicts.
Adjacent changes:
drivers/net/ethernet/intel/idpf/idpf_txrx.c
02614eee26 ("idpf: do not linearize big TSO packets")
6c4e684802 ("idpf: remove obsolete stashing code")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth.
Current release - regressions:
- ipv4: fix regression in local-broadcast routes
- vsock: fix error-handling regression introduced in v6.17-rc1
Previous releases - regressions:
- bluetooth:
- mark connection as closed during suspend disconnect
- fix set_local_name race condition
- eth:
- ice: fix NULL pointer dereference on reset
- mlx5: fix memory leak in hws_pool_buddy_init error path
- bnxt_en: fix stats context reservation logic
- hv: fix loss of receive events from host during channel open
Previous releases - always broken:
- page_pool: fix incorrect mp_ops error handling
- sctp: initialize more fields in sctp_v6_from_sk()
- eth:
- octeontx2-vf: fix max packet length errors
- idpf: fix Tx flow scheduling to avoid Tx timeouts
- bnxt_en: fix memory corruption during ifdown
- ice: fix incorrect counter for buffer allocation failures
- mlx5: fix lockdep assertion on sync reset unload event
- fbnic: fixup rtnl_lock and devl_lock handling
- xgmac: do not enable RX FIFO overflow interrupts
- phy: mscc: fix when PTP clock is register and unregister
Misc:
- add Telit Cinterion LE910C4-WWX new compositions"
* tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
net: ipv4: fix regression in local-broadcast routes
net: macb: Disable clocks once
fbnic: Move phylink resume out of service_task and into open/close
fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
net: rose: fix a typo in rose_clear_routes()
l2tp: do not use sock_hold() in pppol2tp_session_get_sock()
sctp: initialize more fields in sctp_v6_from_sk()
MAINTAINERS: rmnet: Update email addresses
net: rose: include node references in rose_neigh refcount
net: rose: convert 'use' field to refcount_t
net: rose: split remove and free operations in rose_remove_neigh()
net: hv_netvsc: fix loss of early receive events from host during channel open.
net: stmmac: Set CIC bit only for TX queues with COE
net: stmmac: xgmac: Correct supported speed modes
net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts
net/mlx5e: Set local Xoff after FW update
net/mlx5e: Update and set Xon/Xoff upon port speed set
net/mlx5e: Update and set Xon/Xoff upon MTU set
net/mlx5: Prevent flow steering mode changes in switchdev mode
net/mlx5: Nack sync reset when SFs are present
...
Tony Nguyen says:
====================
ice: split ice_virtchnl.c git-blame friendly way
Przemek Kitszel says:
Split ice_virtchnl.c into two more files (+headers), in a way
that git-blame works better.
Then move virtchnl files into a new subdir.
No logic changes.
I have developed (or discovered ;)) how to split a file in a way that
both old and new are nice in terms of git-blame
There was not much discussion on [RFC], so I would like to propose
to go forward with this approach.
There are more commits needed to have it nice, so it forms a git-log vs
git-blame tradeoff, but (after the brief moment that this is on the top)
we spend orders of magnitude more time looking at the blame output (and
commit messages linked from that) - so I find it much better to see
actual logic changes instead of "move xx to yy" stuff (typical for
"squashed/single-commit splits").
Cherry-picks/rebases work the same with this method as with simple
"squashed/single-commit" approach (literally all commits squashed into
one (to have better git-log, but shitty git-blame output).
Rationale for the split itself is, as usual, "file is big and we want to
extend it".
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: finish virtchnl.c split into rss.c
ice: extract virt/rss.c: cleanup - p2
ice: extract virt/rss.c: cleanup - p1
ice: split RSS stuff out of virtchnl.c - copy back
ice: split RSS stuff out of virtchnl.c - tmp rename
ice: finish virtchnl.c split into queues.c
ice: extract virt/queues.c: cleanup - p3
ice: extract virt/queues.c: cleanup - p2
ice: extract virt/queues.c: cleanup - p1
ice: split queue stuff out of virtchnl.c - copy back
ice: split queue stuff out of virtchnl.c - tmp rename
ice: add virt/ and move ice_virtchnl* files there
====================
Link: https://patch.msgid.link/20250827224641.415806-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx5 has a Kconfig co-dependency on VXLAN, even tho it doesn't
call any VXLAN function (unlike mlxsw). Perhaps this dates back
to very old days when tunnel ports were fetched directly from
VXLAN.
Remove the dependency to allow MLX5=y + VXLAN=m kernel configs.
But still avoid compiling in the lib/vxlan code if VXLAN=n.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20250827234319.3504852-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The C45 accessors were setting the GR (register number) field twice,
once with the 16-bit register address truncated to five bits, and
then overwritten with the C45 devad. This is harmless since the field
was being cleared prior to being updated with the C45 devad, except
for the extra work.
Remove the redundant code.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1urGBn-00000000DCH-3swS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
stmmac_bus_clks_config() doesn't need to repeatedly on dereference
priv->plat as this remains the same throughout this function. Not only
does this detract from the function's readability, but it could cause
the value to be reloaded each time. Use a local variable.
Also, the final return can simply return zero, and we can dispense
with the initialiser for 'ret'.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1urBvf-000000002ii-37Ce@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>