Pull networking deletions from Jakub Kicinski:
"Delete some obsolete networking code
Old code like amateur radio and NFC have long been a burden to core
networking developers. syzbot loves to find bugs in BKL-era code, and
noobs try to fix them.
If we want to have a fighting chance of surviving the LLM-pocalypse
this code needs to find a dedicated owner or get deleted. We've talked
about these deletions multiple times in the past and every time
someone wanted the code to stay. It is never very clear to me how many
of those people actually use the code vs are just nostalgic to see it
go. Amateur radio did have occasional users (or so I think) but most
users switched to user space implementations since its all super slow
stuff. Nobody stepped up to maintain the kernel code.
We were lucky enough to find someone who wants to help with NFC so
we're giving that a chance. Let's try to put the rest of this code
behind us"
* tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next:
drivers: net: 8390: wd80x3: Remove this driver
drivers: net: 8390: ultra: Remove this driver
drivers: net: 8390: AX88190: Remove this driver
drivers: net: fujitsu: fmvj18x: Remove this driver
drivers: net: smsc: smc91c92: Remove this driver
drivers: net: smsc: smc9194: Remove this driver
drivers: net: amd: nmclan: Remove this driver
drivers: net: amd: lance: Remove this driver
drivers: net: 3com: 3c589: Remove this driver
drivers: net: 3com: 3c574: Remove this driver
drivers: net: 3com: 3c515: Remove this driver
drivers: net: 3com: 3c509: Remove this driver
net: packetengines: remove obsolete yellowfin driver and vendor dir
net: packetengines: remove obsolete hamachi driver
net: remove unused ATM protocols and legacy ATM device drivers
net: remove ax25 and amateur radio (hamradio) subsystem
net: remove ISDN subsystem and Bluetooth CMTP
caif: remove CAIF NETWORK LAYER
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Steady stream of fixes. Last two weeks feel comparable to the two
weeks before the merge window. Lots of AI-aided bug discovery. A newer
big source is Sashiko/Gemini (Roman Gushchin's system), which points
out issues in existing code during patch review (maybe 25% of fixes
here likely originating from Sashiko). Nice thing is these are often
fixed by the respective maintainers, not drive-bys.
Current release - new code bugs:
- kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP
Previous releases - regressions:
- add async ndo_set_rx_mode and switch drivers which we promised to
be called under the per-netdev mutex to it
- dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops
- hv_sock: report EOF instead of -EIO for FIN
- vsock/virtio: fix MSG_PEEK calculation on bytes to copy
Previous releases - always broken:
- ipv6: fix possible UAF in icmpv6_rcv()
- icmp: validate reply type before using icmp_pointers
- af_unix: drop all SCM attributes for SOCKMAP
- netfilter: fix a number of bugs in the osf (OS fingerprinting)
- eth: intel: fix timestamp interrupt configuration for E825C
Misc:
- bunch of data-race annotations"
* tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits)
rxrpc: Fix error handling in rxgk_extract_token()
rxrpc: Fix re-decryption of RESPONSE packets
rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets
rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
rxgk: Fix potential integer overflow in length check
rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
rxrpc: Fix potential UAF after skb_unshare() failure
rxrpc: Fix rxkad crypto unalignment handling
rxrpc: Fix memory leaks in rxkad_verify_response()
net: rds: fix MR cleanup on copy error
m68k: mvme147: Make me the maintainer
net: txgbe: fix firmware version check
selftests/bpf: check epoll readiness during reuseport migration
tcp: call sk_data_ready() after listener migration
vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()
ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
tipc: fix double-free in tipc_buf_append()
llc: Return -EINPROGRESS from llc_ui_connect()
ipv4: icmp: validate reply type before using icmp_pointers
selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges
...
Similar to the hamachi driver, the yellowfin driver supports hardware
that is over two decades old and no longer in active use.
Since yellowfin was the last remaining driver in the packetengines
vendor directory, we can now safely remove the entire directory and
drop its associated references from the parent Kconfig and Makefile.
This eliminates dead code and reduces the overall maintenance burden
on the netdev subsystem.
Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260422044820.485660-3-25181214217@stu.xidian.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the device SP, the firmware version is a 32-bit value where the
lower 20 bits represent the base version number. And the customized
firmware version populates the upper 12 bits with a specific
identification number.
For other devices AML 25G and 40G, the upper 12 bits of the firmware
version is always non-zero, and they have other naming conventions.
Only SP devices need to check this to tell if XPCS will work properly.
So the judgement of MAC type is added here.
And the original logic compared the entire 32-bit value against 0x20010,
which caused the outdated base firmwares bypass the version check
without a warning. Apply a mask 0xfffff to isolate the lower 20 bits for
an accurate base version comparison.
Fixes: ab928c24e6 ("net: txgbe: add FW version warning")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/C787AA5C07598B13+20260422071837.372731-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a mismatch between the passed arguments and the actual
nfp_encode_basic_qdr() function parameter names:
static int nfp_encode_basic_qdr(u64 addr, int dest_island, int cpp_tgt,
int mode, bool addr40, int isld1,
int isld0)
{
...
But "dest_island" and "cpp_tgt" are swapped at every call-site.
For example:
return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
mode, addr40, isld1, isld0);
As a result, nfp_encode_basic_qdr() receives "dest_island" as CPP target
type, which is always NFP_CPP_TARGET_QDR(2) for these calls, and "cpp_tgt"
as the destination island ID, which can accidentally match or be outside
the valid NFP_CPP_TARGET_* types (e.g. '-1' for any destination).
Since code already worked for years, also add extra pr_warn() to error
paths in nfp_encode_basic_qdr() to help identify any potential address
verification failures.
Detected using the static analysis tool - Svace.
Fixes: 4cb584e0ee ("nfp: add CPP access core")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Link: https://patch.msgid.link/20260422160536.61855-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the amateur radio (AX.25, NET/ROM, ROSE) protocol implementation
and all associated hamradio device drivers from the kernel tree.
This set of protocols has long been a huge bug/syzbot magnet,
and since nobody stepped up to help us deal with the influx
of the AI-generated bug reports we need to move it out of tree
to protect our sanity.
The code is moved to an out-of-tree repo:
https://github.com/linux-netdev/mod-orphan
if it's cleaned up and reworked there we can accept it back.
Minimal stub headers are kept for include/net/ax25.h (AX25_P_IP,
AX25_ADDR_LEN, ax25_address) and include/net/rose.h (ROSE_ADDR_LEN)
so that the conditional integration code in arp.c and tun.c continues
to compile and work when the out-of-tree modules are loaded.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260421021824.1293976-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Remove CAIF (Communication CPU to Application CPU Interface), the
ST-Ericsson modem protocol. The subsystem has been orphaned since 2013.
The last meaningful changes from the maintainers were in March 2013:
a8c7687bf2 ("caif_virtio: Check that vringh_config is not null")
b2273be8d2 ("caif_virtio: Use vringh_notify_enable correctly")
0d2e1a2926 ("caif_virtio: Introduce caif over virtio")
Not-so-coincidentally, according to "the Internet" ST-Ericsson officially
shut down its modem joint venture in Aug 2013.
If anyone is using this code please yell!
In the 13 years since, the code has accumulated 200 non-merge commits,
of which 71 were cross-tree API changes, 21 carried Fixes: tags, and
the remaining ~110 were cleanups, doc conversions, treewide refactors,
and one partial removal (caif_hsi, ca75bcf0a8).
We are still getting fixes to this code, in the last 10 days there were
3 reports on security@ about CAIF that I have been CCed on.
UAPI constants (AF_CAIF, ARPHRD_CAIF, N_CAIF, VIRTIO_ID_CAIF) and the
SELinux classmap entry are intentionally kept for ABI stability.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260416182829.1440262-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When netif_is_rxfh_configured() is true (i.e., the user has explicitly
configured the RSS indirection table), virtnet_set_queues() skips the
RSS update path and falls through to the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
command to change the number of queue pairs. However, it does not update
vi->rss_trailer.max_tx_vq to reflect the new queue_pairs value.
This causes a mismatch between vi->curr_queue_pairs and
vi->rss_trailer.max_tx_vq. Any subsequent RSS reconfiguration (e.g.,
via ethtool -X) calls virtnet_commit_rss_command(), which sends the
stale max_tx_vq to the device, silently reverting the queue count.
Reproduction:
1. User configured RSS
ethtool -X eth0 equal 8
2. VQ_PAIRS_SET path; max_tx_vq stays 16
ethtool -L eth0 combined 12
3. RSS commit uses max_tx_vq=16 instead of 12
ethtool -X eth0 equal 4
Fix this by updating vi->rss_trailer.max_tx_vq after a successful
VQ_PAIRS_SET command when RSS is enabled, keeping it in sync with
curr_queue_pairs.
Fixes: 50bfcaedd7 ("virtio_net: Update rss when set queue")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20260416212121.29073-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In mana_remove(), when a NULL port is encountered in the port iteration
loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
queues allocated earlier by mana_create_eq().
This can happen when mana_probe_port() fails for port 0, leaving
ac->ports[0] as NULL. On driver unload or error cleanup, mana_remove()
hits the NULL entry and jumps past mana_destroy_eq().
Change 'goto out' to 'break' so the for-loop exits normally and
mana_destroy_eq() is always reached. Remove the now-unreferenced out:
label.
Fixes: 1e2d0824a9 ("net: mana: Add support for EQ sharing")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260420124741.1056179-6-ernis@linux.microsoft.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In mana_probe(), if mana_probe_port() fails for any port, the error
is stored in 'err' and the loop breaks. However, the subsequent
unconditional 'err = add_adev(gd, "eth")' overwrites this error.
If add_adev() succeeds, mana_probe() returns success despite ports
being left in a partially initialized state (ac->ports[i] == NULL).
Only call add_adev() when there is no prior error, so the probe
correctly fails and triggers mana_remove() cleanup.
Fixes: a69839d432 ("net: mana: Add support for auxiliary device")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260420124741.1056179-5-ernis@linux.microsoft.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If PM resume fails (e.g., mana_attach() returns an error), mana_probe()
calls mana_remove(), which tears down the device and sets
gd->gdma_context = NULL and gd->driver_data = NULL.
However, a failed resume callback does not automatically unbind the
driver. When the device is eventually unbound, mana_remove() is invoked
a second time. Without a NULL check, it dereferences gc->dev with
gc == NULL, causing a kernel panic.
Add an early return if gdma_context or driver_data is NULL so the second
invocation is harmless. Move the dev = gc->dev assignment after the
guard so it cannot dereference NULL.
Fixes: 635096a86e ("net: mana: Support hibernation and kexec")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260420124741.1056179-4-ernis@linux.microsoft.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move INIT_DELAYED_WORK(gf_stats_work) to before mana_create_eq(),
while keeping schedule_delayed_work() at its original location.
Previously, if any function between mana_create_eq() and the
INIT_DELAYED_WORK call failed, mana_probe() would call mana_remove()
which unconditionally calls cancel_delayed_work_sync(gf_stats_work)
in __flush_work() or debug object warnings with
CONFIG_DEBUG_OBJECTS_WORK enabled.
Fixes: be4f1d67ec ("net: mana: Add standard counter rx_missed_errors")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260420124741.1056179-3-ernis@linux.microsoft.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move INIT_WORK(link_change_work) to right after the mana_context
allocation, before any error path that could reach mana_remove().
Previously, if mana_create_eq() or mana_query_device_cfg() failed,
mana_probe() would jump to the error path which calls mana_remove().
mana_remove() unconditionally calls disable_work_sync(link_change_work),
but the work struct had not been initialized yet. This can trigger
CONFIG_DEBUG_OBJECTS_WORK enabled.
Fixes: 54133f9b4b ("net: mana: Support HW link state events")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260420124741.1056179-2-ernis@linux.microsoft.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
trim_newline() unconditionally dereferences s[len - 1] after computing
len = strnlen(s, maxlen). When the string is empty, len is 0 and the
expression underflows to s[(size_t)-1], reading (and potentially
writing) one byte before the buffer.
The two callers feed trim_newline() with the result of strscpy() from
configfs store callbacks (dev_name_store, userdatum_value_store).
configfs guarantees count >= 1 reaches the callback, but the byte
itself can be NUL: a userspace write(fd, "\0", 1) leaves the
destination empty after strscpy() and triggers the underflow. The OOB
write only fires if the adjacent byte happens to be '\n', so this is
not a security issue, but the access is undefined behaviour either way.
This pattern is commonly flagged by LLM-based code reviewers. While it
is not a security fix, the underlying access is undefined behaviour and
the change is small and self-contained, so it is a reasonable candidate
for the stable trees.
Guard the dereference on a non-zero length.
Fixes: ae001dc679 ("net: netconsole: move newline trimming to function")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260420-netcons_trim_newline-v1-1-dc35889aeedf@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If airoha_qdma_init routine fails before airoha_qdma_tx_irq_init() runs
successfully for all TX NAPIs, airoha_qdma_cleanup() will
unconditionally runs netif_napi_del() on TX NAPIs, triggering a NULL
pointer dereference. Fix the issue relying on q_tx_irq size value to
check if the TX NAPIs is properly initialized in airoha_qdma_cleanup().
Moreover, run netif_napi_add_tx() just if irq_q queue is properly
allocated.
Fixes: 23020f0493 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260420-airoha_qdma_init_rx_queue-fix-v2-2-d99347e5c18d@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If queue entry or DMA descriptor list allocation fails in
airoha_qdma_init_rx_queue routine, airoha_qdma_cleanup() will trigger a
NULL pointer dereference running netif_napi_del() for RX queue NAPIs
since netif_napi_add() has never been executed to this particular RX NAPI.
The issue is due to the early ndesc initialization in
airoha_qdma_init_rx_queue() since airoha_qdma_cleanup() relies on ndesc
value to check if the queue is properly initialized. Fix the issue moving
ndesc initialization at end of airoha_qdma_init_tx routine.
Move page_pool allocation after descriptor list allocation in order to
avoid memory leaks if desc allocation fails.
Fixes: 23020f0493 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260420-airoha_qdma_init_rx_queue-fix-v2-1-d99347e5c18d@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_MASK macro was shifting
the 4-bit mask (0xF) by only (_extint % 2) bits instead of
(_extint % 2) * 4. This caused the mask to overlap with the adjacent
nibble when configuring odd-numbered external interfaces, selecting
the wrong bits entirely.
Align the shift calculation with the existing ...MODE_OFFSET macro.
Fixes: 4af2950c50 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Signed-off-by: Abdulkader Alrezej <alrazj.abdulkader@gmail.com>
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/400a6387-a444-4576-af6d-26be5410bce3@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If queue entry list allocation fails in airoha_qdma_init_tx_queue routine,
airoha_qdma_cleanup_tx_queue() will trigger a NULL pointer dereference
accessing the queue entry array. The issue is due to the early ndesc
initialization in airoha_qdma_init_tx_queue(). Fix the issue moving ndesc
initialization at end of airoha_qdma_init_tx routine.
Fixes: 3f47e67dff ("net: airoha: Add the capability to consume out-of-order DMA tx descriptors")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260417-airoha_qdma_cleanup_tx_queue-fix-net-v4-1-e04bcc2c9642@kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The ice_ptp_read_tx_hwtstamp_status_eth56g function calls
ice_read_phy_eth56g with a PHY index. However the function actually expects
a port index. This causes the function to read the wrong PHY_PTP_INT_STATUS
registers, and effectively makes the status wrong for the second set of
ports from 4 to 7.
The ice_read_phy_eth56g function uses the provided port index to determine
which PHY device to read. We could refactor the entire chain to take a PHY
index, but this would impact many code sites. Instead, multiply the PHY
index by the number of ports, so that we read from the first port of each
PHY.
Fixes: 7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260420-jk-iwl-net-2026-04-20-ptp-e825c-phy-interrupt-fixes-v1-4-bc2240f42251@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The E800 hardware (apart from E810) has a ready bitmap for the PHY
indicating which timestamp slots currently have an outstanding timestamp
waiting to be read by software.
This bitmap is checked in multiple places using the
ice_get_phy_tx_tstamp_ready():
* ice_ptp_process_tx_tstamp() calls it to determine which timestamps to
attempt reading from the PHY
* ice_ptp_tx_tstamps_pending() calls it in a loop at the end of the
miscellaneous IRQ to check if new timestamps came in while the interrupt
handler was executing.
* ice_ptp_maybe_trigger_tx_interrupt() calls it in the auxiliary work task
to trigger a software interrupt in the event that the hardware logic
gets stuck.
For E82X devices, multiple PHYs share the same block, and the parameter
passed to the ready bitmap is a block number associated with the given
port. For E825-C devices, the PHYs have their own independent blocks and do
not share, so the parameter passed needs to be the port number. For E810
devices, the ice_get_phy_tx_tstamp_ready() always returns all 1s regardless
of what port, since this hardware does not have a ready bitmap. Finally,
for E830 devices, each PF has its own ready bitmap accessible via register,
and the block parameter is unused.
The first call correctly uses the Tx timestamp tracker block parameter to
check the appropriate timestamp block. This works because the tracker is
setup correctly for each timestamp device type.
The second two callers behave incorrectly for all device types other than
the older E822 devices. They both iterate in a loop using
ICE_GET_QUAD_NUM() which is a macro only used by E822 devices. This logic
is incorrect for devices other than the E822 devices.
For E810 the calls would always return true, causing E810 devices to always
attempt to trigger a software interrupt even when they have no reason to.
For E830, this results in duplicate work as the ready bitmap is checked
once per number of quads. Finally, for E825-C, this results in the pending
checks failing to detect timestamps on ports other than the first two.
Fix this by introducing a new hardware API function to ice_ptp_hw.c,
ice_check_phy_tx_tstamp_ready(). This function will check if any timestamps
are available and returns a positive value if any timestamps are pending.
For E810, the function always returns false, so that the re-trigger checks
never happen. For E830, check the ready bitmap just once. For E82x
hardware, check each quad. Finally, for E825-C, check every port.
The interface function returns an integer to enable reporting of error code
if the driver is unable read the ready bitmap. This enables callers to
handle this case properly. The previous implementation assumed that
timestamps are available if they failed to read the bitmap. This is
problematic as it could lead to continuous software IRQ triggering if the
PHY timestamp registers somehow become inaccessible.
This change is especially important for E825-C devices, as the missing
checks could leave a window open where a new timestamp could arrive while
the existing timestamps aren't completed. As a result, the hardware
threshold logic would not trigger a new interrupt. Without the check, the
timestamp is left unhandled, and new timestamps will not cause an interrupt
again until the timestamp is handled. Since both the interrupt check and
the backup check in the auxiliary task do not function properly, the device
may have Tx timestamps permanently stuck failing on a given port.
The faulty checks originate from commit d938a8cca8 ("ice: Auxbus devices
& driver for E822 TS") and commit 712e876371 ("ice: periodically kick Tx
timestamp interrupt"), however at the time of the original coding, both
functions only operated on E822 hardware. This is no longer the case, and
hasn't been since the introduction of the ETH56G PHY model in commit
7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C products")
Fixes: 7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260420-jk-iwl-net-2026-04-20-ptp-e825c-phy-interrupt-fixes-v1-3-bc2240f42251@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In some cases the PHY timestamp block of the E825C can become stuck. This
is known to occur if the software writes 0 to the Tx timestamp threshold,
and with older versions of the ice driver the threshold configuration is
buggy and can race in such that hardware briefly operates with a zero
threshold enabled. There are no other known ways to trigger this behavior,
but once it occurs, the hardware is not recovered by normal reset, a driver
reload, or even a warm power cycle of the system. A cold power cycle is
sufficient to recover hardware, but this is extremely invasive and can
result in significant downtime on customer deployments.
The PHY for each port has a timestamping block which has its own reset
functionality accessible by programming the PHY_REG_GLOBAL register.
Writing to the PHY_REG_GLOBAL_SOFT_RESET_BIT triggers the hardware to
perform a complete reset of the timestamping block of the PHY. This
includes clearing the timestamp status for the port, clearing all
outstanding timestamps in the memory bank, and resetting the PHY timer.
The new ice_ptp_phy_soft_reset_eth56g() function toggles the
PHY_REG_GLOBAL soft reset bit with the required delays, ensuring the
PHY is properly reinitialized without requiring a full device reset.
The sequence clears the reset bit, asserts it, then clears it again,
with short waits between transitions to allow hardware stabilization.
Call this function in the new ice_ptp_init_phc_e825c(), implementing the
E825C device specific variant of the ice_ptp_init_phc(). Note that if
ice_ptp_init_phc() fails, PTP functionality may be disabled, but the driver
will still load to allow basic functionality to continue.
This causes the clock owning PF driver to perform a PHY soft reset for
every port during initialization. This ensures the driver begins life in a
known functional state regardless of how it was previously programmed.
This ensures that we properly reconfigure the hardware after a device reset
or when loading the driver, even if it was previously misconfigured with an
out-of-date or modified driver.
Fixes: 7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Timothy Miskell <timothy.miskell@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260420-jk-iwl-net-2026-04-20-ptp-e825c-phy-interrupt-fixes-v1-2-bc2240f42251@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The E825C ice_phy_cfg_intr_eth56g() function is responsible for programming
the PHY interrupt for a given port. This function writes to the
PHY_REG_TS_INT_CONFIG register of the port. The register is responsible for
configuring whether the port interrupt logic is enabled, as well as
programming the threshold of waiting timestamps that will trigger an
interrupt from this port.
This threshold value must not be programmed to zero while the interrupt is
enabled. Doing so puts the port in a misconfigured state where the PHY
timestamp interrupt for the quad of connected ports will become stuck.
This occurs, because a threshold of zero results in the timestamp interrupt
status for the port becoming stuck high. The four ports in the connected
quad have their timestamp status indicators muxed together. A new interrupt
cannot be generated until the timestamp status indicators return low for
all four ports.
Normally, the timestamp status for a port will clear once there are fewer
timestamps in that ports timestamp memory bank than the threshold. A
threshold of zero makes this impossible, so the timestamp status for the
port does not clear.
The ice driver never intentionally programs the threshold to zero, indeed
the driver always programs it to a value of 1, intending to get an
interrupt immediately as soon as even a single packet is waiting for a
timestamp.
However, there is a subtle flaw in the programming logic in the
ice_phy_cfg_intr_eth56g() function. Due to the way that the hardware
handles enabling the PHY interrupt. If the threshold value is modified at
the same time as the interrupt is enabled, the HW PHY state machine might
enable the interrupt before the new threshold value is actually updated.
This leaves a potential race condition caused by the hardware logic where
a PHY timestamp interrupt might be triggered before the non-zero threshold
is written, resulting in the PHY timestamp logic becoming stuck.
Once the PHY timestamp status is stuck high, it will remain stuck even
after attempting to reprogram the PHY block by changing its threshold or
disabling the interrupt. Even a typical PF or CORE reset will not reset the
particular block of the PHY that becomes stuck. Even a warm power cycle is
not guaranteed to cause the PHY block to reset, and a cold power cycle is
required.
Prevent this by always writing the PHY_REG_TS_INT_CONFIG in two stages.
First write the threshold value with the interrupt disabled, and only write
the enable bit after the threshold has been programmed. When disabling the
interrupt, leave the threshold unchanged. Additionally, re-read the
register after writing it to guarantee that the write to the PHY has been
flushed upon exit of the function.
While we're modifying this function implementation, explicitly reject
programming a threshold of 0 when enabling the interrupt. No caller does
this today, but the consequences of doing so are significant. An explicit
rejection in the code makes this clear.
Fixes: 7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260420-jk-iwl-net-2026-04-20-ptp-e825c-phy-interrupt-fixes-v1-1-bc2240f42251@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When QoS loopback is enabled for GDM3 or GDM4, incoming packets are
forwarded to GDM2. However, the PPE cpu port for GDM2 is not configured
in this path, causing traffic originating from GDM3/GDM4, which may
be set up as WAN ports backed by QDMA1, to be incorrectly directed
to QDMA0 instead.
Configure the PPE cpu port for GDM2 when QoS loopback is active on
GDM3 or GDM4 to ensure traffic is routed to the correct QDMA instance.
Fixes: 9cd451d414 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260417-airoha-ppe-cpu-port-for-gdm2-loopback-v1-1-c7a9de0f6f57@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Convert mlx5 from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's mlx5e_set_rx_mode now receives uc/mc snapshots and calls
mlx5e_fs_set_rx_mode_work directly instead of queueing work.
mlx5e_sync_netdev_addr and mlx5e_handle_netdev_addr now take
explicit uc/mc list parameters and iterate with
netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr.
Fallback to netdev's uc/mc in a few places and grab addr lock.
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-7-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Convert fbnic from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's __fbnic_set_rx_mode() now takes explicit uc/mc list
parameters and uses __hw_addr_sync_dev() on the snapshots instead
of __dev_uc_sync/__dev_mc_sync on the netdev directly.
Update callers in fbnic_up, fbnic_fw_config_after_crash,
fbnic_bmc_rpc_check and fbnic_set_mac to pass the real address
lists calling __fbnic_set_rx_mode outside the async work path.
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: kernel-team@meta.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-6-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
slhc_uncompress() parses a VJ-compressed TCP header by advancing a
pointer through the packet via decode() and pull16(). Neither helper
bounds-checks against isize, and decode() masks its return with
& 0xffff so it can never return the -1 that callers test for -- those
error paths are dead code.
A short compressed frame whose change byte requests optional fields
lets decode() read past the end of the packet. The over-read bytes
are folded into the cached cstate and reflected into subsequent
reconstructed packets.
Make decode() and pull16() take the packet end pointer and return -1
when exhausted. Add a bounds check before the TCP-checksum read.
The existing == -1 tests now do what they were always meant to.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20260414134126.758795-2-horms@kernel.org/
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260416100147.531855-5-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
slhc_init() accepts rslots == 0 as a valid configuration, with the
documented meaning of 'no receive compression'. In that case the
allocation loop in slhc_init() is skipped, so comp->rstate stays
NULL and comp->rslot_limit stays 0 (from the kzalloc of struct
slcompress).
The receive helpers do not defend against that configuration.
slhc_uncompress() dereferences comp->rstate[x] when the VJ header
carries an explicit connection ID, and slhc_remember() later assigns
cs = &comp->rstate[...] after only comparing the packet's slot number
to comp->rslot_limit. Because rslot_limit is 0, slot 0 passes the
range check, and the code dereferences a NULL rstate.
The configuration is reachable in-tree through PPP. PPPIOCSMAXCID
stores its argument in a signed int, and (val >> 16) uses arithmetic
shift. Passing 0xffff0000 therefore sign-extends to -1, so val2 + 1
is 0 and ppp_generic.c ends up calling slhc_init(0, 1). Because
/dev/ppp open is gated by ns_capable(CAP_NET_ADMIN), the whole path
is reachable from an unprivileged user namespace. Once the malformed
VJ state is installed, any inbound VJ-compressed or VJ-uncompressed
frame that selects slot 0 crashes the kernel in softirq context:
Oops: general protection fault, probably for non-canonical
address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:slhc_uncompress (drivers/net/slip/slhc.c:519)
Call Trace:
<TASK>
ppp_receive_nonmp_frame (drivers/net/ppp/ppp_generic.c:2466)
ppp_input (drivers/net/ppp/ppp_generic.c:2359)
ppp_async_process (drivers/net/ppp/ppp_async.c:492)
tasklet_action_common (kernel/softirq.c:926)
handle_softirqs (kernel/softirq.c:623)
run_ksoftirqd (kernel/softirq.c:1055)
smpboot_thread_fn (kernel/smpboot.c:160)
kthread (kernel/kthread.c:436)
ret_from_fork (arch/x86/kernel/process.c:164)
</TASK>
Reject the receive side on such instances instead of touching rstate.
slhc_uncompress() falls through to its existing 'bad' label, which
bumps sls_i_error and enters the toss state. slhc_remember() mirrors
that with an explicit sls_i_error increment followed by slhc_toss();
the sls_i_runt counter is not used here because a missing rstate is
an internal configuration state, not a runt packet.
The transmit path is unaffected: the only in-tree caller that picks
rslots from userspace (ppp_generic.c) still supplies tslots >= 1, and
slip.c always calls slhc_init(16, 16), so comp->tstate remains valid
and slhc_compress() continues to work.
Fixes: 4ab42d78e3 ("ppp, slip: Validate VJ compression slot parameters completely")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260415204130.258866-2-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
gtp_genl_send_echo_req() runs as a generic netlink doit handler in
process context with BH not disabled. It calls udp_tunnel_xmit_skb(),
which eventually invokes iptunnel_xmit() — that uses __this_cpu_inc/dec
on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
Without local_bh_disable(), the task may migrate between
dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the
per-CPU counter pairing. The result is stale or negative recursion
levels that can later produce false-positive
SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected:
the data path runs under ndo_start_xmit and the echo response handlers
run from the UDP encap rx softirq, both with BH already disabled.
Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring
commit 2cd7e6971f ("sctp: disable BH before calling
udp_tunnel_xmit_skb()").
Fixes: 6f1a9140ec ("net: add xmit recursion limit to tunnel xmit functions")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260417055408.4667-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>