Commit Graph

1335140 Commits

Author SHA1 Message Date
Ido Schimmel
fb2f449eca vxlan: Refresh FDB 'updated' time upon user space updates
When a host migrates to a different remote and a packet is received from
the new remote, the corresponding FDB entry is updated and its 'updated'
time is refreshed.

However, when user space replaces the remote of an FDB entry, its
'updated' time is not refreshed:

 # ip link add name vx1 up type vxlan id 10010 dstport 4789
 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # sleep 10
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.2
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 10

This can lead to the entry being aged out prematurely and it is also
inconsistent with the bridge driver:

 # ip link add name br1 up type bridge
 # ip link add name swp1 master br1 up type dummy
 # ip link add name swp2 master br1 up type dummy
 # bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic vlan 1
 # sleep 10
 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
 10
 # bridge fdb replace 00:11:22:33:44:55 dev swp2 master dynamic vlan 1
 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
 0

Adjust the VXLAN driver to refresh the 'updated' time of an FDB entry
whenever one of its attributes is changed by user space:

 # ip link add name vx1 up type vxlan id 10010 dstport 4789
 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # sleep 10
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.2
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:53:57 -08:00
Ido Schimmel
40a9994f2f vxlan: Refresh FDB 'updated' time upon 'NTF_USE'
The 'NTF_USE' flag can be used by user space to refresh FDB entries so
that they will not age out. Currently, the VXLAN driver implements it by
refreshing the 'used' field in the FDB entry as this is the field
according to which FDB entries are aged out.

Subsequent patches will switch the VXLAN driver to age out entries based
on the 'updated' field. Prepare for this change by refreshing the
'updated' field upon 'NTF_USE'. This is consistent with the bridge
driver's FDB:

 # ip link add name br1 up type bridge
 # ip link add name swp1 master br1 up type dummy
 # bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic vlan 1
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev swp1 master dynamic vlan 1
 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
 10
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev swp1 master use dynamic vlan 1
 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
 0

Before:

 # ip link add name vx1 up type vxlan id 10010 dstport 4789
 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 10
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 20

After:

 # ip link add name vx1 up type vxlan id 10010 dstport 4789
 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 10
 # sleep 10
 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1
 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
 0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:53:57 -08:00
Ido Schimmel
c4f2082bf6 vxlan: Always refresh FDB 'updated' time when learning is enabled
Currently, when learning is enabled and a packet is received from the
expected remote, the 'updated' field of the FDB entry is not refreshed.
This will become a problem when we switch the VXLAN driver to age out
entries based on the 'updated' field.

Solve this by always refreshing an FDB entry when we receive a packet
with a matching source MAC address, regardless if it was received via
the expected remote or not as it indicates the host is alive. This is
consistent with the bridge driver's FDB.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:53:57 -08:00
Ido Schimmel
1370c45d6e vxlan: Read jiffies once when updating FDB 'used' time
Avoid two volatile reads in the data path. Instead, read jiffies once
and only if an FDB entry was found.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:53:57 -08:00
Ido Schimmel
f6205f8215 vxlan: Annotate FDB data races
The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].

Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().

[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit

write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
 vxlan_xmit+0xb29/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
 vxlan_xmit+0xadf/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f

Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

[2]
 #!/bin/bash

 set +H
 echo whitelist > /sys/kernel/debug/kcsan
 echo !vxlan_xmit > /sys/kernel/debug/kcsan

 ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
 bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
 taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
 taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:53:56 -08:00
Geert Uytterhoeven
50f37fc2a3 ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only
if CONFIG_NET_IPGRE is enabled, but CONFIG_IPV6 is disabled:

    net/ipv4/ip_gre.c: In function ‘ipgre_err’:
    net/ipv4/ip_gre.c:144:22: error: variable ‘data_len’ set but not used [-Werror=unused-but-set-variable]
      144 |         unsigned int data_len = 0;
	  |                      ^~~~~~~~

Fix this by moving all data_len processing inside the IPV6-only section
that uses its result.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501121007.2GofXmh5-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d09113cfe2bfaca02f3dddf832fb5f48dd20958b.1738704881.git.geert@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:48:45 -08:00
Heiner Kallweit
faac69a4ae r8169: don't scan PHY addresses > 0
The PHY address is a dummy, because r8169 PHY access registers
don't support a PHY address. Therefore scan address 0 only.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:28:06 -08:00
Eric Dumazet
cbe08724c1 net: flush_backlog() small changes
Add READ_ONCE() around reads of skb->dev->reg_state, because
this field can be changed from other threads/cpus.

Instead of calling dev_kfree_skb_irq() and kfree_skb()
while interrupts are masked and locks held,
use a temporary list and use __skb_queue_purge_reason()

Use SKB_DROP_REASON_DEV_READY drop reason to better
describe why these skbs are dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250204144825.316785-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:19:54 -08:00
Aswin Karuvally
6cccb3bb05 s390/net: Remove LCS driver
The original Open Systems Adapter (OSA) was introduced by IBM in the
mid-90s. These were then superseded by OSA-Express in 1999 which used
Queued Direct IO to greatly improve throughput. The newer cards
retained the older, slower non-QDIO (OSE) modes for compatibility with
older systems. In Linux, the lcs driver was responsible for cards
operating in the older OSE mode and the qeth driver was introduced to
allow the OSA-Express cards to operate in the newer QDIO (OSD) mode.

For an S390 machine from 1998 or later, there is no reason to use the
OSE mode and lcs driver as all OSA cards since 1999 provide the faster
OSD mode. As a result, it's been years since we have heard of a
customer configuration involving the lcs driver.

This patch removes the lcs driver. The technology it supports has been
obsolete for past 25+ years and is irrelevant for current use cases.

Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204103135.1619097-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:19:01 -08:00
Gustavo A. R. Silva
863257c29f cxgb4: Avoid a -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration to the end of the structure. Notice
that `struct ethtool_dump` is a flexible structure --a structure that
contains a flexible-array member.

Fix the following warning:
./drivers/net/ethernet/chelsio/cxgb4/cxgb4.h:1215:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/Z6GBZ4brXYffLkt_@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 18:16:30 -08:00
Petr Machata
d9e9f6d7b7 bridge: mdb: Allow replace of a host-joined group
Attempts to replace an MDB group membership of the host itself are
currently bounced:

 # ip link add name br up type bridge vlan_filtering 1
 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2
 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2
 Error: bridge: Group is already joined by host.

A similar operation done on a member port would succeed. Ignore the check
for replacement of host group memberships as well.

The bit of code that this enables is br_multicast_host_join(), which, for
already-joined groups only refreshes the MC group expiration timer, which
is desirable; and a userspace notification, also desirable.

Change a selftest that exercises this code path from expecting a rejection
to expecting a pass. The rest of MDB selftests pass without modification.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:50:03 -08:00
Jakub Kicinski
cbecd06a22 selftests: net: suppress ReST file generation when building selftests
Some selftests need libynl.a. When building it try to skip
generating the ReST documentation, libynl.a does not depend
on them.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250203214850.1282291-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:40 -08:00
Jakub Kicinski
fadbe52b3b Merge branch 'net-sysfs-remove-the-rtnl_trylock-restart_syscall-construction'
Antoine Tenart says:

====================
net-sysfs: remove the rtnl_trylock/restart_syscall construction

The series initially aimed at improving spins (and thus delays) while
accessing net sysfs under rtnl lock contention[1]. The culprit was the
trylock/restart_syscall constructions. There wasn't much interest at the
time but it got traction recently for other reasons (lowering the rtnl
lock pressure).

Since v1[2]:

- Do not export rtnl_lock_interruptible [Stephen].
- Add netdev_warn_once messages in rx_queue_add_kobject [Jakub].

Since the RFC[1]:

- Limit the breaking of the sysfs protection to sysfs_rtnl_lock() only
  as this is not needed in the whole rtnl locking section thanks to the
  additional check on dev_isalive(). This simplifies error handling as
  well as the unlocking path.
- Used an interruptible version of rtnl_lock, as done by Jakub in
  his experiments.
- Removed a WARN_ONCE_ONCE [Greg].
- Removed explicit inline markers [Stephen].

Most of the reasoning is explained in comments added in patch 1. This
was tested by stress-testing net sysfs attributes (read/write ops) while
adding/removing queues and adding/removing veths, all in parallel. I
also used an OCP single node cluster, spawning lots of pods.

[1] https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/
[2] https://lore.kernel.org/all/20250117102612.132644-1-atenart@kernel.org/T/
====================

Link: https://patch.msgid.link/20250204170314.146022-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:11 -08:00
Antoine Tenart
b0b6fcfa6a net-sysfs: remove rtnl_trylock from queue attributes
Similar to the commit removing remove rtnl_trylock from device
attributes we here apply the same technique to networking queues.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-5-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:08 -08:00
Antoine Tenart
7e54f85c60 net-sysfs: prevent uncleared queues from being re-added
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic
and because of how Tx/Rx queues are implemented (and their
requirements), it might happen that a queue is re-added before having
the chance to be cleared. In such rare case, do not complete the queue
addition operation.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:08 -08:00
Antoine Tenart
b7ecc1de51 net-sysfs: move queue attribute groups outside the default groups
Rx/tx queues embed their own kobject for registering their per-queue
sysfs files. The issue is they're using the kobject default groups for
this and entirely rely on the kobject refcounting for releasing their
sysfs paths.

In order to remove rtnl_trylock calls we need sysfs files not to rely on
their associated kobject refcounting for their release. Thus we here
move queues sysfs files from the kobject default groups to their own
groups which can be removed separately.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-3-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:08 -08:00
Antoine Tenart
79c61899b5 net-sysfs: remove rtnl_trylock from device attributes
There is an ABBA deadlock between net device unregistration and sysfs
files being accessed[1][2]. To prevent this from happening all paths
taking the rtnl lock after the sysfs one (actually kn->active refcount)
use rtnl_trylock and return early (using restart_syscall)[3], which can
make syscalls to spin for a long time when there is contention on the
rtnl lock[4].

There are not many possibilities to improve the above:
- Rework the entire net/ locking logic.
- Invert two locks in one of the paths — not possible.

But here it's actually possible to drop one of the locks safely: the
kernfs_node refcount. More details in the code itself, which comes with
lots of comments.

Note that we check the device is alive in the added sysfs_rtnl_lock
helper to disallow sysfs operations to run after device dismantle has
started. This also help keeping the same behavior as before. Because of
this calls to dev_isalive in sysfs ops were removed.

[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
[2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
[3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05 17:49:07 -08:00
Heiner Kallweit
0bea93fdba net: phy: realtek: use string choices helpers
Use string choices helpers to simplify the code.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501190707.qQS8PGHW-lkp@intel.com/
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-05 10:08:21 +00:00
Heiner Kallweit
135c3c86a7 r8169: make Kconfig option for LED support user-visible
Make config option R8169_LEDS user-visible, so that users can remove
support if not needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d29f0cdb-32bf-435f-b59d-dc96bca1e3ab@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 18:11:14 -08:00
Heiner Kallweit
51773846fa net: phy: realtek: make HWMON support a user-visible Kconfig symbol
Make config symbol REALTEK_PHY_HWMON user-visible, so that users can
remove support if not needed.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/3466ee92-166a-4b0f-9ae7-42b9e046f333@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 18:10:53 -08:00
Breno Leitao
d5fdfe480c netconsole: selftest: Add test for fragmented messages
Add a new selftest to verify netconsole's handling of messages that
exceed the packet size limit and require fragmentation. The test sends
messages with varying sizes and userdata, validating that:

1. Large messages are correctly fragmented and reassembled
2. Userdata fields are properly preserved across fragments
3. Messages work correctly with and without kernel release version
   appending

The test creates a networking environment using netdevsim, sends
messages through /dev/kmsg, and verifies the received fragments maintain
message integrity.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 18:09:21 -08:00
Gustavo A. R. Silva
33b565fa2b net: atlantic: Avoid -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Remove unused flexible-array member `buf` and, with this, fix the following
warnings:
drivers/net/ethernet/aquantia/atlantic/aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/aquantia/atlantic/hw_atl/../aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Suggested-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://patch.msgid.link/Z6F3KZVfnAZ2FoJm@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 18:00:25 -08:00
Jakub Kicinski
9dd05df840 net: warn if NAPI instance wasn't shut down
Drivers should always disable a NAPI instance before removing it.
If they don't the instance may be queued for polling.
Since commit 86e25f40aa ("net: napi: Add napi_config")
we also remove the NAPI from the busy polling hash table
in napi_disable(), so not disabling would leave a stale
entry there.

Use of busy polling is relatively uncommon so bugs may be lurking
in the drivers. Add an explicit warning.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250203215816.1294081-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 17:58:55 -08:00
Dr. David Alan Gilbert
b565a8c750 cavium/liquidio: Remove unused lio_get_device_id
lio_get_device_id() has been unused since 2018's
commit 64fecd3ec5 ("liquidio: remove obsolete functions and data
structures")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250203183343.193691-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:22:11 -08:00
Dr. David Alan Gilbert
626b367276 mlxsw: spectrum_router: Remove unused functions
mlxsw_sp_ipip_lb_ul_vr_id() has been unused since 2020's
commit acde33bf73 ("mlxsw: spectrum_router: Reduce
mlxsw_sp_ipip_fib_entry_op_gre4()")

mlxsw_sp_rif_exists() has been unused since 2023's
commit 49c3a615d3 ("mlxsw: spectrum_router: Replay MACVLANs when RIF is
made")

mlxsw_sp_rif_vid() has been unused since 2023's
commit a5b52692e6 ("mlxsw: spectrum_switchdev: Manage RIFs on PVID
change")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250203190141.204951-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:05:53 -08:00
Dr. David Alan Gilbert
15c51f17bd net/mlx5: Remove unused mlx5dr_domain_sync
mlx5dr_domain_sync() was added in 2019 by
commit 70605ea545 ("net/mlx5: DR, Expose APIs for direct rule managing")
but hasn't been used.

Remove it.

mlx5dr_domain_sync() was the only user of
mlx5dr_send_ring_force_drain().
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250203185958.204794-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:05:29 -08:00
Dr. David Alan Gilbert
2cf424f5ac mlx4: Remove unused functions
The last use of mlx4_find_cached_mac() was removed in 2014 by
commit 2f5bb47368 ("mlx4: Add ref counting to port MAC table for RoCE")

mlx4_zone_free_entries() was added in 2014 by
commit 7a89399ffa ("net/mlx4: Add mlx4_bitmap zone allocator")
but hasn't been used. (The _unique version is used)

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250203185229.204279-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:05:07 -08:00
Andrew Kreimer
185b1d53ea net: qed: fix typos
There are some typos in comments/messages:
 - Valiate -> Validate
 - acceptible -> acceptable
 - acces -> access
 - relased -> released

Fix them via codespell.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250203175419.4146-1-algonell@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:04:38 -08:00
Ninad Palsule
ac33582611 dt-bindings: net: faraday,ftgmac100: Add phys mode
Aspeed device supports rgmii, rgmii-id, rgmii-rxid, rgmii-txid so
document them.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ninad Palsule <ninad@linux.ibm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250203151306.276358-2-ninad@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:01:58 -08:00
Eric Dumazet
a064068bb6 neighbour: remove neigh_parms_destroy()
neigh_parms_destroy() is a simple kfree(), no need for
a forward declaration.

neigh_parms_put() can instead call kfree() directly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250203151152.3163876-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 14:01:19 -08:00
Leon Romanovsky
546d98393a bonding: delete always true device check
XFRM API makes sure that xs->xso.dev is valid in all XFRM offload
callbacks. There is no need to check it again.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/0b2f8f5f09701bb43bbd83b94bfe5cb506b57adc.1738587150.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 13:59:30 -08:00
Linus Torvalds
c2933b2bef Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPSec, netfilter and Bluetooth.

  Nothing really stands out, but as usual there's a slight concentration
  of fixes for issues added in the last two weeks before the merge
  window, and driver bugs from 6.13 which tend to get discovered upon
  wider distribution.

  Current release - regressions:

   - net: revert RTNL changes in unregister_netdevice_many_notify()

   - Bluetooth: fix possible infinite recursion of btusb_reset

   - eth: adjust locking in some old drivers which protect their state
     with spinlocks to avoid sleeping in atomic; core protects netdev
     state with a mutex now

  Previous releases - regressions:

   - eth:
      - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
      - bgmac: reduce max frame size to support just 1500 bytes; the
        jumbo frame support would previously cause OOB writes, but now
        fails outright

   - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
     false detection of MPTCP blackholing

  Previous releases - always broken:

   - mptcp: handle fastopen disconnect correctly

   - xfrm:
      - make sure skb->sk is a full sock before accessing its fields
      - fix taking a lock with preempt disabled for RT kernels

   - usb: ipheth: improve safety of packet metadata parsing; prevent
     potential OOB accesses

   - eth: renesas: fix missing rtnl lock in suspend/resume path"

* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  MAINTAINERS: add Neal to TCP maintainers
  net: revert RTNL changes in unregister_netdevice_many_notify()
  net: hsr: fix fill_frame_info() regression vs VLAN packets
  doc: mptcp: sysctl: blackhole_timeout is per-netns
  mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
  netfilter: nf_tables: reject mismatching sum of field_len with set key length
  net: sh_eth: Fix missing rtnl lock in suspend/resume path
  net: ravb: Fix missing rtnl lock in suspend/resume path
  selftests/net: Add test for loading devbound XDP program in generic mode
  net: xdp: Disallow attaching device-bound programs in generic mode
  tcp: correct handling of extreme memory squeeze
  bgmac: reduce max frame size to support just MTU 1500
  vsock/test: Add test for connect() retries
  vsock/test: Add test for UAF due to socket unbinding
  vsock/test: Introduce vsock_connect_fd()
  vsock/test: Introduce vsock_bind()
  vsock: Allow retrying on connect() failure
  vsock: Keep the binding until socket destruction
  Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
  Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
  ...
2025-01-30 12:24:20 -08:00
Linus Torvalds
b4b0881156 Merge tag 'docs-6.14-2' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
 "Two fixes for footnote-related warnings that appeared with Sphinx 8.x.

  We want to encourage use of newer Sphinx - they fixed a performance
  problem and the docs build takes less than half the time it used to"

* tag 'docs-6.14-2' of git://git.lwn.net/linux:
  docs: power: Fix footnote reference for Toshiba Satellite P10-554
  Documentation: ublk: Drop Stefan Hajnoczi's message footnote
2025-01-30 10:57:19 -08:00
Linus Torvalds
b8a1c9f4b7 Merge tag 's390-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:

 - Architecutre-specific ftrace recursion trylock tests were removed in
   favour of the generic function_graph_enter(), but s390 got missed.

   Remove this test for s390 as well.

 - Add ftrace_get_symaddr() for s390, which returns the symbol address
   from ftrace 'ip' parameter

* tag 's390-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/tracing: Define ftrace_get_symaddr() for s390
  s390/fgraph: Fix to remove ftrace_test_recursion_trylock()
2025-01-30 10:53:49 -08:00
Linus Torvalds
b731bc5f49 Merge tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:

 - The rework that uncoupled physical and virtual address spaces
   inadvertently prevented KASAN shadow mappings from using large pages.
   Restore large page mappings for KASAN shadows

 - Add decompressor routine physmem_alloc() that may fail, unlike
   physmem_alloc_or_die(). This allows callers to implement fallback
   paths

 - Allow falling back from large pages to smaller pages (1MB or 4KB) if
   the allocation of 2GB pages in the decompressor can not be fulfilled

 - Add to the decompressor boot print support of "%%" format string,
   width and padding hadnling, length modifiers and decimal conversion
   specifiers

 - Add to the decompressor message severity levels similar to kernel
   ones. Support command-line options that control console output
   verbosity

 - Replaces boot_printk() calls with appropriate loglevel- specific
   helpers such as boot_emerg(), boot_warn(), and boot_debug().

 - Collect all boot messages into a ring buffer independent of the
   current log level. This is particularly useful for early crash
   analysis

 - If 'earlyprintk' command line parameter is not specified, store
   decompressor boot messages in a ring buffer to be printed later by
   the kernel, once the console driver is registered

 - Add 'bootdebug' command line parameter to enable printing of
   decompressor debug messages when needed. That parameters allows
   message suppressing and filtering

 - Dump boot messages on a decompressor crash, but only if 'bootdebug'
   command line parameter is enabled

 - When CONFIG_PRINTK_TIME is enabled, add timestamps to boot messages
   in the same format as regular printk()

 - Dump physical memory tracking information on boot: online ranges,
   reserved areas and vmem allocations

 - Dump virtual memory layout and randomization details

 - Improve decompression error reporting and dump the message ring
   buffer in case the boot failed and system halted

 - Add an exception handler which handles exceptions when FPU control
   register is attempted to be set to an invalid value. Remove '.fixup'
   section as result of this change

 - Use 'A', 'O', and 'R' inline assembly format flags, which allows
   recent Clang compilers to generate better FPU code

 - Rework uaccess code so it reads better and generates more efficient
   code

 - Cleanup futex inline assembly code

 - Disable KMSAN instrumention for futex inline assemblies, which
   contain dereferenced user pointers. Otherwise, shadows for the user
   pointers would be accessed

 - PFs which are not initially configured but in standby create only a
   single-function PCI domain. If they are configured later on, sibling
   PFs and their child VFs will not be added to their PCI domain
   breaking SR-IOV expectations.

   Fix that by allowing initially configured but in standby PFs create
   multi-function PCI domains

 - Add '-std=gnu11' to decompressor and purgatory CFLAGS to avoid
   compile errors caused by kernel's own definitions of 'bool', 'false',
   and 'true' conflicting with the C23 reserved keywords

 - Fix sclp subsystem failure when a sclp console is not present

 - Fix misuse of non-NULL terminated strings in vmlogrdr driver

 - Various other small improvements, cleanups and fixes

* tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (53 commits)
  s390/vmlogrdr: Use array instead of string initializer
  s390/vmlogrdr: Use internal_name for error messages
  s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()
  s390/tools: Use array instead of string initializer
  s390/vmem: Fix null-pointer-arithmetic warning in vmem_map_init()
  s390: Add '-std=gnu11' to decompressor and purgatory CFLAGS
  s390/bitops: Use correct constraint for arch_test_bit() inline assembly
  s390/pci: Fix SR-IOV for PFs initially in standby
  s390/futex: Avoid KMSAN instrumention for user pointers
  s390/uaccess: Rename get_put_user_noinstr_attributes to uaccess_kmsan_or_inline
  s390/futex: Cleanup futex_atomic_cmpxchg_inatomic()
  s390/futex: Generate futex atomic op functions
  s390/uaccess: Remove INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER
  s390/uaccess: Use asm goto for put_user()/get_user()
  s390/uaccess: Remove usage of the oac specifier
  s390/uaccess: Replace EX_TABLE_UA_LOAD_MEM exception handling
  s390/uaccess: Cleanup noinstr __put_user()/__get_user() inline assembly constraints
  s390/uaccess: Remove __put_user_fn()/__get_user_fn() wrappers
  s390/uaccess: Move put_user() / __put_user() close to put_user() asm code
  s390/uaccess: Use asm goto for __mvc_kernel_nofault()
  ...
2025-01-30 10:48:17 -08:00
Linus Torvalds
90cb220062 Merge tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:

 - update gpio-sim selftests to not fail now that we no longer allow
   rmdir() on configfs entries of active devices

 - remove leftover code from gpio-mxc

* tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  selftests: gpio: gpio-sim: Fix missing chip disablements
  gpio: mxc: remove dead code after switch to DT-only
2025-01-30 10:19:30 -08:00
Linus Torvalds
d3d90cc289 Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
 "Provide stable parent and name to ->d_revalidate() instances

  Most of the filesystem methods where we care about dentry name and
  parent have their stability guaranteed by the callers;
  ->d_revalidate() is the major exception.

  It's easy enough for callers to supply stable values for expected name
  and expected parent of the dentry being validated. That kills quite a
  bit of boilerplate in ->d_revalidate() instances, along with a bunch
  of races where they used to access ->d_name without sufficient
  precautions"

* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: fix ->rename_sem exclusion
  orangefs_d_revalidate(): use stable parent inode and name passed by caller
  ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  nfs: fix ->d_revalidate() UAF on ->d_name accesses
  nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  gfs2_drevalidate(): use stable parent inode and name passed by caller
  fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  exfat_d_revalidate(): use stable parent inode passed by caller
  fscrypt_d_revalidate(): use stable parent inode passed by caller
  ceph_d_revalidate(): propagate stable name down into request encoding
  ceph_d_revalidate(): use stable parent inode passed by caller
  afs_d_revalidate(): use stable name and parent inode passed by caller
  Pass parent directory inode and expected name to ->d_revalidate()
  generic_ci_d_compare(): use shortname_storage
  ext4 fast_commit: make use of name_snapshot primitives
  dissolve external_name.u into separate members
  make take_dentry_name_snapshot() lockless
  dcache: back inline names with a struct-wrapped array of unsigned long
  make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-30 09:13:35 -08:00
Jakub Kicinski
dfffaccffc Merge tag 'nf-25-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains one Netfilter fix:

1) Reject mismatching sum of field_len with set key length which allows
   to create a set without inconsistent pipapo rule width and set key
   length.

* tag 'nf-25-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: reject mismatching sum of field_len with set key length
====================

Link: https://patch.msgid.link/20250130113307.2327470-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30 09:01:00 -08:00
Jakub Kicinski
d7dda216ca MAINTAINERS: add Neal to TCP maintainers
Neal Cardwell has been indispensable in TCP reviews
and investigations, especially protocol-related.
Neal is also the author of packetdrill.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250129191332.2526140-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30 08:57:38 -08:00
Eric Dumazet
e759e1e4a4 net: revert RTNL changes in unregister_netdevice_many_notify()
This patch reverts following changes:

83419b61d1 net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2)
ae646f1a0b net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1)
cfa579f666 net: no longer hold RTNL while calling flush_all_backlogs()

This caused issues in layers holding a private mutex:

cleanup_net()
  rtnl_lock();
	mutex_lock(subsystem_mutex);

	unregister_netdevice();

	   rtnl_unlock();		// LOCKDEP violation
	   rtnl_lock();

I will revisit this in next cycle, opt-in for the new behavior
from safe contexts only.

Fixes: cfa579f666 ("net: no longer hold RTNL while calling flush_all_backlogs()")
Fixes: ae646f1a0b ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1)")
Fixes: 83419b61d1 ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2)")
Reported-by: syzbot+5b9196ecf74447172a9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6789d55f.050a0220.20d369.004e.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250129142726.747726-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30 08:57:18 -08:00
Eric Dumazet
0f5697f1a3 net: hsr: fix fill_frame_info() regression vs VLAN packets
Stephan Wurm reported that my recent patch broke VLAN support.

Apparently skb->mac_len is not correct for VLAN traffic as
shown by debug traces [1].

Use instead pskb_may_pull() to make sure the expected header
is present in skb->head.

Many thanks to Stephan for his help.

[1]
kernel: skb len=170 headroom=2 headlen=170 tailroom=20
        mac=(2,14) mac_len=14 net=(16,-1) trans=-1
        shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
        csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0)
        hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
        priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0
        encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
kernel: dev name=prp0 feat=0x0000000000007000
kernel: sk family=17 type=3 proto=0
kernel: skb headroom: 00000000: 74 00
kernel: skb linear:   00000000: 01 0c cd 01 00 01 00 d0 93 53 9c cb 81 00 80 00
kernel: skb linear:   00000010: 88 b8 00 01 00 98 00 00 00 00 61 81 8d 80 16 52
kernel: skb linear:   00000020: 45 47 44 4e 43 54 52 4c 2f 4c 4c 4e 30 24 47 4f
kernel: skb linear:   00000030: 24 47 6f 43 62 81 01 14 82 16 52 45 47 44 4e 43
kernel: skb linear:   00000040: 54 52 4c 2f 4c 4c 4e 30 24 44 73 47 6f 6f 73 65
kernel: skb linear:   00000050: 83 07 47 6f 49 64 65 6e 74 84 08 67 8d f5 93 7e
kernel: skb linear:   00000060: 76 c8 00 85 01 01 86 01 00 87 01 00 88 01 01 89
kernel: skb linear:   00000070: 01 00 8a 01 02 ab 33 a2 15 83 01 00 84 03 03 00
kernel: skb linear:   00000080: 00 91 08 67 8d f5 92 77 4b c6 1f 83 01 00 a2 1a
kernel: skb linear:   00000090: a2 06 85 01 00 83 01 00 84 03 03 00 00 91 08 67
kernel: skb linear:   000000a0: 8d f5 92 77 4b c6 1f 83 01 00
kernel: skb tailroom: 00000000: 80 18 02 00 fe 4e 00 00 01 01 08 0a 4f fd 5e d1
kernel: skb tailroom: 00000010: 4f fd 5e cd

Fixes: b9653d19e5 ("net: hsr: avoid potential out-of-bound access in fill_frame_info()")
Reported-by: Stephan Wurm <stephan.wurm@a-eberle.de>
Tested-by: Stephan Wurm <stephan.wurm@a-eberle.de>
Closes: https://lore.kernel.org/netdev/Z4o_UC0HweBHJ_cw@PC-LX-SteWu/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250129130007.644084-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30 08:56:59 -08:00
Linus Torvalds
ce335806b5 Merge tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:

 - unify inode corruption marking and mark them as bad immediately upon
   detection of an error in attribute enumeration

 - folio cleanup

* tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Unify inode corruption marking with _ntfs_bad_inode()
  fs/ntfs3: Mark inode as bad as soon as error detected in mi_enum_attr()
  ntfs3: Remove an access to page->index
2025-01-30 08:47:17 -08:00
Linus Torvalds
8080ff5ac6 Merge tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:

 - second half of a fix for a bug that'd been causing oopses on
   filesystems using snapshots with memory pressure (key cache fills for
   snaphots btrees are tricky)

 - build fix for strange compiler configurations that double stack frame
   size

 - "journal stuck timeout" now takes into account device latency: this
   fixes some spurious warnings, and the main remaining source of SRCU
   lock hold time warnings (I'm no longer seeing this in my CI, so any
   users still seeing this should definitely ping me)

 - fix for slow/hanging unmounts (" Improve journal pin flushing")

 - some more tracepoint fixes/improvements, to chase down the "rebalance
   isn't making progress" issues

* tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs:
  bcachefs: Improve trace_move_extent_finish
  bcachefs: Fix trace_copygc
  bcachefs: Journal writes are now IOPRIO_CLASS_RT
  bcachefs: Improve journal pin flushing
  bcachefs: fix bch2_btree_node_flags
  bcachefs: rebalance, copygc enabled are runtime opts
  bcachefs: Improve decompression error messages
  bcachefs: bset_blacklisted_journal_seq is now AUTOFIX
  bcachefs: "Journal stuck" timeout now takes into account device latency
  bcachefs: Reduce stack frame size of __bch2_str_hash_check_key()
  bcachefs: Fix btree_trans_peek_key_cache()
2025-01-30 08:42:50 -08:00
Paolo Abeni
047558caf7 Merge branch 'mptcp-blackhole-only-if-1st-syn-retrans-w-o-mpc-is-accepted'
Matthieu Baerts says:

====================
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted

Here are two small fixes for issues introduced in v6.12.

- Patch 1: reset the mpc_drop mark for other SYN retransmits, to only
  consider an MPTCP blackhole when the first SYN retransmitted without
  the MPTCP options is accepted, as initially intended.

- Patch 2: also mention in the doc that the blackhole_timeout sysctl
  knob is per-netns, like all the others.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

Link: https://patch.msgid.link/20250129-net-mptcp-blackhole-fix-v1-0-afe88e5a6d2c@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 14:02:22 +01:00
Matthieu Baerts (NGI0)
18da4b5d12 doc: mptcp: sysctl: blackhole_timeout is per-netns
All other sysctl entries mention it, and it is a per-namespace sysctl.

So mention it as well.

Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 14:02:19 +01:00
Matthieu Baerts (NGI0)
e598d8981f mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
The Fixes commit mentioned this:

> An MPTCP firewall blackhole can be detected if the following SYN
> retransmission after a fallback to "plain" TCP is accepted.

But in fact, this blackhole was detected if any following SYN
retransmissions after a fallback to TCP was accepted.

That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
to 0, and 'mpc_drop' will never be reset to 0 after.

This is an issue, because some not so unusual situations might cause the
kernel to detect a false-positive blackhole, e.g. a client trying to
connect to a server while the network is not ready yet, causing a few
SYN retransmissions, before reaching the end server.

Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 14:02:19 +01:00
Pablo Neira Ayuso
1b9335a800 netfilter: nf_tables: reject mismatching sum of field_len with set key length
The field length description provides the length of each separated key
field in the concatenation, each field gets rounded up to 32-bits to
calculate the pipapo rule width from pipapo_init(). The set key length
provides the total size of the key aligned to 32-bits.

Register-based arithmetics still allows for combining mismatching set
key length and field length description, eg. set key length 10 and field
description [ 5, 4 ] leading to pipapo width of 12.

Cc: stable@vger.kernel.org
Fixes: 3ce67e3793 ("netfilter: nf_tables: do not allow mismatch field size and set key length")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-30 12:26:11 +01:00
Paolo Abeni
3effcc04d8 Merge branch 'fix-missing-rtnl-lock-in-suspend-path'
Kory Maincent says:

====================
Fix missing rtnl lock in suspend path

Fix the suspend path by ensuring the rtnl lock is held where required.
Calls to open, close and WOL operations must be performed under the
rtnl lock to prevent conflicts with ongoing ndo operations.

Discussion about this issue can be found here:
https://lore.kernel.org/netdev/20250120141926.1290763-1-kory.maincent@bootlin.com/

While working on the ravb fix, it was discovered that the sh_eth driver
has the same issue. This patch series addresses both drivers.

I do not have access to hardware for either of these MACs, so it would
be great if maintainers or others with the relevant boards could test
these fixes.

v2: https://lore.kernel.org/r/20250123-fix_missing_rtnl_lock_phy_disconnect-v2-0-e6206f5508ba@bootlin.com

v1: https://lore.kernel.org/r/20250122-fix_missing_rtnl_lock_phy_disconnect-v1-0-8cb9f6f88fd1@bootlin.com

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
====================

Link: https://patch.msgid.link/20250129-fix_missing_rtnl_lock_phy_disconnect-v3-0-24c4ba185a92@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 11:23:32 +01:00
Kory Maincent
b95102215a net: sh_eth: Fix missing rtnl lock in suspend/resume path
Fix the suspend/resume path by ensuring the rtnl lock is held where
required. Calls to sh_eth_close, sh_eth_open and wol operations must be
performed under the rtnl lock to prevent conflicts with ongoing ndo
operations.

Fixes: b71af04676 ("sh_eth: add more PM methods")
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 11:23:01 +01:00
Kory Maincent
2c2ebb2b49 net: ravb: Fix missing rtnl lock in suspend/resume path
Fix the suspend/resume path by ensuring the rtnl lock is held where
required. Calls to ravb_open, ravb_close and wol operations must be
performed under the rtnl lock to prevent conflicts with ongoing ndo
operations.

Without this fix, the following warning is triggered:
[   39.032969] =============================
[   39.032983] WARNING: suspicious RCU usage
[   39.033019] -----------------------------
[   39.033033] drivers/net/phy/phy_device.c:2004 suspicious
rcu_dereference_protected() usage!
...
[   39.033597] stack backtrace:
[   39.033613] CPU: 0 UID: 0 PID: 174 Comm: python3 Not tainted
6.13.0-rc7-next-20250116-arm64-renesas-00002-g35245dfdc62c #7
[   39.033623] Hardware name: Renesas SMARC EVK version 2 based on
r9a08g045s33 (DT)
[   39.033628] Call trace:
[   39.033633]  show_stack+0x14/0x1c (C)
[   39.033652]  dump_stack_lvl+0xb4/0xc4
[   39.033664]  dump_stack+0x14/0x1c
[   39.033671]  lockdep_rcu_suspicious+0x16c/0x22c
[   39.033682]  phy_detach+0x160/0x190
[   39.033694]  phy_disconnect+0x40/0x54
[   39.033703]  ravb_close+0x6c/0x1cc
[   39.033714]  ravb_suspend+0x48/0x120
[   39.033721]  dpm_run_callback+0x4c/0x14c
[   39.033731]  device_suspend+0x11c/0x4dc
[   39.033740]  dpm_suspend+0xdc/0x214
[   39.033748]  dpm_suspend_start+0x48/0x60
[   39.033758]  suspend_devices_and_enter+0x124/0x574
[   39.033769]  pm_suspend+0x1ac/0x274
[   39.033778]  state_store+0x88/0x124
[   39.033788]  kobj_attr_store+0x14/0x24
[   39.033798]  sysfs_kf_write+0x48/0x6c
[   39.033808]  kernfs_fop_write_iter+0x118/0x1a8
[   39.033817]  vfs_write+0x27c/0x378
[   39.033825]  ksys_write+0x64/0xf4
[   39.033833]  __arm64_sys_write+0x18/0x20
[   39.033841]  invoke_syscall+0x44/0x104
[   39.033852]  el0_svc_common.constprop.0+0xb4/0xd4
[   39.033862]  do_el0_svc+0x18/0x20
[   39.033870]  el0_svc+0x3c/0xf0
[   39.033880]  el0t_64_sync_handler+0xc0/0xc4
[   39.033888]  el0t_64_sync+0x154/0x158
[   39.041274] ravb 11c30000.ethernet eth0: Link is Down

Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
Fixes: 0184165b2f ("ravb: add sleep PM suspend/resume support")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 11:23:01 +01:00