Commit Graph

1310824 Commits

Author SHA1 Message Date
Kuniyuki Iwashima
bdd85ddce5 rtnetlink: Fix kdoc of rtnl_af_register().
Commit 26eebdc4b0 ("rtnetlink: Return int from rtnl_af_register().")
made rtnl_af_register() return int again, and kdoc needs to be fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022210320.86111-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:35:20 -07:00
Jakub Kicinski
25c509f483 Merge branch 'ipv4-prepare-core-ipv4-files-to-future-flowi4_tos-conversion'
Guillaume Nault says:

====================
ipv4: Prepare core ipv4 files to future .flowi4_tos conversion.

Continue preparing users of ->flowi4_tos (struct flowi4) for the future
conversion of this field (from __u8 to dscp_t). The objective is to
have type annotation to properly separate DSCP bits from ECN ones. This
way we'll ensure that ECN doesn't interfere with DSCP and avoid
regressions where it breaks routing decisions (fib rules in particular).

This series concentrates on some easy IPv4 conversions where
->flowi4_tos is set directly from an IPv4 header, so we can get the
DSCP value using the ip4h_dscp() helper function.
====================

Link: https://patch.msgid.link/cover.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:25 -07:00
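
The DSCP/ECN separation described in the cover letter above can be illustrated in user space. This is only a sketch, not the kernel implementation: `toy_dscp_t`, `struct toy_iphdr`, `TOY_DSCP_MASK` and the `toy_*` helpers are hypothetical stand-ins for the roles played by dscp_t, ip4h_dscp() and inet_dscp_to_dsfield(). It assumes the DSCP occupies the upper six bits of the TOS byte and ECN the lower two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dscp_t: wrapping the byte in a struct makes
 * the compiler reject accidental mixing with plain integers. */
typedef struct { uint8_t bits; } toy_dscp_t;

/* Assumption: upper six bits of the TOS byte are DSCP, lower two are ECN. */
#define TOY_DSCP_MASK 0xfcu

/* Minimal IPv4 header stand-in: only the TOS byte matters here. */
struct toy_iphdr { uint8_t tos; };

/* Analogue of ip4h_dscp(): read the TOS byte and drop the ECN bits. */
toy_dscp_t toy_ip4h_dscp(const struct toy_iphdr *iph)
{
	toy_dscp_t dscp;

	assert(iph != NULL);
	dscp.bits = (uint8_t)(iph->tos & TOY_DSCP_MASK);
	return dscp;
}

/* Analogue of inet_dscp_to_dsfield(): convert back to a plain byte for
 * users that still store the value as an untyped 8-bit field. */
uint8_t toy_dscp_to_dsfield(toy_dscp_t dscp)
{
	return dscp.bits;
}
```

With a TOS byte of 0xbb (DSCP bits 0xb8 plus both ECN bits set), the round trip through the two helpers yields 0xb8, so the ECN bits never reach the lookup key.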
Guillaume Nault
85ef52e869 ipv4: Prepare ip_rt_get_source() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0a13a200f31809841975e38633914af1061e0c04.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:06 -07:00
Guillaume Nault
6ab04392dd ipv4: Prepare ipmr_rt_fib_lookup() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/462402a097260357a7aba80228612305f230b6a9.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Guillaume Nault
0ed373390c ipv4: Prepare icmp_reply() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/61b7563563f8b0a562b5b62032fe5260034d0aac.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Guillaume Nault
b76ebf22c5 ipv4: Prepare fib_compute_spec_dst() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a0eba69cce94f747e4c7516184a85ffd0abbe3f0.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Paolo Abeni
c093e2b976 Merge branch 'ibm-emac-more-cleanups'
Rosen Penev says:

====================
ibm: emac: more cleanups

Tested on Cisco MX60W.

v2: fixed build errors. Also added extra commits to clean the driver up
further.
v3: Added tested message. Removed bad alloc_netdev_dummy commit.
v4: removed modules changes from patchset. Added fix for if MAC not
found.
v5: added of_find_matching_node commit.
v6: resend after net-next merge.
v7: removed of_find_matching_node commit. Adjusted mutex_init patch.
v8: removed patch removing custom init/exit. Needs more work.
====================

Link: https://patch.msgid.link/20241022002245.843242-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:25 +01:00
Rosen Penev
707f1c4b6a net: ibm: emac: generate random MAC if not found
On this Cisco MX60W, u-boot sets the local-mac-address property.
Unfortunately, the MAC it sets by default is wrong; the correct one is
located on a UBI partition, which means nvmem needs to be used to grab it.

In the case where that fails, EMAC fails to initialize instead of
generating a random MAC as many other drivers do.

Match behavior with other drivers to have a working ethernet interface.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:23 +01:00
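
The fallback described in the commit above (use a random MAC rather than fail to initialize) can be sketched in user space as follows. This is not the driver's code: `toy_random_mac()` and `toy_probe_mac()` are hypothetical names, and rand() stands in for a proper entropy source. The sketch only shows the two constraints a randomly generated address should satisfy: it must be unicast and marked as locally administered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Fill addr with a random MAC that is unicast and locally administered.
 * rand() keeps the sketch short; it is not a good entropy source. */
void toy_random_mac(uint8_t *addr)
{
	assert(addr != NULL);
	for (int i = 0; i < TOY_ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Hypothetical probe step: use the firmware-provided MAC when one was
 * found (fw_mac != NULL), otherwise fall back to a random address
 * instead of failing. Returns 1 if a random address was generated. */
int toy_probe_mac(const uint8_t *fw_mac, uint8_t *out)
{
	assert(out != NULL);
	if (fw_mac) {
		memcpy(out, fw_mac, TOY_ETH_ALEN);
		return 0;
	}
	toy_random_mac(out);
	return 1;
}
```

Calling `toy_probe_mac(NULL, mac)` always produces a usable address, which is the behavior the commit message asks for when the lookup fails.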
Rosen Penev
af4698be49 net: ibm: emac: use devm for mutex_init
It seems that, since inception, mutex_destroy was never called for these
mutexes in _remove. Instead of handling this manually, just use devm for
simplicity.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:23 +01:00
Rosen Penev
a598f66d91 net: ibm: emac: use platform_get_irq
No need for irq_of_parse_and_map since we have platform_device.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:23 +01:00
Rosen Penev
c9bf90863d net: ibm: emac: use devm_platform_ioremap_resource
No need to have a struct resource. Gets rid of the TODO.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:23 +01:00
Rosen Penev
0a24488d93 net: ibm: emac: use netif_receive_skb_list
Small rx improvement. Would use napi_gro_receive instead but that's a
lot more involved than netif_receive_skb_list because of how the
function is implemented.

Before:

> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 51556 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   559 MBytes   467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 48228 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.03 sec   558 MBytes   467 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 47600 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   557 MBytes   466 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 37252 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   559 MBytes   467 Mbits/sec

After:

> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 40786 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   572 MBytes   478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 52482 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   571 MBytes   477 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 48370 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   572 MBytes   478 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 46086 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec   571 MBytes   476 Mbits/sec
> iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.101 port 46062 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.04 sec   572 MBytes   478 Mbits/sec

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 15:33:22 +01:00
Paolo Abeni
dd1b082f01 Merge branch 'ipv4-convert-rtm_-new-del-addr-and-more-to-per-netns-rtnl'
Kuniyuki Iwashima says:

====================
ipv4: Convert RTM_{NEW,DEL}ADDR and more to per-netns RTNL.

The IPv4 address hash table and GC are already namespacified.

This series converts RTM_NEWADDR/RTM_DELADDR and some more
RTNL users to per-netns RTNL.

Changes:
  v2:
    * Add patch 1 to address sparse warning for CONFIG_DEBUG_NET_SMALL_RTNL=n
    * Add Eric's tags to patch 2-12

  v1: https://lore.kernel.org/netdev/20241018012225.90409-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241021183239.79741-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:55:29 +01:00
Kuniyuki Iwashima
7ed8da17bf ipv4: Convert devinet_ioctl to per-netns RTNL.
ioctl(SIOCGIFCONF) calls dev_ifconf() that operates on the current netns.

Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:58 +01:00
Kuniyuki Iwashima
88d1f87706 ipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS.
Basically, devinet_ioctl() operates on a single netns.

However, ioctl(SIOCSIFFLAGS) will trigger the netdev notifier
that could touch another netdev in different netns.

Let's use per-netns RTNL helper in devinet_ioctl() and place
ASSERT_RTNL() for SIOCSIFFLAGS.

We will remove ASSERT_RTNL() once RTM_SETLINK and RTM_DELLINK
are converted.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:58 +01:00
Kuniyuki Iwashima
77453d428d ipv4: Convert devinet_sysctl_forward() to per-netns RTNL.
devinet_sysctl_forward() touches only a single netns.

Let's use rtnl_trylock() and __in_dev_get_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
d1c81818aa rtnetlink: Define rtnl_net_trylock().
We will need the per-netns version of rtnl_trylock().

rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock()
successfully holds RTNL.

When RTNL is removed, we will use mutex_trylock() for per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
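
The ordering described in the rtnl_net_trylock() commit above (take the per-netns lock only once the global trylock has succeeded) can be modelled in user space. This is a toy sketch under stated assumptions, not kernel code: C11 `atomic_flag` spin flags stand in for the kernel mutexes, and `struct toy_net` and every `toy_*` function are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the global RTNL lock. */
atomic_flag toy_rtnl = ATOMIC_FLAG_INIT;

/* Toy stand-in for a network namespace with its own lock. */
struct toy_net { atomic_flag rtnl_mutex; };

void toy_net_init(struct toy_net *net)
{
	assert(net != NULL);
	atomic_flag_clear(&net->rtnl_mutex);
}

/* Analogue of rtnl_trylock(): true if the global lock was acquired. */
bool toy_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&toy_rtnl);
}

/* Analogue of __rtnl_net_lock(): blocking (here: spinning) acquisition
 * of the per-netns lock. */
void toy_net_lock_inner(struct toy_net *net)
{
	while (atomic_flag_test_and_set(&net->rtnl_mutex))
		;	/* spin until the per-netns lock is free */
}

/* Analogue of rtnl_net_trylock(): only touch the per-netns lock when the
 * global trylock succeeded. */
bool toy_rtnl_net_trylock(struct toy_net *net)
{
	bool ok = toy_rtnl_trylock();

	if (ok)
		toy_net_lock_inner(net);
	return ok;
}

/* Release in reverse order of acquisition. */
void toy_rtnl_net_unlock(struct toy_net *net)
{
	atomic_flag_clear(&net->rtnl_mutex);
	atomic_flag_clear(&toy_rtnl);
}
```

A second `toy_rtnl_net_trylock()` while the first is still held returns false without ever touching the per-netns flag, so a failed attempt leaves nothing to undo.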
Kuniyuki Iwashima
c350c4761e ipv4: Convert check_lifetime() to per-netns RTNL.
Since commit 1675f38521 ("ipv4: Namespacify IPv4 address GC."),
check_lifetime() works on a per-netns basis.

Let's use rtnl_net_lock() and rtnl_net_dereference().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
4df5066f07 ipv4: Convert RTM_DELADDR to per-netns RTNL.
Let's push down RTNL into inet_rtm_deladdr() as rtnl_net_lock().

Now, ip_mc_autojoin_config() is always called under per-netns RTNL,
so ASSERT_RTNL() can be replaced with ASSERT_RTNL_NET().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
d4b483208b ipv4: Use per-netns RTNL helpers in inet_rtm_newaddr().
inet_rtm_to_ifa() and find_matching_ifa() are called
under rtnl_net_lock().

__in_dev_get_rtnl() and in_dev_for_each_ifa_rtnl() there
can use per-netns RTNL helpers.

Let's define and use __in_dev_get_rtnl_net() and
in_dev_for_each_ifa_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
487257786b ipv4: Convert RTM_NEWADDR to per-netns RTNL.
The address hash table and GC are already namespacified.

Let's push down RTNL into inet_rtm_newaddr() as rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
abd0deff03 ipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr().
When we pass 0.0.0.0 to __inet_insert_ifa(), it frees ifa and returns 0.

We can do this check much earlier for RTM_NEWADDR even before allocating
struct in_ifaddr.

Let's move the validation to

  1. inet_insert_ifa() for ioctl()
  2. inet_rtm_newaddr() for RTM_NEWADDR

Now, we can remove the same check in find_matching_ifa().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
2d34429d14 ipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm().
rtm_to_ifaddr() validates some attributes, looks up a netdev,
allocates struct in_ifaddr, and validates IFA_CACHEINFO.

There is no reason to delay IFA_CACHEINFO validation.

We will push RTNL down to inet_rtm_newaddr(), and then we want
to complete rtnetlink validation before rtnl_net_lock().

Let's factorise the validation parts.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
26d8db55ee rtnetlink: Define RTNL_FLAG_DOIT_PERNET for per-netns RTNL doit().
We will push RTNL down to each doit() as rtnl_net_lock().

We can use RTNL_FLAG_DOIT_UNLOCKED to call doit() without RTNL, but doit()
will still hold RTNL.

Let's define RTNL_FLAG_DOIT_PERNET as an alias of RTNL_FLAG_DOIT_UNLOCKED.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
9cb7e40d38 rtnetlink: Make per-netns RTNL dereference helpers to macro.
When CONFIG_DEBUG_NET_SMALL_RTNL is off, rtnl_net_dereference() is the
static inline wrapper of rtnl_dereference() returning a plain (void *)
pointer to make sure net is always evaluated as requested in [0].

But, it makes sparse complain [1] when the pointer has __rcu annotation:

  net/ipv4/devinet.c:674:47: sparse: warning: incorrect type in argument 2 (different address spaces)
  net/ipv4/devinet.c:674:47: sparse:    expected void *p
  net/ipv4/devinet.c:674:47: sparse:    got struct in_ifaddr [noderef] __rcu *

Also, if we evaluate net as (void *) in a macro, then the compiler
in turn fails to build due to -Werror=unused-value.

  #define rtnl_net_dereference(net, p)                  \
        ({                                              \
                (void *)net;                            \
                rtnl_dereference(p);                    \
        })

  net/ipv4/devinet.c: In function ‘inet_rtm_deladdr’:
  ./include/linux/rtnetlink.h:154:17: error: statement with no effect [-Werror=unused-value]
    154 |                 (void *)net;                            \
  net/ipv4/devinet.c:674:21: note: in expansion of macro ‘rtnl_net_dereference’
    674 |              (ifa = rtnl_net_dereference(net, *ifap)) != NULL;
        |                     ^~~~~~~~~~~~~~~~~~~~

Let's go back to the original simplest macro.

Note that checkpatch complains about this approach, but it's one-shot and
less noisy than the other two.

  WARNING: Argument 'net' is not used in function-like macro
  #76: FILE: include/linux/rtnetlink.h:142:
  +#define rtnl_net_dereference(net, p)			\
  +	rtnl_dereference(p)

Fixes: 844e5e7e65 ("rtnetlink: Add assertion helpers for per-netns RTNL.")
Link: https://lore.kernel.org/netdev/20241004132145.7fd208e9@kernel.org/ [0]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410200325.SaEJmyZS-lkp@intel.com/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Eric Dumazet
ab101c553b neighbour: use kvzalloc()/kvfree()
mm layer is providing convenient functions, we do not have
to work around old limitations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241022150059.1345406-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 18:12:06 -07:00
Kory Maincent
63afe0c217 netlink: specs: Add missing phy-ntf command to ethtool spec
ETHTOOL_MSG_PHY_NTF description is missing in the ethtool netlink spec.
Add it to the spec.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241022151418.875424-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 18:10:08 -07:00
Eric Dumazet
ba4e469e42 vsock: do not leave dangling sk pointer in vsock_create()
syzbot was able to trigger the following warning after recent
core network cleanup.

On error vsock_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object.

We must clear sock->sk to avoid possible use-after-free later.

WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581
Modules linked in:
CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581
Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff
RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293
RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00
RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000
RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a
R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed
R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000
FS:  00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  sock_create net/socket.c:1632 [inline]
  __sys_socket_create net/socket.c:1669 [inline]
  __sys_socket+0x150/0x3c0 net/socket.c:1716
  __do_sys_socket net/socket.c:1730 [inline]
  __se_sys_socket net/socket.c:1728 [inline]
  __x64_sys_socket+0x7a/0x90 net/socket.c:1728
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f22b117dff9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028
RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd

Fixes: 48156296a0 ("net: warn, if pf->create does not clear sock->sk on error")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 18:08:52 -07:00
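
The bug pattern fixed by the vsock commit above can be shown with a small user-space model. This is an illustration only: `struct toy_sk`, `struct toy_sock` and `toy_create()` are hypothetical, and the `late_error` parameter simply simulates a failure that happens after the sk has already been attached to the socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_sk { int state; };
struct toy_sock { struct toy_sk *sk; };

/* Hypothetical create(): a non-zero 'late_error' simulates a failure
 * after the sk has been attached to the socket. */
int toy_create(struct toy_sock *sock, int late_error)
{
	struct toy_sk *sk;

	assert(sock != NULL);
	sk = calloc(1, sizeof(*sk));
	if (!sk)
		return -ENOMEM;

	sock->sk = sk;		/* analogue of sock_init_data() attaching sk */

	if (late_error) {
		free(sk);
		sock->sk = NULL;	/* without this, sock->sk would dangle */
		return late_error;
	}
	return 0;
}
```

After a failed `toy_create()`, the caller sees `sock->sk == NULL` and cannot follow a pointer to freed memory, which is the condition the warning in the trace above checks for.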
Sebastian Ott
25872a079b net/mlx5: unique names for per device caches
Add the device name to the per device kmem_cache names to
ensure their uniqueness. This fixes warnings like this:
"kmem_cache of name 'mlx5_fs_fgs' already exists".

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241023134146.28448-1-sebott@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 17:33:36 -07:00
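
The naming idea from the mlx5 commit above can be sketched as follows. The exact format string the driver uses is not shown in the commit message, so the "<base>_<device>" layout, the `toy_cache_name()` and `toy_same_name()` helpers, and the PCI-style device names in the usage note are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<base>_<dev_name>" into buf.
 * Returns 0 on success, -1 if the result did not fit. */
int toy_cache_name(char *buf, size_t len, const char *base,
		   const char *dev_name)
{
	int n;

	assert(buf != NULL && base != NULL && dev_name != NULL);
	n = snprintf(buf, len, "%s_%s", base, dev_name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Two caches collide when their names compare equal. */
int toy_same_name(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With device names such as "0000:01:00.0" and "0000:01:00.1", the base name "mlx5_fs_fgs" yields two distinct cache names, so the "already exists" warning quoted above would not trigger for a second device.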
Jakub Kicinski
c7cf3e928e Merge branch 'bonding-returns-detailed-error-about-xdp-failures'
Hangbin Liu says:

====================
Bonding: returns detailed error about XDP failures

Based on discussion [1], this patch set returns detailed errors about XDP
failures and updates the bonding documentation about XDP support.

[1] https://lore.kernel.org/8088f2a7-3ab1-4a1e-996d-c15703da13cc@blackwall.org
====================

Link: https://patch.msgid.link/20241021031211.814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 16:09:44 -07:00
Hangbin Liu
9f59eccd9d Documentation: bonding: add XDP support explanation
Add documentation about which modes have native XDP support.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 16:09:42 -07:00
Hangbin Liu
22ccb684c1 bonding: return detailed error when loading native XDP fails
Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 16:09:42 -07:00
Jakub Kicinski
825199bf20 Merge branch 'mptcp-various-small-improvements'
Matthieu Baerts says:

====================
mptcp: various small improvements

The following patches are not related to each other.

- Patch 1: Avoid sending advertisements on stale subflows, reducing
  the risk of losing them.

- Patch 2: Annotate data-races around subflow->fully_established, using
  READ/WRITE_ONCE().

- Patch 3: A small clean-up on the PM side, avoiding a bit of duplicated
  code.

- Patch 4: Use "Middlebox interference" MP_TCPRST code in reaction to a
  packet received without MPTCP options in the middle of a connection.
====================

Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-0-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:48 -07:00
Davide Caratti
46a3282b87 mptcp: use "middlebox interference" RST when no DSS
RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of
fully established subflow that carries data at TCP level with no DSS
sub-option.

This is generally the case when mpext is NULL or mpext->use_map is 0:
use a dedicated value of 'mapping_status' and use it before closing the
socket in subflow_check_data_avail().

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/518
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-4-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Geliang Tang
5add80bfdc mptcp: implement mptcp_pm_connection_closed
The MPTCP path manager event handler mptcp_pm_connection_closed
interface has been added in the commit 1b1c7a0ef7 ("mptcp: Add path
manager interface") but it was an empty function from then on.

With such a name, it makes sense to invoke mptcp_event with the
MPTCP_EVENT_CLOSED event type from it. It also removes a bit of
duplicated code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-3-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Gang Yan
581c8cbfa9 mptcp: annotate data-races around subflow->fully_established
We introduce the same handling for potential data races with the
'fully_established' flag in subflow as previously done for
msk->fully_established.

Additionally, we make a crucial change: convert the subflow's
'fully_established' from 'bit_field' to 'bool' type. This is
necessary because methods for avoiding data races don't work well
with 'bit_field'. Specifically, the 'READ_ONCE' needs to know
the size of the variable being accessed, which is not supported in
'bit_field'. Also, 'test_bit' expects the address of 'bit_field'.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/516
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-2-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
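
The reason given in the commit above for moving from a bit-field to a bool can be illustrated outside the kernel. The macros below are simplified, hypothetical stand-ins for READ_ONCE()/WRITE_ONCE() (the real kernel macros do more); they rely on the GNU `__typeof__` extension and on taking the address of the object, which C does not allow for a bit-field member. `struct toy_subflow` and the `toy_*` helpers are likewise invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins: a volatile access through the object's address. */
#define TOY_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

struct toy_subflow {
	unsigned int other_flag : 1;	/* bit-field: &s->other_flag does not compile */
	bool fully_established;		/* plain bool: addressable, has a size */
};

/* Writer side: publish the flag with a single volatile store. */
void toy_set_established(struct toy_subflow *s)
{
	assert(s != NULL);
	TOY_WRITE_ONCE(s->fully_established, true);
}

/* Reader side: fetch the flag with a single volatile load. */
bool toy_is_established(const struct toy_subflow *s)
{
	assert(s != NULL);
	return TOY_READ_ONCE(s->fully_established);
}
```

Applying `TOY_READ_ONCE()` to `other_flag` would be rejected by the compiler, because the address of a bit-field cannot be taken; that is the practical obstacle the commit message refers to.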
Matthieu Baerts (NGI0)
a42f307664 mptcp: pm: send ACK on non-stale subflows
If the subflow is considered "stale", it is better to avoid using it to
send an ACK carrying an ADD_ADDR or RM_ADDR. Another subflow, if any,
will then be selected.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-1-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Jakub Kicinski
fbb26ecc55 Merge branch 'net-systemport-minor-io-macros-changes'
Florian Fainelli says:

====================
net: systemport: Minor IO macros changes

This patch series addresses the warning initially reported by Vladimir
here:

https://lore.kernel.org/all/20241014150139.927423-1-vladimir.oltean@nxp.com/

and follows up on his suggestion to move the IO macros to the
header file.
====================

Link: https://patch.msgid.link/20241021174935.57658-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:54:42 -07:00
Florian Fainelli
e69fbd287d net: systemport: Move IO macros to header file
Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h
where it is more appropriate and where static inline helpers are
acceptable. While at it, make sure that the macro 'offset' argument does
not trigger a checkpatch warning due to possible argument re-use.

Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:54:37 -07:00
Florian Fainelli
890bde75a2 net: systemport: Remove unused txchk accessors
Vladimir reported the following warning with clang-16 and W=1:

warning: unused function 'txchk_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'txchk_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'tbuf_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

warning: unused function 'tbuf_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'

The TXCHK and RBUF blocks are not being accessed, remove the IO macros
used to access those blocks. No functional impact.

Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:54:37 -07:00
Leo Stone
47e99f3073 selftest/tcp-ao: Add filter tests
Add tests that check if getsockopt(TCP_AO_GET_KEYS) returns the right
keys when using different filters.

Sample output:

> # ok 114 filter keys: by sndid, rcvid, address
> # ok 115 filter keys: by is_current
> # ok 116 filter keys: by is_rnext
> # ok 117 filter keys: by sndid, rcvid
> # ok 118 filter keys: correct nkeys when in.nkeys < matches

Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174652.6949-1-leocstone@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:53:50 -07:00
Yazen Ghannam
9f6cb31979 net: amd8111e: Remove duplicate definition of PCI_VENDOR_ID_AMD
The AMD PCI vendor ID is already defined in <linux/pci_ids.h>.

Remove this local definition as it is not needed.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241021153825.2536819-1-yazen.ghannam@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:48:51 -07:00
Danila Tikhonov
05c9afb9bf dt-bindings: nfc: nxp,nci: Document PN553 compatible
The PN553 is another NFC chip from NXP; document its compatible in the
bindings.

Signed-off-by: Danila Tikhonov <danila@jiaxyga.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20241020205615.211256-2-danila@jiaxyga.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 12:53:20 -07:00
Jakub Kicinski
a3e4bf7f96 configs/debug: make sure PROVE_RCU_LIST=y takes effect
Commit 0aaa8977ac ("configs: introduce debug.config for CI-like setup")
added CONFIG_PROVE_RCU_LIST=y to the common CI config,
but RCU_EXPERT, which CONFIG_PROVE_RCU_LIST depends on, is not set,
so the option does not take effect. Make sure CIs take advantage
of CONFIG_PROVE_RCU_LIST=y; recent fixes in networking
indicate that it does catch bugs.
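
As a sketch of the dependency described above, and assuming the fix simply
enables the missing symbol alongside the existing one, the relevant part
of the config fragment would look roughly like this (the exact placement
within debug.config is not shown in this log):

```
# Assumed shape of the fragment; RCU_EXPERT must be on for the second
# option to be selectable at all.
CONFIG_RCU_EXPERT=y
CONFIG_PROVE_RCU_LIST=y
```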

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241016011144.3058445-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 10:21:09 -07:00
Javier Carrasco
b8ee7a11c7 net: dsa: mv88e6xxx: fix unreleased fwnode_handle in setup_port()
'ports_fwnode' is initialized via device_get_named_child_node(), which
requires a call to fwnode_handle_put() when the variable is no longer
required to avoid leaking memory.

Add the missing fwnode_handle_put() after 'ports_fwnode' has been used
and is no longer required.
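
The get/put pairing described above can be sketched in userspace with a
mock node type. This is only an illustration of the reference-counting
pattern: struct mock_node, mock_get_named_child(), mock_node_put() and the
child name used are hypothetical, and stand in for the real fwnode API
rather than reproduce it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a firmware node: a name and a refcount. */
struct mock_node {
    const char *name;
    int refcount;
};

/* Mock lookup: a successful lookup hands back a counted reference. */
static struct mock_node *mock_get_named_child(struct mock_node *children,
                                              size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(children[i].name, name) == 0) {
            children[i].refcount++;
            return &children[i];
        }
    }
    return NULL;
}

/* Mock put: drops one reference. This mock tolerates NULL. */
static void mock_node_put(struct mock_node *node)
{
    if (!node)
        return;
    assert(node->refcount > 0);
    node->refcount--;
}

/*
 * Mock setup routine: look the node up, use it, then release it on the
 * way out. Omitting the put would leave the refcount elevated forever,
 * which is the leak the commit above fixes in the real driver.
 */
static int mock_setup_port(struct mock_node *children, size_t n)
{
    struct mock_node *ports =
        mock_get_named_child(children, n, "ethernet-ports");
    int found = ports != NULL;

    /* ... the node would be walked or read here ... */

    mock_node_put(ports);
    return found;
}
```

After mock_setup_port() returns, the node's refcount is back at its
starting value, which is the invariant the added put restores.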

Fixes: 94a2a84f5e ("net: dsa: mv88e6xxx: Support LED control")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-28 13:27:34 +00:00
Guillaume Nault
788d5d655b bareudp: Use pcpu stats to update rx_dropped counter.
Use the core_stats rx_dropped counter to avoid the cost of atomic
increments.
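
A rough userspace sketch of the idea, assuming a fixed number of CPUs and
a caller that knows which CPU it runs on: each CPU bumps its own slot with
a plain increment, and readers sum the slots. The names below are
hypothetical and do not mirror the kernel's core_stats helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU count for the sketch. */
#define MOCK_NR_CPUS 4

/* One counter slot per CPU, so writers never share a counter. */
struct mock_core_stats {
    uint64_t rx_dropped;
};

static struct mock_core_stats mock_pcpu_stats[MOCK_NR_CPUS];

/* Hot path: plain increment of the calling CPU's own slot, no atomics. */
static void mock_rx_dropped_inc(unsigned int cpu)
{
    assert(cpu < MOCK_NR_CPUS);
    mock_pcpu_stats[cpu].rx_dropped++;
}

/* Slow path: a reader folds all per-CPU slots into one total. */
static uint64_t mock_rx_dropped_read(void)
{
    uint64_t sum = 0;

    for (unsigned int cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
        sum += mock_pcpu_stats[cpu].rx_dropped;
    return sum;
}
```

The trade-off is that the cost moves from every drop to the occasional
read of the statistics.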

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-28 11:16:32 +00:00
Paolo Abeni
03fc07a247 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts and no adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-25 09:08:22 +02:00
Linus Torvalds
d44cd82264 Merge tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter, xfrm and bluetooth.

  Oddly this includes a fix for a posix clock regression; in our
  previous PR we included a change there as a pre-requisite for a
  networking one. That fix proved to be buggy and requires the follow-up
  included here. Thomas suggested we should send it, given we sent the
  buggy patch.

  Current release - regressions:

   - posix-clock: Fix unbalanced locking in pc_clock_settime()

   - netfilter: fix typo causing some targets not to load on IPv6

  Current release - new code bugs:

   - xfrm: policy: remove last remnants of pernet inexact list

  Previous releases - regressions:

   - core: fix races in netdev_tx_sent_queue()/dev_watchdog()

   - bluetooth: fix UAF on sco_sock_timeout

   - eth: hv_netvsc: fix VF namespace also in synthetic NIC
     NETDEV_REGISTER event

   - eth: usbnet: fix name regression

   - eth: be2net: fix potential memory leak in be_xmit()

   - eth: plip: fix transmit path breakage

  Previous releases - always broken:

   - sched: deny mismatched skip_sw/skip_hw flags for actions created by
     classifiers

   - netfilter: bpf: must hold reference on net namespace

   - eth: virtio_net: fix integer overflow in stats

   - eth: bnxt_en: replace ptp_lock with irqsave variant

   - eth: octeon_ep: add SKB allocation failures handling in
     __octep_oq_process_rx()

  Misc:

   - MAINTAINERS: add Simon as an official reviewer"

* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: dsa: mv88e6xxx: support 4000ps cycle counter period
  net: dsa: mv88e6xxx: read cycle counter period from hardware
  net: dsa: mv88e6xxx: group cycle counter coefficients
  net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
  hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
  net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
  posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
  r8169: avoid unsolicited interrupts
  net: sched: use RCU read-side critical section in taprio_dump()
  net: sched: fix use-after-free in taprio_change()
  net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
  net: usb: usbnet: fix name regression
  mlxsw: spectrum_router: fix xa_store() error checking
  virtio_net: fix integer overflow in stats
  net: fix races in netdev_tx_sent_queue()/dev_watchdog()
  net: wwan: fix global oob in wwan_rtnl_policy
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  ...
2024-10-24 16:43:50 -07:00
Linus Torvalds
c9a50b9090 Merge tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
 "Device-specific functionality quirks for Thinkpad X1 Gen3, Logitech
  Bolt and some Goodix touchpads (Bartłomiej Maryńczak, Hans de Goede
  and Kenneth Albanowski)"

* tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard
  HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad
  HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad
2024-10-24 16:31:58 -07:00
Linus Torvalds
3964f82a4d Merge tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Get correct cores_per_package for SMT systems, enable IRQ if do_ale()
  triggered in irq-enabled context, and fix some bugs about vDSO, memory
  management, hrtimer in KVM, etc"

* tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
  LoongArch: Make KASAN usable for variable cpu_vabits
  LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  LoongArch: Don't crash in stack_top() for tasks without vDSO
  LoongArch: Set correct size for vDSO code mapping
  LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
  LoongArch: Get correct cores_per_package for SMT systems
  LoongArch: Use "Exception return address" to comment ERA
2024-10-24 14:17:34 -07:00