linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-11 12:44:49 -04:00

Author	SHA1	Message	Date
Michal Koutný	0876453147	net: cgroup: Guard users of sock_cgroup_classid() Exclude code that relies on sock_cgroup_classid() as preparation of removal of the function. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-04-24 16:04:02 +02:00
Michal Koutný	3ba0032afe	netfilter: xt_cgroup: Make it independent from net_cls The xt_cgroup matching supports the default hierarchy since commit `c38c4597e4` ("netfilter: implement xt_cgroup cgroup2 path match"). The cgroup v1 matching (based on clsid) and cgroup v2 matching (based on path) are rather independent. Downgrade the Kconfig dependency to mere CONFIG_SOCK_CGROUP_DATA so that xt_cgroup can be built even without CONFIG_NET_CLS_CGROUP for path matching. Also add a message for users when they attempt to specify any clsid. Link: https://lists.opensuse.org/archives/list/kernel@lists.opensuse.org/thread/S23NOILB7MUIRHSKPBOQKJHVSK26GP6X/ Cc: Jan Engelhardt <ej@inai.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-04-24 16:04:02 +02:00
Easwar Hariharan	f4293c2baf	netfilter: xt_IDLETIMER: convert timeouts to secs_to_jiffies() Commit `b35108a51c` ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-04-24 16:04:01 +02:00
Niklas Söderlund	bef4f1156b	net: phy: marvell-88q2xxx: Enable temperature sensor for mv88q211x The temperature sensor enabled for mv88q222x devices also functions for mv88q211x based devices. Unify the two devices probe functions to enable the sensors for all devices supported by this driver. The same oddity as for mv88q222x devices exists, the PHY link must be up for a correct temperature reading to be reported. # cat /sys/class/hwmon/hwmon9/temp1_input -75000 # ifconfig end5 up # cat /sys/class/hwmon/hwmon9/temp1_input 59000 Worth noting is that while the temperature register offsets and layout are the same between mv88q211x and mv88q222x devices their names in the datasheets are different. This change keeps the mv88q222x names for the mv88q211x support. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250418145800.2420751-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 13:19:51 +02:00
Paolo Abeni	b0e8cb1e16	Merge branch 'ipv6-no-rtnl-for-ipv6-routing-table' Kuniyuki Iwashima says: ==================== ipv6: No RTNL for IPv6 routing table. IPv6 routing tables are protected by each table's lock and work in the interrupt context, which means we basically don't need RTNL to modify an IPv6 routing table itself. Currently, the control paths require RTNL because we may need to perform device and nexthop lookups; we must prevent dev/nexthop from going away from the netns. This, however, can be achieved by RCU as well. If we are in the RCU critical section while adding an IPv6 route, synchronize_net() in __dev_change_net_namespace() and unregister_netdevice_many_notify() guarantee that the dev will not be moved to another netns or removed. Also, nexthop is guaranteed not to be freed during the RCU grace period. If we care about a race between nexthop removal and IPv6 route addition, we can get rid of RTNL from the control paths. Patch 1 moves a validation for RTA_MULTIPATH earlier. Patch 2 removes RTNL for SIOCDELRT and RTM_DELROUTE. Patch 3 ~ 11 moves validation and memory allocation earlier. Patch 12 prevents a race between two requests for the same table. Patch 13 & 14 prevents the nexthop race mentioned above. Patch 15 removes RTNL for SIOCADDRT and RTM_NEWROUTE. Test: The script [0] lets each CPU-X create 100000 routes on table-X in a batch. On c7a.metal-48xl EC2 instance with 192 CPUs, without this series: $ sudo ./route_test.sh start adding routes added 19200000 routes (100000 routes * 192 tables). total routes: 19200006 Time elapsed: 191577 milliseconds. with this series: $ sudo ./route_test.sh start adding routes added 19200000 routes (100000 routes * 192 tables). total routes: 19200006 Time elapsed: 62854 milliseconds. I changed the number of routes (1000 ~ 100000 per CPU/table) and consistently saw it finish 3x faster with this series. 
[0] mkdir tmp NS="test" ip netns add $NS ip -n $NS link add veth0 type veth peer veth1 ip -n $NS link set veth0 up ip -n $NS link set veth1 up TABLES=() for i in $(seq $(nproc)); do TABLES+=("$i") done ROUTES=() for i in {1..100}; do for j in {1..1000}; do ROUTES+=("2001:$i:$j::/64") done done for TABLE in "${TABLES[@]}"; do ( FILE="./tmp/batch-table-$TABLE.txt" > $FILE for ROUTE in "${ROUTES[@]}"; do echo "route add $ROUTE dev veth0 table $TABLE" >> $FILE done ) & done wait echo "start adding routes" START_TIME=$(date +%s%3N) for TABLE in "${TABLES[@]}"; do ip -n $NS -6 -batch "./tmp/batch-table-$TABLE.txt" & done wait END_TIME=$(date +%s%3N) ELAPSED_TIME=$((END_TIME - START_TIME)) echo "added $((${#ROUTES[@]} * ${#TABLES[@]})) routes (${#ROUTES[@]} routes * ${#TABLES[@]} tables)." echo "total routes: $(ip -n $NS -6 route show table all | wc -l)" # Just for debug echo "Time elapsed: ${ELAPSED_TIME} milliseconds." ip netns del $NS rm -fr ./tmp/ v2: https://lore.kernel.org/netdev/20250409011243.26195-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20250321040131.21057-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250418000443.43734-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:30:02 +02:00
Kuniyuki Iwashima	169fd62799	ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE. Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE. The remaining things to do are 1. pass false to lwtunnel_valid_encap_type_attr() 2. use rcu_dereference_rtnl() in fib6_check_nexthop() 3. place rcu_read_lock() before ip6_route_info_create_nh(). Let's complete the RTNL-free conversion. When each CPU-X adds 100000 routes on table-X in a batch concurrently on c7a.metal-48xl EC2 instance with 192 CPUs, without this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). time elapsed: 191577 milliseconds. with this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). time elapsed: 62854 milliseconds. I changed the number of routes in each table (1000 ~ 100000) and consistently saw it finish 3x faster with this series. Note that now every caller of lwtunnel_valid_encap_type() passes false as the last argument, and this can be removed later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-16-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	081efd1832	ipv6: Protect nh->f6i_list with spinlock and flag. We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we may be going to add a route tied to a dying nexthop. The nexthop itself is not freed during the RCU grace period, but if we link a route after __remove_nexthop_fib() is called for the nexthop, the route will be leaked. To avoid the race between IPv6 route addition under RCU vs nexthop deletion under RTNL, let's add a dead flag and protect it and nh->f6i_list with a spinlock. __remove_nexthop_fib() acquires the nexthop's spinlock and sets nh->dead to true, then calls ip6_del_rt() for the linked route one by one without the spinlock because fib6_purge_rt() acquires it later. While adding an IPv6 route, fib6_add() acquires the nexthop lock and checks the dead flag just before inserting the route. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-15-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	accb46b56b	ipv6: Defer fib6_purge_rt() in fib6_add_rt2node() to fib6_add(). The next patch adds per-nexthop spinlock which protects nh->f6i_list. When rt->nh is not NULL, fib6_add_rt2node() will be called under the lock. fib6_add_rt2node() could call fib6_purge_rt() for another route, which could hold another nexthop lock. Then, deadlock could happen between two nexthops. Let's defer fib6_purge_rt() after fib6_add_rt2node(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-14-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	834d97843e	ipv6: Protect fib6_link_table() with spinlock. We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. If the request specifies a new table ID, fib6_new_table() is called to create a new routing table. Two concurrent requests could specify the same table ID, so we need a lock to protect net->ipv6.fib_table_hash[h]. Let's add a spinlock to protect the hash bucket linkage. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-13-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	71c0efb6d1	ipv6: Factorise ip6_route_multipath_add(). We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely on RCU to guarantee dev and nexthop lifetime. Then, the RCU section will start before ip6_route_info_create_nh() in ip6_route_multipath_add(), but ip6_route_info_create() is called in the same loop and will sleep. Let's split the loop into ip6_route_mpath_info_create() and ip6_route_mpath_info_create_nh(). Note that ip6_route_info_append() is now integrated into ip6_route_mpath_info_create_nh() because we need to call different free functions for nexthops that passed ip6_route_info_create_nh(). In case of failure, the remaining nexthops that ip6_route_info_create_nh() has not been called for will be freed by ip6_route_mpath_info_cleanup(). OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked to a local temporary list, which will be spliced back to rt6_nh_list. In case of failure, these nexthops will be released by fib6_info_release() in ip6_route_multipath_add(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-12-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	5a1ccff5c6	ipv6: Rename rt6_nh.next to rt6_nh.list. ip6_route_multipath_add() allocates struct rt6_nh for each config of multipath routes to link them to a local list rt6_nh_list. struct rt6_nh.next is the list node of each config, so the name is quite misleading. Let's rename it to list. Suggested-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/netdev/c9bee472-c94e-4878-8cc2-1512b2c54db5@redhat.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-11-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	87d5d921ea	ipv6: Don't pass net to ip6_route_info_append(). net is not used in ip6_route_info_append() after commit `36f19d5b4f` ("net/ipv6: Remove extra call to ip6_convert_metrics for multipath case"). Let's remove the argument. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-10-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	d27b9c40db	ipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create(). ip6_route_info_create_nh() will be called under RCU. It calls fib_nh_common_init() and allocates nhc->nhc_pcpu_rth_output. As with the reason for rt->fib6_nh->rt6i_pcpu, we want to avoid GFP_ATOMIC allocation for nhc->nhc_pcpu_rth_output under RCU. Let's preallocate it in ip6_route_info_create(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-9-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	5720a328c3	ipv6: Preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create(). ip6_route_info_create_nh() will be called under RCU. Then, fib6_nh_init() is also under RCU, but per-cpu memory allocation is very likely to fail with GFP_ATOMIC while bulk-adding IPv6 routes and we would see a bunch of this message in dmesg. percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left Let's preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create(). If something fails before the original memory allocation in fib6_nh_init(), ip6_route_info_create_nh() calls fib6_info_release(), which releases the preallocated per-cpu memory. Note that rt->fib6_nh->rt6i_pcpu is not preallocated when called via ipv6_stub, so we still need alloc_percpu_gfp() in fib6_nh_init(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-8-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	c4837b9853	ipv6: Split ip6_route_info_create(). We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely on RCU to guarantee dev and nexthop lifetime. Then, we want to allocate as much as possible before entering the RCU section. The RCU section will start in the middle of ip6_route_info_create(), and this is problematic for ip6_route_multipath_add() that calls ip6_route_info_create() multiple times. Let's split ip6_route_info_create() into two parts; one for memory allocation and another for nexthop setup. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-7-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	c9cabe05e4	ipv6: Move nexthop_find_by_id() after fib6_info_alloc(). We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we must perform two lookups for nexthop and dev under RCU to guarantee their lifetime. ip6_route_info_create() calls nexthop_find_by_id() first if RTA_NH_ID is specified, and then allocates struct fib6_info. nexthop_find_by_id() must be called under RCU, but we do not want to use GFP_ATOMIC for memory allocation here, which will be likely to fail in ip6_route_multipath_add(). Let's move nexthop_find_by_id() after the memory allocation so that we can later split ip6_route_info_create() into two parts: the sleepable part and the RCU part. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-6-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	e6f497955f	ipv6: Check GATEWAY in rtm_to_fib6_multipath_config(). In ip6_route_multipath_add(), we call rt6_qualify_for_ecmp() for each entry. If it returns false, the request fails. rt6_qualify_for_ecmp() returns false if any of the conditions below is true: 1. f6i->fib6_flags has RTF_ADDRCONF 2. f6i->nh is not NULL 3. f6i->fib6_nh->fib_nh_gw_family is AF_UNSPEC 1. is unnecessary because rtm_to_fib6_config() never sets RTF_ADDRCONF to cfg->fc_flags. 2. is equivalent to checking cfg->fc_nh_id. 3. can be replaced by checking RTF_GATEWAY in the base and each multipath entry because AF_INET6 is set to f6i->fib6_nh->fib_nh_gw_family only when cfg.fc_is_fdb is true or RTF_GATEWAY is set, but the former is always false. These checks do not require RCU and can be done earlier. Let's perform the equivalent checks in rtm_to_fib6_multipath_config(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-5-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:56 +02:00
Kuniyuki Iwashima	fa76c1674f	ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config(). ip6_route_info_create() is called from 3 functions: * ip6_route_add() * ip6_route_multipath_add() * addrconf_f6i_alloc() addrconf_f6i_alloc() does not need validation for struct fib6_config in ip6_route_info_create(). ip6_route_multipath_add() calls ip6_route_info_create() for multiple routes with slightly different fib6_config instances, which is copied from the base config passed from userspace. So, we need not validate the same config repeatedly. Let's move such validation into rtm_to_fib6_config(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250418000443.43734-4-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:55 +02:00
Kuniyuki Iwashima	bd11ff421d	ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE. Basically, removing an IPv6 route does not require RTNL because the IPv6 routing tables are protected by per table lock. inet6_rtm_delroute() calls nexthop_find_by_id() to check if the nexthop specified by RTA_NH_ID exists. nexthop uses rbtree and the top-down walk can be safely performed under RCU. ip6_route_del() already relies on RCU and the table lock, but we need to extend the RCU critical section a bit more to cover __ip6_del_rt(). For example, nexthop_for_each_fib6_nh() and inet6_rt_notify() need RCU. Let's call nexthop_find_by_id() and __ip6_del_rt() under RCU and get rid of RTNL from inet6_rtm_delroute() and SIOCDELRT. Even if the nexthop is removed after rcu_read_unlock() in inet6_rtm_delroute(), __remove_nexthop_fib() cleans up the routes tied to the nexthop, and ip6_route_del() returns -ESRCH. So the request was at least valid as of nexthop_find_by_id(), and it's just a matter of timing. Note that we need to pass false to lwtunnel_valid_encap_type_attr(). The following patches also use the newroute bool. Note also that fib6_get_table() does not require RCU because once allocated fib6_table is not freed until netns dismantle. I will post a follow-up series to convert such callers to RCU-lockless version. [0] Link: https://lore.kernel.org/netdev/20250417174557.65721-1-kuniyu@amazon.com/ #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-3-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:55 +02:00
Kuniyuki Iwashima	4cb4861d8c	ipv6: Validate RTA_GATEWAY of RTA_MULTIPATH in rtm_to_fib6_config(). We will perform RTM_NEWROUTE and RTM_DELROUTE under RCU, and then we want to perform some validation out of the RCU scope. When creating / removing an IPv6 route with RTA_MULTIPATH, inet6_rtm_newroute() / inet6_rtm_delroute() validates RTA_GATEWAY in each multipath entry. Let's do that in rtm_to_fib6_config(). Note that now RTM_DELROUTE returns an error for RTA_MULTIPATH with 0 entries, which was accepted but should result in -EINVAL as RTM_NEWROUTE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-2-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-24 09:29:55 +02:00
Jakub Kicinski	abcec3ed92	Merge branch 'net-mlx5-hws-improve-ip-version-handling' Mark Bloch says: ==================== net/mlx5: HWS, Improve IP version handling This small series hardens our checks against a single matcher containing rules that match on IPv4 and IPv6. This scenario is not supported by hardware steering and the implementation now signals this instead of failing silently. Patches: * Patch 1 forbids a single definer to match on mixed IP versions for source and destination address. * Patch 2 reproduces a couple of firmware checks: it forbids creating a definer that matches on IP address without matching on IP version, and also disallows matching on IPv6 addresses and the IPv4 IHL fields in the same definer. * Patch 3 forbids mixing rules that match on IPv4 and IPv6 addresses in the same matcher. The underlying definer mechanism does not support that. ==================== Link: https://patch.msgid.link/20250422092540.182091-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:48:14 -07:00
Vlad Dogaru	f41f3edf0b	net/mlx5: HWS, Disallow matcher IP version mixing Signal clearly to the user, via an error, that mixing IPv4 and IPv6 rules in the same matcher is not supported. Previously such cases silently failed by adding a rule that did not work correctly. Rules can specify an IP version by one of two fields: IP version or ethertype. At matcher creation, store whether the template matches on any of these two fields. If yes, inspect each rule for its corresponding match value and store the IP version inside the matcher to guard against inconsistencies with subsequent rules. Furthermore, also check rules for internal consistency, i.e. verify that the ethertype and IP version match values do not contradict each other. The logic applies to inner and outer headers independently, to account for tunneling. Rules that do not match on IP addresses are not affected. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:48:11 -07:00
Vlad Dogaru	6991a975e4	net/mlx5: HWS, Harden IP version definer checks Replicate some sanity checks that firmware does, since hardware steering does not go through firmware. When creating a definer, disallow matching on IP addresses without also matching on IP version. The latter can be satisfied by matching either on the version field in the IP header, or on the ethertype field. Also refuse to match IPv4 IHL alongside IPv6. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:48:11 -07:00
Vlad Dogaru	5f2f8d8b68	net/mlx5: HWS, Fix IP version decision Unify the check for IP version when creating a definer. A given matcher is deemed to match on IPv6 if any of the higher order (>31) bits of source or destination address mask are set. A single packet cannot mix IP versions between source and destination addresses, so it makes no sense that they would be decided on independently. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:48:11 -07:00
Hariprasad Kelam	b5cdb9b311	octeontx2-pf: AF_XDP: code clean up The current API, otx2_xdp_sq_append_pkt, verifies the number of available descriptors before sending packets to the hardware. However, for AF_XDP, this check is unnecessary because the batch value is already determined based on the free descriptors. This patch introduces a new API, "otx2_xsk_sq_append_pkt" to address this. Remove the logic for releasing the TX buffers, as it is implicitly handled by xsk_tx_peek_release_desc_batch Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250420032350.4047706-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:33:14 -07:00
Jakub Kicinski	3fec58f5a4	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== igc: Add support for Frame Preemption Faizal Rahim says: Introduce support for the FPE feature in the IGC driver. The patches align with the upstream FPE API: https://patchwork.kernel.org/project/netdevbpf/cover/20230220122343.1156614-1-vladimir.oltean@nxp.com/ https://patchwork.kernel.org/project/netdevbpf/cover/20230119122705.73054-1-vladimir.oltean@nxp.com/ It builds upon earlier work: https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ The patch series adds the following functionalities to the IGC driver: a) Configure FPE using `ethtool --set-mm`. b) Display FPE settings via `ethtool --show-mm`. c) View FPE statistics using `ethtool --include-statistics --show-mm`. e) Block setting preemptible tc in taprio since it is not supported yet. Existing code already blocks it in mqprio. Tested: Enabled CONFIG_PROVE_LOCKING, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_DMA_API_DEBUG, and CONFIG_KASAN 1) selftests 2) netdev down/up cycles 3) suspend/resume cycles 4) fpe verification No bugs or unusual dmesg logs were observed. Ran 1), 2) and 3) with and without the patch series, compared dmesg and selftest logs - no differences found. 
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: add support to get frame preemption statistics via ethtool igc: add support to get MAC Merge data via ethtool igc: block setting preemptible traffic class in taprio igc: add support to set tx-min-frag-size igc: add support for frame preemption verification igc: set the RX packet buffer size for TSN mode igc: use FIELD_PREP and GENMASK for existing RX packet buffer size igc: optimize TX packet buffer utilization for TSN mode igc: use FIELD_PREP and GENMASK for existing TX packet buffer size igc: rename I225_RXPBSIZE_DEFAULT and I225_TXPBSIZE_DEFAULT igc: rename xdp_get_tx_ring() for non-xdp usage net: ethtool: mm: reset verification status when link is down net: ethtool: mm: extract stmmac verification logic into common library net: stmmac: move frag_size handling out of spin_lock ==================== Link: https://patch.msgid.link/20250418163822.3519810-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 18:31:55 -07:00
Jakub Kicinski	a484fe8806	Merge branch 'enable-multiple-irq-lines-support-in-airoha_eth-driver' Lorenzo Bianconi says: ==================== Enable multiple IRQ lines support in airoha_eth driver EN7581 ethernet SoC supports 4 programmable IRQ lines, each composed of 4 IRQ configuration registers to map Tx/Rx queues. Enable multiple IRQ lines support. ==================== Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-0-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 17:03:54 -07:00
Lorenzo Bianconi	f252493e18	net: airoha: Enable multiple IRQ lines support in airoha_eth driver. EN7581 ethernet SoC supports 4 programmable IRQ lines for Tx and Rx interrupts. Enable multiple IRQ lines support. Map Rx/Tx queues to the available IRQ lines using the default scheme used in the vendor SDK: - IRQ0: rx queues [0-4],[7-9],15 - IRQ1: rx queues [21-30] - IRQ2: rx queues 5 - IRQ3: rx queues 6 Tx queues interrupts are managed by IRQ0. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-2-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 17:03:53 -07:00
Lorenzo Bianconi	9439db26d3	net: airoha: Introduce airoha_irq_bank struct EN7581 ethernet SoC supports 4 programmable IRQ lines, each composed of 4 IRQ configuration registers. Add airoha_irq_bank struct as a container for independent IRQ lines info (e.g. IRQ number, enabled source interrupts, etc.). This is a preliminary patch to support multiple IRQ lines in airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-1-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 17:03:53 -07:00
Jakub Kicinski	cd7276ecac	Merge branch 'r8169-merge-chip-versions' Heiner Kallweit says: ==================== r8169: merge chip versions After `2b065c098c` ("r8169: refactor chip version detection") we can merge handling of few chip versions. ==================== Link: https://patch.msgid.link/5e1e14ea-d60f-4608-88eb-3104b6bbace8@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:58:05 -07:00
Heiner Kallweit	4f51e7d370	r8169: merge chip versions 52 and 53 (RTL8117) Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ae866b71-c904-434e-befb-848c831e33ff@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:58:03 -07:00
Heiner Kallweit	f372ef6ed5	r8169: merge chip versions 64 and 65 (RTL8125D) Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0baad123-c679-4154-923f-fdc12783e900@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:58:02 -07:00
Heiner Kallweit	4dec0702b8	r8169: merge chip versions 70 and 71 (RTL8126A) Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/97d7ae79-d021-4b6b-b424-89e5e305b029@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:58:02 -07:00
Heiner Kallweit	52358dd63e	net: phy: remove function stubs All callers of these functions depend on PHYLIB or select it directly or indirectly by selecting PHYLINK. Stubs make sense for optional functionality, but that's not the case here. MDIO_XGENE usually is selected by NET_XGENE which also selects PHYLIB. Add a dependency to PHYLIB nevertheless, in order not to break randconfig builds. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/f7a69a1f-60e9-4ac0-8b7c-481e0cc850e7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:56:39 -07:00
Kuniyuki Iwashima	f0cc3777b2	net: Fix wild-memory-access in __register_pernet_operations() when CONFIG_NET_NS=n. kernel test robot reported the splat below. [0] Before commit `fed176bf31` ("net: Add ops_undo_single for module load/unload."), if CONFIG_NET_NS=n, ops was linked to pernet_list only when init_net had not been initialised, and ops was unlinked from pernet_list only under the same condition. Let's say an ops is loaded before the init_net setup but unloaded after that. Then, the ops remains in pernet_list, which seems odd. The cited commit added ops_undo_single(), which calls list_add() for ops to link it to a temporary list, so a minor change was added to __register_pernet_operations() and __unregister_pernet_operations() under CONFIG_NET_NS=n to avoid the pernet_list corruption. However, the corruption must have been left as is. When CONFIG_NET_NS=n, pernet_list was used to keep ops registered before the init_net setup, and after that, pernet_list was not used at all. This was because some ops annotated with __net_initdata are cleared out of memory at some point during boot. Then, such ops is initialised by POISON_FREE_INITMEM (0xcc), resulting in that ops->list.{next,prev} suddenly switches from a valid pointer to a weird value, 0xcccccccccccccccc. To avoid such wild memory access, let's allow the pernet_list corruption for CONFIG_NET_NS=n. 
[0]: Oops: general protection fault, probably for non-canonical address 0xf999959999999999: 0000 [#1] SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0xccccccccccccccc8-0xcccccccccccccccf] CPU: 2 UID: 0 PID: 346 Comm: modprobe Not tainted 6.15.0-rc1-00294-ga4cba7e98e35 #85 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report (lib/list_debug.c:32) Code: 48 c1 ea 03 80 3c 02 00 0f 85 5a 01 00 00 49 39 74 24 08 0f 85 83 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 1f 01 00 00 4c 39 26 0f 85 ab 00 00 00 4c 39 ee RSP: 0018:ff11000135b87830 EFLAGS: 00010a07 RAX: dffffc0000000000 RBX: ffffffffc02223c0 RCX: ffffffff8406fcc2 RDX: 1999999999999999 RSI: cccccccccccccccc RDI: ffffffffc02223c0 RBP: ffffffff86064e40 R08: 0000000000000001 R09: fffffbfff0a9f5b5 R10: ffffffff854fadaf R11: 676552203a54454e R12: ffffffff86064e40 R13: ffffffffc02223c0 R14: ffffffff86064e48 R15: 0000000000000021 FS: 00007f6fb0d9e1c0(0000) GS:ff11000858ea0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6fb0eda580 CR3: 0000000122fec005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> register_pernet_operations (./include/linux/list.h:150 (discriminator 5) ./include/linux/list.h:183 (discriminator 5) net/core/net_namespace.c:1315 (discriminator 5) net/core/net_namespace.c:1359 (discriminator 5)) register_pernet_subsys (net/core/net_namespace.c:1401) inet6_init (net/ipv6/af_inet6.c:535) ipv6 do_one_initcall (init/main.c:1257) do_init_module (kernel/module/main.c:2942) load_module (kernel/module/main.c:3409) init_module_from_file (kernel/module/main.c:3599) idempotent_init_module (kernel/module/main.c:3611) __x64_sys_finit_module (./include/linux/file.h:62 
./include/linux/file.h:83 kernel/module/main.c:3634 kernel/module/main.c:3621 kernel/module/main.c:3621) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f6fb0df7e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fffdc6a8968 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000055b535721b70 RCX: 00007f6fb0df7e5d RDX: 0000000000000000 RSI: 000055b51e44aa2a RDI: 0000000000000004 RBP: 0000000000040000 R08: 0000000000000000 R09: 000055b535721b30 R10: 0000000000000004 R11: 0000000000000246 R12: 000055b51e44aa2a R13: 000055b535721bf0 R14: 000055b5357220b0 R15: 0000000000000000 </TASK> Modules linked in: ipv6(+) crc_ccitt Fixes: `fed176bf31` ("net: Add ops_undo_single for module load/unload.") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202504181511.1c3f23e4-lkp@intel.com Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418215025.87871-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:24:56 -07:00
Jakub Kicinski	d219fab875	Merge branch 'netlink-specs-rtnetlink-adjust-specs-for-c-codegen' Jakub Kicinski says: ==================== netlink: specs: rtnetlink: adjust specs for C codegen The first patch brings a schema extension allowing specifying "header" (as in .h file) properties in attribute sets. This is used for rare cases where we carry attributes from another family in a nest - we need to include the extra headers. If we were to generate kernel code we'd also need to skip it in the uAPI output. The remaining 11 patches are pretty boring schema adjustments. ==================== Link: https://patch.msgid.link/20250418021706.1967583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:23 -07:00
Jakub Kicinski	620b38232f	netlink: specs: rt-rule: add C naming info Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-13-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	e3d199d309	netlink: specs: rtnetlink: correct notify properties The notify property should point at the object the notifications carry, usually the get object, not the cmd which triggers the notification: notify: description: Name of the command sharing the reply type with this notification. Not treating this as a fix, I think that only C codegen cares. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-12-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	eee94a89c5	netlink: specs: rt-neigh: make sure getneigh is consistent The consistency check complains replies to do and dump don't match because dump has no value. It doesn't have to by the schema... but fixing this in code gen would be more code than adjusting the spec. This is rare. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	cd879795c3	netlink: specs: rt-neigh: add C naming info Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	622d7050cf	netlink: specs: rt-link: add notification for newlink Add a notification entry for netlink so that we can test ntf handling in classic netlink and C. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	1c224f19ff	netlink: specs: rt-link: make bond's ipv6 address attribute fixed size ns-ip6-target is an indexed-array. Codegen for variable size binary array would be a bit tedious, tell C that we know the size of these attributes, since they are IPv6 addrs. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	e6e1f53f02	netlink: specs: rt-link: adjust AF_ nest for C codegen The AF nest is indexed by AF ID, so it's a bit strange, but with minor adjustments C codegen deals with it just fine. Entirely unclear why the names have been in quotes here. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:16 -07:00
Jakub Kicinski	b12b0f4181	netlink: specs: rt-link: add C naming info Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:15 -07:00
Jakub Kicinski	c703d258f6	netlink: specs: rt-link: remove duplicated group in attr list group is listed twice for newlink. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:15 -07:00
Jakub Kicinski	ed43ce6ab2	netlink: specs: rt-link: remove if-netnsid from attr list if-netnsid an alias to target-netnsid: IFLA_TARGET_NETNSID = IFLA_IF_NETNSID, /* new alias */ We don't have a definition for this attr. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:15 -07:00
Jakub Kicinski	43b606d984	netlink: specs: rt-link: remove the fixed members from attrs The purpose of the attribute list is to list the attributes which will be included in a given message to shrink the objects for families with huge attr spaces. Fixed headers are always present in their entirety (between netlink header and the attrs) so there's no point in listing their members. Current C codegen doesn't expect them and tries to look them up in the attribute space. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:15 -07:00
Jakub Kicinski	7965facefa	netlink: specs: allow header properties for attribute sets rt-link has a number of disjoint headers, plus it uses attributes of other families (e.g. DPLL). Allow declaring a attribute set as "foreign" by specifying which header its definition is coming from. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:07:15 -07:00
Russell King (Oracle)	87f43e6f06	net: stmmac: dwc-qos: calibrate tegra with mdio bus idle Thierry states that there are prerequisites for Tegra's calibration that should be met before starting calibration - both the RGMII and MDIO interfaces should be idle. This commit adds the necessary MII bus locking to ensure that the MDIO interface is idle during calibration. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/E1u7EYR-001ZAS-Cr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 16:01:22 -07:00
Arnd Bergmann	8ff6175139	bnxt_en: hide CONFIG_DETECT_HUNG_TASK specific code The CONFIG_DEFAULT_HUNG_TASK_TIMEOUT setting is only available when the hung task detection is enabled, otherwise the code now produces a build failure: drivers/net/ethernet/broadcom/bnxt/bnxt.c:10188:21: error: use of undeclared identifier 'CONFIG_DEFAULT_HUNG_TASK_TIMEOUT' 10188 \| max_tmo_secs > CONFIG_DEFAULT_HUNG_TASK_TIMEOUT) { Enclose this warning logic in an #ifdef to ensure this builds. Fixes: `0fcad44a86` ("bnxt_en: Change FW message timeout warning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250423162827.2189658-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-23 14:46:00 -07:00
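
Note on the KASAN report quoted in commit f0cc3777b2 above: the faulting address 0xf999959999999999 looks unrelated to the 0xcc poison pattern, but it is plausibly the KASAN shadow address of the poisoned list pointer. The sketch below redoes that arithmetic using only values visible in the register dump (RSI, RDX, RAX). It assumes the usual generic-KASAN mapping of one shadow byte per 8 bytes of memory and takes the shadow offset from RAX. The function names are illustrative and are not kernel APIs.

```python
# Values copied from the register dump in the report.
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000  # RAX in the report (assumed shadow offset)
KASAN_SHADOW_SCALE_SHIFT = 3              # assumption: 1 shadow byte per 8 bytes
MASK64 = (1 << 64) - 1                    # emulate 64-bit wraparound


def shadow_address(addr: int) -> int:
    """Shadow byte address that a KASAN check for `addr` would read."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def covered_range(shadow: int) -> tuple[int, int]:
    """Inclusive range of memory addresses described by one shadow byte."""
    start = ((shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) & MASK64
    return start, start + (1 << KASAN_SHADOW_SCALE_SHIFT) - 1


poisoned_ptr = 0xCCCCCCCCCCCCCCCC  # RSI: ops->list pointer after initmem poisoning
shifted = poisoned_ptr >> KASAN_SHADOW_SCALE_SHIFT
shadow = shadow_address(poisoned_ptr)
range_start, range_end = covered_range(shadow)

print(hex(shifted))                       # matches RDX in the report
print(hex(shadow))                        # matches the non-canonical fault address
print(hex(range_start), hex(range_end))   # matches the "wild-memory-access" range
```

Under these assumptions the three printed values line up with RDX (1999999999999999), the reported fault address, and the range [0xccccccccccccccc8-0xcccccccccccccccf] from the splat, which is consistent with the commit's explanation that ops->list held POISON_FREE_INITMEM bytes.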


1352170 Commits