Commit Graph

1268248 Commits

Author SHA1 Message Date
Alexis Lothoré
a5d6b1d453 wifi: wilc1000: make sdio deinit function really deinit the sdio card
In order to be able to read raw registers (eg the nv mac address) in
wilc1000 during probe before the firmware is loaded and running, we need to
run the basic sdio functions initialization, but then we also need to
properly deinitialize those right after, to preserve the current driver
behavior (keeping the chip idle/unconfigured until the corresponding
interface is brought up). Calling wilc_sdio_deinit in its current form is
not enough because it merely resets an internal flag.

Implement a deinit sequence which symmetrically resets all steps performed
in wilc_sdio_init (only for parts activating/deactivating features; for the
sake of simplicity, let's ignore the block size configuration reset).

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240514-mac_addr_at_probe-v2-5-afef09f1cd10@bootlin.com
2024-05-17 11:01:52 +03:00
Alexis Lothoré
59cf9277c1 wifi: wilc1000: add function to read mac address from eFuse
The wilc driver currently reads and sets the MAC address through firmware
calls. This means that we cannot access the MAC address unless an interface
has been brought up (so that the firmware is up and running). Another way to
get the MAC address is to read it directly from eFuse.

Add a helper function to read the MAC address written in eFuse, without
firmware assistance.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240514-mac_addr_at_probe-v2-4-afef09f1cd10@bootlin.com
2024-05-17 11:01:51 +03:00
Alexis Lothoré
5f1191ed38 wifi: wilc1000: set wilc_set_mac_address parameter as const
Any attempt to provide a const mac address to wilc_set_mac_address results
in the following warning:

warning: passing argument 2 of 'wilc_set_mac_address' discards 'const'
qualifier from pointer target type [-Wdiscarded-qualifiers]
[...]
drivers/net/wireless/microchip/wilc1000/hif.h:170:52: note: expected 'u8 *'
{aka 'unsigned char *'} but argument is of type 'const unsigned char *'
int wilc_set_mac_address(struct wilc_vif *vif, u8 *mac_addr);

Instead of using an explicit cast each time we need to provide a const MAC
address, make the function parameter const.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240514-mac_addr_at_probe-v2-3-afef09f1cd10@bootlin.com
2024-05-17 11:01:51 +03:00
Alexis Lothoré
ec99908906 wifi: wilc1000: register net device only after bus being fully initialized
SDIO/SPI probe functions automatically add a default wlan interface on top
of the registered wiphy, through wilc_cfg80211_init, which in turn calls
wilc_netdev_ifc_init. However, the bus is still not fully initialized when we
register the corresponding net device (for example, some private driver data
pointers are still missing). This makes it impossible to retrieve the MAC
address from the chip (which is supposed to be set on the net device before
its registration) before registering the net device. More generally, net
device registration should not be done until the driver has fully initialized
everything and is ready to handle any operation on the net device.

Prevent net device from being registered so early by doing it at the end of
probe functions. Apply this logic to both sdio and spi buses.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240514-mac_addr_at_probe-v2-2-afef09f1cd10@bootlin.com
2024-05-17 11:01:51 +03:00
Alexis Lothoré
6fe46d5c0a wifi: wilc1000: set net device registration as last step during interface creation
net device registration is currently done in wilc_netdev_ifc_init but
other initialization operations are still done after this registration.
Since net device is assumed to be usable right after registration, it
should be the very last step of initialization.

Move netdev registration to the very end of wilc_netdev_ifc_init to let
this function completely initialize netdevice before registering it.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240514-mac_addr_at_probe-v2-1-afef09f1cd10@bootlin.com
2024-05-17 11:01:51 +03:00
Samasth Norway Ananda
c636fa85fe wifi: brcmsmac: LCN PHY code is used for BCM4313 2G-only device
The band_idx variable in the function wlc_lcnphy_tx_iqlo_cal() will
never be set to 1 as BCM4313 is the only device for which the LCN PHY
code is used. This is a 2G-only device.

Fixes: 5b435de0d7 ("net: wireless: add brcm80211 drivers")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240509231037.2014109-1-samasth.norway.ananda@oracle.com
2024-05-14 16:31:37 +03:00
Andrii Batyiev
02b682d545 wifi: iwlegacy: do not skip frames with bad FCS
Monitor/sniffer mode benefits from all types of frames, even if FCS
check fails. But we must mark frames as such.

Tested on iwl3945 only.

Signed-off-by: Andrii Batyiev <batyiev@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240509101140.32664-1-batyiev@gmail.com
2024-05-14 16:31:09 +03:00
Jakub Kicinski
83127ecada Merge tag 'wireless-next-2024-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:

====================
wireless-next patches for v6.10

The third, and most likely the last, "new features" pull request for
v6.10 with changes both in stack and in drivers. In ath12k and rtw89
we disabled Wireless Extensions just like with iwlwifi earlier. Wi-Fi
7 devices will not support Wireless Extensions (WEXT) anymore so if
someone is still using the legacy WEXT interface it's time to switch
to nl80211 now!

We merged wireless into wireless-next as we decided not to send a
wireless pull request to v6.9 this late in the cycle. Also an
immutable branch with MHI subsystem was merged to get ath11k and
ath12k hibernation working.

Major changes:

mac80211/cfg80211
 * handle color change per link

mt76
 * mt7921 LED control
 * mt7925 EHT radiotap support
 * mt7920e PCI support

ath12k
 * debugfs support
 * dfs_simulate_radar debugfs file
 * disable Wireless Extensions
 * suspend and hibernation support
 * ACPI support
 * refactoring in preparation of multi-link support

ath11k
 * support hibernation (required changes in qrtr and MHI subsystems)
 * ieee80211-freq-limit Device Tree property support

ath10k
 * firmware-name Device Tree property support

rtw89
 * complete features of new WiFi 7 chip 8922AE including BT-coexistence
   and WoWLAN
 * use BIOS ACPI settings to set TX power and channels
 * disable Wireless Extensions on Wi-Fi 7 devices

iwlwifi
 * block_esr debugfs file
 * support again firmware API 90 (was reverted earlier)
 * provide channel survey information for Automatic Channel Selection (ACS)

* tag 'wireless-next-2024-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (214 commits)
  wifi: mwl8k: initialize cmd->addr[] properly
  wifi: iwlwifi: Ensure prph_mac dump includes all addresses
  wifi: iwlwifi: mvm: don't request statistics in restart
  wifi: iwlwifi: mvm: exit EMLSR if secondary link is not used
  wifi: iwlwifi: mvm: add beacon template version 14
  wifi: iwlwifi: mvm: align UATS naming with firmware
  wifi: iwlwifi: Force SCU_ACTIVE for specific platforms
  wifi: iwlwifi: mvm: record and return channel survey information
  wifi: iwlwifi: mvm: add the firmware API for channel survey
  wifi: iwlwifi: mvm: Fix race in scan completion
  wifi: iwlwifi: mvm: Add a print for invalid link pair due to bandwidth
  wifi: iwlwifi: mvm: add a debugfs for reading EMLSR blocking reasons
  wifi: iwlwifi: mvm: Add active EMLSR blocking reasons prints
  wifi: iwlwifi: bump FW API to 90 for BZ/SC devices
  wifi: iwlwifi: mvm: fix primary link setting
  wifi: iwlwifi: mvm: use already determined cmd_id
  wifi: iwlwifi: mvm: don't reset link selection during restart
  wifi: iwlwifi: Print EMLSR states name
  wifi: iwlwifi: mvm: Block EMLSR when a p2p/softAP vif is active
  wifi: iwlwifi: mvm: fix typo in debug print
  ...
====================

Link: https://lore.kernel.org/r/20240508120726.85A10C113CC@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 19:09:38 -07:00
Jakub Kicinski
d9308f51b3 Merge branch 'netdevsim-add-napi-support'
David Wei says:

====================
netdevsim: add NAPI support

Add NAPI support to netdevsim and register its Rx queues with NAPI
instances. Then add a selftest using the new netdev Python selftest
infra to exercise the existing Netdev Netlink API, specifically the
queue-get API.

This expands test coverage and further fleshes out netdevsim as a test
device. It's still my goal to make it useful for testing things like
flow steering and ZC Rx.
====================

Link: https://lore.kernel.org/r/20240507163228.2066817-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:59:51 -07:00
David Wei
1cf2704242 net: selftest: add test for netdev netlink queue-get API
Add a selftest for netdev generic netlink. For now there is only a
single test that exercises the `queue-get` API.

The test works with netdevsim by default or with a real device by
setting NETIF.

Add a timeout param to cmd() since ethtool -L can take a long time on
real devices.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240507163228.2066817-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:59:47 -07:00
David Wei
3762ec05a9 netdevsim: add NAPI support
Add NAPI support to netdevsim, similar to veth.

* Add a nsim_rq rx queue structure to hold a NAPI instance and a skb
  queue.
* During xmit, store the skb in the peer skb queue and schedule NAPI.
* During napi_poll(), drain the skb queue and pass up the stack.
* Add assoc between rxq and NAPI instance using netif_queue_set_napi().
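
The flow in the list above can be pictured with a small toy model. This is a conceptual Python sketch only, not kernel code: ToyDevice, ToyRxQueue, link() and the napi_scheduled flag are hypothetical names standing in for the netdevsim device, its nsim_rq and the NAPI scheduling state.

```python
from collections import deque


class ToyRxQueue:
    """Hypothetical stand-in for an rx queue: a packet queue plus a 'scheduled' flag."""

    def __init__(self):
        self.skb_queue = deque()
        self.napi_scheduled = False


class ToyDevice:
    """Hypothetical stand-in for one end of a linked device pair."""

    def __init__(self, name):
        self.name = name
        self.rq = ToyRxQueue()
        self.peer = None
        self.delivered = []  # packets passed "up the stack"

    def xmit(self, pkt):
        # Transmit: store the packet in the peer's queue and schedule its poll.
        if self.peer is None:
            return False  # no peer, the packet is dropped
        self.peer.rq.skb_queue.append(pkt)
        self.peer.rq.napi_scheduled = True
        return True

    def poll(self, budget):
        # Poll: drain at most `budget` packets and pass them up.
        done = 0
        while done < budget and self.rq.skb_queue:
            self.delivered.append(self.rq.skb_queue.popleft())
            done += 1
        if not self.rq.skb_queue:
            # Queue fully drained: nothing left to poll for.
            self.rq.napi_scheduled = False
        return done


def link(a, b):
    """Connect two toy devices as peers."""
    a.peer, b.peer = b, a
```

With a budget smaller than the queue depth, the toy queue stays scheduled until a later poll drains it, which is the general shape of budgeted polling.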

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240507163228.2066817-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:59:47 -07:00
Willem de Bruijn
1d0dc857b5 selftests: drv-net: add checksum tests
Run tools/testing/selftests/net/csum.c as part of drv-net.
This binary covers multiple scenarios, based on arguments given,
for both IPv4 and IPv6:

- Accept UDP correct checksum
- Detect UDP invalid checksum
- Accept TCP correct checksum
- Detect TCP invalid checksum

- Transmit UDP: basic checksum offload
- Transmit UDP: zero checksum conversion

The test direction is reversed between receive and transmit tests, so
that the NIC under test is always the local machine.

In total this adds up to 12 testcases, with more to follow. For
conciseness, I replaced individual functions with a function factory.

Also detect hardware offload feature availability using Ethtool
netlink and skip tests when either feature is off. This need may be
common for offload feature tests and eventually deserving of a thin
wrapper in lib.py.

Missing are the PF_PACKET based send tests ('-P'). These use
virtio_net_hdr to program hardware checksum offload, which requires
looking up the local MAC address and (harder) the MAC of the next hop.
I'll have to give it some thought how to do that robustly and where
that code would belong.

Tested:

        make -C tools/testing/selftests/ \
                TARGETS="drivers/net drivers/net/hw" \
                install INSTALL_PATH=/tmp/ksft
        cd /tmp/ksft

        sudo NETIF=ens4 REMOTE_TYPE=ssh \
                REMOTE_ARGS="root@10.40.0.2" \
                LOCAL_V4="10.40.0.1" \
                REMOTE_V4="10.40.0.2" \
                ./run_kselftest.sh -t drivers/net/hw:csum.py

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240507154216.501111-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:57:55 -07:00
Eric Dumazet
58a4ff5d77 phonet: no longer hold RTNL in route_dumpit()
route_dumpit() already relies on RCU, RTNL is not needed.

Also change return value at the end of a dump.
This allows NLMSG_DONE to be appended to the current
skb at the end of a dump, saving a couple of recvmsg()
system calls.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507121748.416287-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:54:50 -07:00
Eric Dumazet
8d8b1a422c net: annotate data-races around dev->if_port
Various ndo_set_config() methods can change dev->if_port

dev->if_port is going to be read locklessly from
rtnl_fill_link_ifmap().

Add corresponding WRITE_ONCE() on writer sides.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507184144.1230469-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:51:30 -07:00
Eric Dumazet
e2d09e5a1e net: dst_cache: minor optimization in dst_cache_set_ip6()
There is no need to use this_cpu_ptr(dst_cache->cache) twice.

Compiler is unable to optimize the second call, because of
per-cpu constraints.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240507132717.627518-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:50:11 -07:00
Eric Dumazet
3b09b2bd0d net: dst_cache: annotate data-races around dst_cache->reset_ts
dst_cache->reset_ts is read or written locklessly,
add READ_ONCE() and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240507132000.614591-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:49:51 -07:00
Donald Hunter
e497c3228a netlink/specs: Add VF attributes to rt_link spec
Add support for retrieving VFs as part of link info. For example:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
  --do getlink --json '{"ifi-index": 38, "ext-mask": ["vf", "skip-stats"]}'
{'address': 'b6:75:91:f2:64:65',
 [snip]
 'vfinfo-list': {'info': [{'broadcast': b'\xff\xff\xff\xff\xff\xff\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00',
                           'link-state': {'link-state': 'auto', 'vf': 0},
                           'mac': {'mac': b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00',
                                   'vf': 0},
                           'rate': {'max-tx-rate': 0,
                                    'min-tx-rate': 0,
                                    'vf': 0},
                           'rss-query-en': {'setting': 0, 'vf': 0},
                           'spoofchk': {'setting': 0, 'vf': 0},
                           'trust': {'setting': 0, 'vf': 0},
                           'tx-rate': {'rate': 0, 'vf': 0},
                           'vlan': {'qos': 0, 'vf': 0, 'vlan': 0},
                           'vlan-list': {'info': [{'qos': 0,
                                                   'vf': 0,
                                                   'vlan': 0,
                                                   'vlan-proto': 0}]}},
                          {'broadcast': b'\xff\xff\xff\xff\xff\xff\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                        b'\x00\x00\x00\x00\x00\x00\x00\x00',
                           'link-state': {'link-state': 'auto', 'vf': 1},
                           'mac': {'mac': b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00'
                                          b'\x00\x00\x00\x00\x00\x00\x00\x00',
                                   'vf': 1},
                           'rate': {'max-tx-rate': 0,
                                    'min-tx-rate': 0,
                                    'vf': 1},
                           'rss-query-en': {'setting': 0, 'vf': 1},
                           'spoofchk': {'setting': 0, 'vf': 1},
                           'trust': {'setting': 0, 'vf': 1},
                           'tx-rate': {'rate': 0, 'vf': 1},
                           'vlan': {'qos': 0, 'vf': 1, 'vlan': 0},
                           'vlan-list': {'info': [{'qos': 0,
                                                   'vf': 1,
                                                   'vlan': 0,
                                                   'vlan-proto': 0}]}}]},
 'xdp': {'attached': 0}}
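
In the output above, the 'broadcast' and 'mac' values are 32-byte buffers. A small Python helper can render them in the usual colon-separated form. This is a sketch that assumes only the first 6 bytes are significant for an Ethernet device; format_mac() is a hypothetical helper, not part of the ynl tooling.

```python
def format_mac(raw: bytes, addr_len: int = 6) -> str:
    """Render the first addr_len bytes of a raw buffer as aa:bb:cc:dd:ee:ff."""
    if len(raw) < addr_len:
        raise ValueError("buffer shorter than the address length")
    return ":".join(f"{b:02x}" for b in raw[:addr_len])


# The 32-byte 'broadcast' value shown above: six 0xff bytes, then padding.
broadcast = b"\xff" * 6 + b"\x00" * 26
print(format_mac(broadcast))
```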

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240507103603.23017-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:47:51 -07:00
Alexandru Gagniuc
3a2a192b0e dt-bindings: net: ipq4019-mdio: add IPQ9574 compatible
Add a compatible property specific to IPQ9574. This should be used
along with the IPQ4019 compatible. This second compatible serves the
same purpose as the ipq{5,6,8} compatibles. This is to indicate that
the clocks properties are required.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240507024758.2810514-1-mr.nuke.me@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:44:49 -07:00
Lukasz Majewski
252aa6d539 test: hsr: Call cleanup_all_ns when hsr_redbox.sh script exits
Without this change the created netns instances are not cleared after
this script execution. To fix this problem the cleanup_all_ns function
from ../lib.sh is called.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:23:53 +01:00
Joel Granados
1d3985ed0d ax25: Remove superfluous "return" from ax25_ds_set_timer
Remove the explicit call to "return" in the void ax25_ds_set_timer
function that was introduced in 78a7b5dbc0 ("ax.25: x.25: Remove the
now superfluous sentinel elements from ctl_table array").

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:23:10 +01:00
Alexander Mikhalitsyn
2b696a2a10 ipvs: allow some sysctls in non-init user namespaces
Let's make all IPVS sysctls writable even when the
network namespace is owned by a non-initial user namespace.

Let's make a few sysctls read-only for non-privileged users:
- sync_qlen_max
- sync_sock_size
- run_estimation
- est_cpulist
- est_nice

I'm trying to be conservative with this to prevent
introducing any security issues in there. Maybe
we can allow more sysctls to be writable, but let's
do this on demand, when we see a real use case.

This patch is motivated by user request in the LXC
project [1]. Having this can help with running some
Kubernetes [2] or Docker Swarm [3] workloads inside the system
containers.

Link: https://github.com/lxc/lxc/issues/4278 [1]
Link: b722d017a3/pkg/proxy/ipvs/proxier.go (L103) [2]
Link: 3797618f9a/osl/namespace_linux.go (L682) [3]

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:22:18 +01:00
Alexander Mikhalitsyn
643bb5dbae ipvs: add READ_ONCE barrier for ipvs->sysctl_amemthresh
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:22:18 +01:00
Christian Marangi
abb45a2477 net: stmmac: dwmac-ipq806x: account for rgmii-txid/rxid/id phy-mode
Currently the ipq806x dwmac driver is almost always used attached to the
CPU port of a switch and phy-mode was always set to "rgmii" or "sgmii".

Some devices came up with a special configuration where the PHY is
directly attached to the GMAC port, and in those cases phy-mode needs to
be set to "rgmii-id" to make the PHY work correctly and receive packets.

Since the driver supports only "rgmii" and "sgmii" mode, when "rgmii-id"
(or variants) mode is set, the mode is rejected and probe fails.

Add support for these phy-modes as well, to correctly set up PHYs that
require a delay applied to tx/rx.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:19:54 +01:00
Oleksij Rempel
b7ffab29a8 net: bridge: switchdev: Improve error message for port_obj_add/del functions
Enhance the error reporting mechanism in the switchdev framework to
provide more informative and user-friendly error messages.

Following feedback from users struggling to understand the implications
of error messages like "failed (err=-28) to add object (id=2)", this
update aims to clarify what operation failed and how this might impact
the system or network.

With this change, error messages now include a description of the failed
operation, the specific object involved, and a brief explanation of the
potential impact on the system. This approach helps administrators and
developers better understand the context and severity of errors,
facilitating quicker and more effective troubleshooting.

Example of the improved logging:

[   70.516446] ksz-switch spi0.0 uplink: Failed to add Port Multicast
               Database entry (object id=2) with error: -ENOSPC (-28).
[   70.516446] Failure in updating the port's Multicast Database could
               lead to multicast forwarding issues.
[   70.516446] Current HW/SW setup lacks sufficient resources.

This comprehensive update includes handling for a range of switchdev
object IDs, ensuring that most operations within the switchdev framework
benefit from clearer error reporting.
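
The idea of pairing a symbolic errno name with its number, as in "-ENOSPC (-28)" above, can be sketched in Python with the standard errno module. The describe_errno() helper is hypothetical and only illustrates the formatting; it is not the switchdev code.

```python
import errno


def describe_errno(err: int) -> str:
    """Format a negative kernel-style error code as '-NAME (-number)'."""
    code = abs(err)
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"-{name} (-{code})"


print(describe_errno(-errno.ENOSPC))
```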

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 12:19:12 +01:00
Peilin He
db3efdcf70 net/ipv4: add tracepoint for icmp_send
Introduce a tracepoint for icmp_send, which can help users conveniently get
more detailed information when abnormal ICMP events happen.

1. Use case example:
====================
When an application experiences packet loss due to an unreachable UDP
destination port, the kernel will send an exception message through the
icmp_send function. By adding a trace point for icmp_send, developers or
system administrators can obtain detailed information about the UDP
packet loss, including the type, code, source address, destination address,
source port, and destination port. This facilitates the trouble-shooting
of UDP packet loss issues especially for those network-service
applications.

2. Operation Instructions:
==========================
Switch to the tracing directory.
        cd /sys/kernel/tracing
Filter for destination port unreachable.
        echo "type==3 && code==3" > events/icmp/icmp_send/filter
Enable trace event.
        echo 1 > events/icmp/icmp_send/enable

3. Result View:
================
 udp_client_erro-11370   [002] ...s.12   124.728002:
 icmp_send: icmp_send: type=3, code=3.
 From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23
 skbaddr=00000000589b167a
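
For post-processing, a trace record like the one above can be split into fields with a regular expression. This Python sketch assumes the record is a single line in the trace buffer (it is wrapped here for readability) and that the field layout matches the sample exactly; parse_icmp_send() is a hypothetical helper.

```python
import re

# Pattern for the fields shown in the sample record above.
ICMP_SEND_RE = re.compile(
    r"type=(?P<type>\d+), code=(?P<code>\d+)\.\s+"
    r"From (?P<saddr>[\d.]+):(?P<sport>\d+) "
    r"to (?P<daddr>[\d.]+):(?P<dport>\d+)\s+"
    r"ulen=(?P<ulen>\d+)\s+skbaddr=(?P<skbaddr>[0-9a-f]+)"
)


def parse_icmp_send(line: str):
    """Return the record's fields as a dict, or None if the line does not match."""
    match = ICMP_SEND_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; addresses and skbaddr stay as strings.
    for key in ("type", "code", "sport", "dport", "ulen"):
        fields[key] = int(fields[key])
    return fields


sample = (
    "icmp_send: icmp_send: type=3, code=3. "
    "From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23 "
    "skbaddr=00000000589b167a"
)
print(parse_icmp_send(sample))
```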

Signed-off-by: Peilin He <he.peilin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Liu Chun <liu.chun2@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:39:26 +01:00
David S. Miller
9f481cea15 Merge branch 'ksz-dcb-dscp'
Oleksij Rempel says:

====================
add DCB and DSCP support for KSZ switches

This patch series is aimed at improving support for DCB (Data Center
Bridging) and DSCP (Differentiated Services Code Point) on KSZ switches.

The main goal is to introduce global DSCP and PCP (Priority Code Point)
mapping support, addressing the limitation of KSZ switches not having
per-port DSCP priority mapping. This involves extending the DSA
framework with new callbacks for managing trust settings for global DSCP
and PCP maps. Additionally, we introduce IEEE 802.1q helpers for default
configurations, benefiting other drivers too.

Change logs are in separate patches.

Compared to v6 this series includes some new patches for DSCP global
mapping support and QoS selftest script for KSZ9477 switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:11 +01:00
Oleksij Rempel
cbc7afffc5 selftests: microchip: add test for QoS support on KSZ9477 switch family
Add tests covering following functionality on KSZ9477 switch family:
- default port priority
- global DSCP to Internal Priority Mapping
- apptrust configuration

This script was tested on KSZ9893R

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:11 +01:00
Oleksij Rempel
c2e722657f net: dsa: microchip: add support DSCP priority mapping
Microchip KSZ and LAN variants do not have per port DSCP priority
configuration. Instead there is a global DSCP mapping table.

This patch provides write access to this global DSCP map. In case an entry
is "deleted", we map the corresponding DSCP entry to a best effort prio,
which is expected to be the default priority for all untagged traffic.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:11 +01:00
Oleksij Rempel
5f5109af47 net: dsa: add support switches global DSCP priority mapping
Some switches like Microchip KSZ variants do not support per port DSCP
priority configuration. Instead there is a global DSCP mapping table.

To handle it, we accept set/del requests on any of the user ports to
change the global configuration, and update the dcb app entries for all
other ports.
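
A toy Python model of this behavior, for illustration only: ToySwitch and its methods are hypothetical names and do not mirror the DSA callbacks. A request on any user port changes the single global table, and every port's view is updated to match.

```python
class ToySwitch:
    """Hypothetical switch with one global DSCP-to-priority table."""

    def __init__(self, ports):
        self.global_dscp_prio = {}
        # Per-port view of the app entries, kept in sync with the global table.
        self.port_app = {port: {} for port in ports}

    def _check(self, port, dscp):
        if port not in self.port_app:
            raise ValueError(f"unknown port {port!r}")
        if not 0 <= dscp <= 63:
            raise ValueError("DSCP is a 6-bit value (0..63)")

    def set_dscp_prio(self, port, dscp, prio):
        # The request arrives on one port but applies to all of them.
        self._check(port, dscp)
        self.global_dscp_prio[dscp] = prio
        for entries in self.port_app.values():
            entries[dscp] = prio

    def del_dscp_prio(self, port, dscp):
        self._check(port, dscp)
        self.global_dscp_prio.pop(dscp, None)
        for entries in self.port_app.values():
            entries.pop(dscp, None)
```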

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
ea1078d94c net: dsa: microchip: let DCB code do PCP and DSCP policy configuration
802.1P (PCP) and DiffServ (DSCP) are handled now by DCB code. Let it do
all needed initial configuration.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
3bcb896865 net: dsa: microchip: init predictable IPV to queue mapping for all non KSZ8xxx variants
Init priority to queue mapping in the way shown in the IEEE 802.1Q
mapping example.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
c631250a24 net: dsa: microchip: enable ETS support for KSZ989X variants
I tested ETS support on KSZ9893, so it should work on other KSZ989X
variants too, which were still not listed as supported.

With this change, the ksz8 family of chips is the only one that is
officially not supported.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
a1ea57710c net: dsa: microchip: dcb: add special handling for KSZ88X3 family
KSZ88X3 switches behave differently on different ports:
- It does not seem possible to disable VLAN PCP classification on
  Port 2. This means that, as soon as multiqueue support is enabled,
  frames with a VLAN tag will get PCP prios. This behavior does not
  affect Port 1, where it is possible to disable PCP prios.
- DSCP classification is not working on Port 2.

Since there are still usable configuration combinations, I added some
quirks to make sure the user gets an appropriate error message if an
impossible configuration is chosen.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
a16efc61d2 net: dsa: microchip: add support for different DCB app configurations
Add DCB support to configure app trust sources and default port priority.

Following commands can be used for testing:
dcb apptrust set dev lan1 order pcp dscp
dcb app replace dev lan1 default-prio 3

Since it is not possible to configure DSCP-Prio mapping per port, this
patch provides only the ability to read the switch global dscp-prio mapping
and a way to enable/disable app trust for DSCP.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
328de4671d net: dsa: microchip: add multi queue support for KSZ88X3 variants
KSZ88X3 switches support up to 4 queues. Rework ksz8795_set_prio_queue()
to support KSZ8795 and KSZ88X3 families of switches.

By default, configure KSZ88X3 to use one queue, since it needs special
handling due to priority-related errata. Errata handling is implemented
in a separate patch.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:10 +01:00
Oleksij Rempel
768cf84138 net: add IEEE 802.1q specific helpers
The IEEE 802.1Q specification provides recommendations and examples which
can be used as good default values for different drivers.

This patch implements mapping examples documented in IEEE 802.1Q-2022 in
Annex I "I.3 Traffic type to traffic class mapping" and IETF DSCP naming
and mapping DSCP to Traffic Type inspired by RFC8325.
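
For readers unfamiliar with the IETF DSCP names, the standard code points follow a simple bit layout: Class Selector CSn is n << 3, Assured Forwarding AFxy is (x << 3) | (y << 1), and Expedited Forwarding is 46. The Python sketch below only computes these well-known values; cs(), af() and EF are hypothetical names, not the kernel helpers added by this patch, and it does not reproduce the traffic type mapping.

```python
def cs(n: int) -> int:
    """Class Selector code point CSn (n in 0..7)."""
    if not 0 <= n <= 7:
        raise ValueError("class selector must be 0..7")
    return n << 3


def af(cls: int, drop: int) -> int:
    """Assured Forwarding code point AF<cls><drop> (cls 1..4, drop 1..3)."""
    if not 1 <= cls <= 4 or not 1 <= drop <= 3:
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return (cls << 3) | (drop << 1)


EF = 46  # Expedited Forwarding

# Build a name -> code point table from the helpers above.
DSCP_NAMES = {f"CS{n}": cs(n) for n in range(8)}
DSCP_NAMES.update({f"AF{c}{d}": af(c, d) for c in range(1, 5) for d in range(1, 4)})
DSCP_NAMES["EF"] = EF
```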

These helpers will be used in follow-up patches for the dsa/microchip DCB
implementation.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:09 +01:00
Oleksij Rempel
97278f8f10 net: dsa: microchip: add IPV information support
Most Microchip KSZ switches use an Internal Priority Value (IPV)
associated with every frame. For example, it is possible to map any VLAN
PCP or DSCP value to an IPV and, at the end, map the IPV to a queue.

Since the number of IPVs is not equal to the number of queues, add this
information and make use of it in some functions.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:09 +01:00
Oleksij Rempel
96c6f33795 net: dsa: add support for DCB get/set apptrust configuration
Add DCB support to get/set trust configuration for different packet
priority information sources. Some switches allow choosing between
different sources for packet priority classification. For example, on KSZ
switches it is possible to configure VLAN PCP and/or DSCP sources.
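
A rough sketch of what a trust configuration means for classification.
All names here are hypothetical, and two simplifications are assumed:
earlier entries in the trusted list win, and DSCP is reduced to a 0..7
priority by taking its top three bits. Real hardware may behave
differently.

```c
#include <stddef.h>

/* Hypothetical priority sources; not the kernel's selector values. */
enum prio_source { PRIO_SRC_PCP, PRIO_SRC_DSCP };

/* Hypothetical summary of what a received frame carries. */
struct frame_info {
    int has_vlan_tag;       /* non-zero if a VLAN tag is present */
    unsigned int pcp;       /* 0..7, valid only with a VLAN tag */
    int has_ip_header;      /* non-zero if an IP header is present */
    unsigned int dscp;      /* 0..63, valid only with an IP header */
};

/* Walk the trusted sources in order and use the first one the frame
 * actually carries; fall back to default_prio otherwise. */
int frame_priority(const struct frame_info *f,
                   const enum prio_source *trusted, size_t n_trusted,
                   int default_prio)
{
    for (size_t i = 0; i < n_trusted; i++) {
        if (trusted[i] == PRIO_SRC_PCP && f->has_vlan_tag)
            return (int)f->pcp;
        if (trusted[i] == PRIO_SRC_DSCP && f->has_ip_header)
            return (int)(f->dscp >> 3);   /* assumption: top 3 bits */
    }
    return default_prio;
}
```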

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08 10:35:09 +01:00
Jakub Kicinski
09ca994072 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-05-06 (ice)

This series contains updates to ice driver only.

Paul adds support for additional E830 devices and adjusts naming for
existing E830 devices.

Marcin commonizes a couple of TC setup calls to reduce duplicated code.

Mateusz adds ice_vsi_cfg_params into ice_vsi to consolidate info.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsi
  ice: Deduplicate tc action setup
  ice: update E830 device ids and comments
  ice: add additional E830 device ids
====================

Link: https://lore.kernel.org/r/20240506170827.948682-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 19:08:34 -07:00
Eric Dumazet
05417aa9c0 net: usb: sr9700: stop lying about skb->truesize
Some usb drivers set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize overrides.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611723 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5b2f ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Fixes: c9b37458e9 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240506143939.3673865-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 17:43:18 -07:00
Eric Dumazet
1b3b2d9e77 net: usb: smsc75xx: stop lying about skb->truesize
Some usb drivers try to set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize overrides.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611723 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5b2f ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Link: https://lore.kernel.org/r/20240506142358.3657918-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 17:38:48 -07:00
Eric Dumazet
9aad6e45c4 usb: aqc111: stop lying about skb->truesize
Some usb drivers try to set small skb->truesize and break
core networking stacks.

I replace one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611723 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5b2f ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Fixes: 361459cd96 ("net: usb: aqc111: Implement RX data path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240506135546.3641185-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 17:37:24 -07:00
John Hubbard
eb709b5f65 selftests/net: fix uninitialized variables
When building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang warns about three variables that are not initialized in all
cases:

1) The opt_ipproto_off variable is used uninitialized if "testname" is
not "ip". Willem de Bruijn pointed out that this is an actual bug, and
suggested the fix that I'm using here (thanks!).

2) The addr_len is used uninitialized, but only in the assert case,
   which bails out, so this is harmless.

3) The family variable in add_listener() is only used uninitialized in
   the error case (neither IPv4 nor IPv6 is specified), so it's also
   harmless.

Fix by initializing each variable.
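
For illustration, the pattern behind case 1 looks roughly like the
sketch below. The function name and the offset value are hypothetical
and are not copied from the selftest.

```c
#include <string.h>

/* Hypothetical option helper: the offset is only assigned for one test
 * name, so it must start from a defined value. Without the initializer,
 * every other test name would return an indeterminate value. */
int ipproto_offset(const char *testname)
{
    int off = 0;    /* initialized so all paths return a defined value */

    if (strcmp(testname, "ip") == 0)
        off = 9;    /* byte offset of the protocol field in an IPv4 header */

    return off;
}
```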

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240506190204.28497-1-jhubbard@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 17:22:18 -07:00
Florian Fainelli
0d5044b4e7 lib: Allow for the DIM library to be modular
Allow the Dynamic Interrupt Moderation (DIM) library to be built as a
module. This is particularly useful in an Android GKI (Google Kernel
Image) configuration where everything is built as a module, including
Ethernet controller drivers. Having to build DIMLIB into the kernel
image with potentially no user is wasteful.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240506175040.410446-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:42:45 -07:00
Eric Dumazet
445c0b69c7 mptcp: fix possible NULL dereferences
subflow_add_reset_reason(skb, ...) can fail.

We cannot assume mptcp_get_ext(skb) always returns a non-NULL pointer.
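
The defensive pattern, sketched in user space with hypothetical stand-in
types (struct buf for the skb, struct ext for the extension); this is
not the code of the fix itself.

```c
#include <stddef.h>

struct ext { int reset_reason; };   /* hypothetical extension data */
struct buf { struct ext *ext; };    /* hypothetical stand-in for an skb */

/* Stand-in for a lookup that may legitimately return NULL, for example
 * when attaching the extension failed earlier. */
struct ext *buf_get_ext(const struct buf *b)
{
    return b ? b->ext : NULL;
}

/* Check the lookup result before dereferencing it and fall back to a
 * caller-supplied value when no extension is attached. */
int buf_reset_reason(const struct buf *b, int fallback)
{
    const struct ext *e = buf_get_ext(b);

    return e ? e->reset_reason : fallback;
}
```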

syzbot reported:

general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 0 PID: 5098 Comm: syz-executor132 Not tainted 6.9.0-rc6-syzkaller-01478-gcdc74c9d06e7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:subflow_v6_route_req+0x2c7/0x490 net/mptcp/subflow.c:388
Code: 8d 7b 07 48 89 f8 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 c0 01 00 00 0f b6 43 07 48 8d 1c c3 48 83 c3 18 48 89 d8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 84 01 00 00 0f b6 5b 01 83 e3 0f 48 89
RSP: 0018:ffffc9000362eb68 EFLAGS: 00010206
RAX: 0000000000000003 RBX: 0000000000000018 RCX: ffff888022039e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88807d961140 R08: ffffffff8b6cb76b R09: 1ffff1100fb2c230
R10: dffffc0000000000 R11: ffffed100fb2c231 R12: dffffc0000000000
R13: ffff888022bfe273 R14: ffff88802cf9cc80 R15: ffff88802ad5a700
FS:  0000555587ad2380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f420c3f9720 CR3: 0000000022bfc000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  tcp_conn_request+0xf07/0x32c0 net/ipv4/tcp_input.c:7180
  tcp_rcv_state_process+0x183c/0x4500 net/ipv4/tcp_input.c:6663
  tcp_v6_do_rcv+0x8b2/0x1310 net/ipv6/tcp_ipv6.c:1673
  tcp_v6_rcv+0x22b4/0x30b0 net/ipv6/tcp_ipv6.c:1910
  ip6_protocol_deliver_rcu+0xc76/0x1570 net/ipv6/ip6_input.c:438
  ip6_input_finish+0x186/0x2d0 net/ipv6/ip6_input.c:483
  NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
  NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
  __netif_receive_skb_one_core net/core/dev.c:5625 [inline]
  __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5739
  netif_receive_skb_internal net/core/dev.c:5825 [inline]
  netif_receive_skb+0x1e8/0x890 net/core/dev.c:5885
  tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1549
  tun_get_user+0x2f35/0x4560 drivers/net/tun.c:2002
  tun_chr_write_iter+0x113/0x1f0 drivers/net/tun.c:2048
  call_write_iter include/linux/fs.h:2110 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xa84/0xcb0 fs/read_write.c:590
  ksys_write+0x1a0/0x2c0 fs/read_write.c:643
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 3e140491dd ("mptcp: support rstreason for passive reset")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240506123032.3351895-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:36:26 -07:00
Florian Westphal
76508154d7 selftests: netfilter: conntrack_tcp_unreplied.sh: wait for initial connection attempt
Netdev CI reports occasional failures with this test
("ERROR: ns2-dX6bUE did not pick up tcp connection from peer").

Add explicit busywait call until the initial connection attempt shows
up in conntrack rather than a one-shot 'must exist' check.
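
The test itself is a shell script; the difference between a one-shot
check and a bounded retry can still be sketched in C. Every name below
is hypothetical, and the sleep a real loop would insert between polls is
left out to keep the example portable.

```c
#include <stdbool.h>

/* Condition callback: returns true once the awaited state is visible. */
typedef bool (*cond_fn)(void *ctx);

/* Poll cond up to max_tries times instead of checking it exactly once.
 * A real implementation would sleep between attempts. */
bool poll_until(unsigned int max_tries, cond_fn cond, void *ctx)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        if (cond(ctx))
            return true;
    }
    return false;
}

/* Example condition: becomes true after a fixed number of failed polls,
 * standing in for an entry that shows up late. */
struct countdown { int remaining; };

bool countdown_done(void *ctx)
{
    struct countdown *c = ctx;

    return c->remaining-- <= 0;
}
```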

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240506114320.12178-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:33:53 -07:00
Eric Dumazet
1eb2cded45 net: annotate writes on dev->mtu from ndo_change_mtu()
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c945 ("inet: protect against too small
mtu values.")

We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.

It is time to update ndo_change_mtu() methods
to use the corresponding WRITE_ONCE().
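
A user-space sketch of the pairing. The two macros are simplified
approximations of the kernel ones (they rely on the GCC/Clang
__typeof__ extension), and struct fake_netdev and both functions are
hypothetical.

```c
/* Simplified approximations of the kernel macros: force a single,
 * untorn access through a volatile-qualified lvalue. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct fake_netdev { unsigned int mtu; };   /* hypothetical device */

/* Writer side: the shape an ndo_change_mtu() style method takes once
 * the plain store is annotated. */
int fake_change_mtu(struct fake_netdev *dev, unsigned int new_mtu)
{
    WRITE_ONCE(dev->mtu, new_mtu);
    return 0;
}

/* Reader side: a lockless read paired with the annotated write. */
unsigned int fake_read_mtu(const struct fake_netdev *dev)
{
    return READ_ONCE(dev->mtu);
}
```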

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:19:14 -07:00
Jeff Johnson
feb8c2b76e net: dccp: Fix ccid2_rtt_estimator() kernel-doc
make C=1 reports:

warning: Function parameter or struct member 'mrtt' not described in 'ccid2_rtt_estimator'

So document the 'mrtt' parameter.
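
For reference, a kernel-doc comment that describes every parameter has
the shape below. The function is a hypothetical example with a simple
7/8 smoothing rule; it is not ccid2_rtt_estimator() and does not
reproduce its logic.

```c
/**
 * example_rtt_update() - fold a new RTT sample into a smoothed estimate
 * @srtt: current smoothed RTT
 * @mrtt: newly measured RTT sample
 *
 * Each parameter gets its own "@name:" line; leaving one out produces
 * the "not described" warning quoted above.
 *
 * Return: the updated smoothed RTT.
 */
long example_rtt_update(long srtt, long mrtt)
{
    /* Weight the old estimate 7/8 and the new sample 1/8. */
    return (7 * srtt + mrtt) / 8;
}
```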

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240505-ccid2_rtt_estimator-kdoc-v1-1-09231fcb9145@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:15:08 -07:00
Matthias Schiffer
ecc2ae6176 net: phy: marvell: add support for MV88E6250 family internal PHYs
The embedded PHYs of the 88E6250 family switches are very basic - they
do not even have an Extended Address / Page register.

This adds support for the PHYs to the driver to set up PHY interrupts
and retrieve error stats. To deal with PHYs without a page register,
"simple" variants of all stat handling functions are introduced.

The code should work with all 88E6250 family switches (6250/6220/6071/
6070/6020). The PHY ID 0x01410db0 was read from a 88E6020, under the
assumption that all switches of this family use the same ID. The spec
only lists the prefix 0x01410c00 and leaves the last 10 bits as reserved,
but that seems too unspecific to be useful, as it would cover several
existing PHY IDs already supported by the driver; therefore, the ID read
from the actual hardware is used.
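
The trade-off can be checked with a small masked comparison. The helper
name is hypothetical; the mask 0xfffffc00 corresponds to "last 10 bits
reserved", while 0xfffffff0 and the second ID 0x01410cc0 are assumptions
chosen only to show a collision under the wide mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: two PHY IDs match if they agree on every bit
 * selected by mask. */
bool phy_id_matches(uint32_t id, uint32_t ref, uint32_t mask)
{
    return (id & mask) == (ref & mask);
}
```

With mask 0xfffffc00, both 0x01410db0 and 0x01410cc0 match the prefix
0x01410c00, so the prefix alone cannot tell them apart. With the
narrower 0xfffffff0 mask, 0x01410cc0 no longer matches 0x01410db0.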

Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/0695f699cd942e6e06da9d30daeedfd47785bc01.1714643285.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 15:17:52 -07:00
Matthias Schiffer
71dd027ab4 net: phy: marvell: constify marvell_hw_stats
The list of stat registers is read-only, so we can declare it as const.

Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/24d7a2f39e0c4c94466e8ad43228fdd798053f3a.1714643285.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 15:17:52 -07:00