Commit Graph

1203897 Commits

Author SHA1 Message Date
Sven Eckelmann
29d15589f0 wifi: ath11k: Cleanup mac80211 references on failure during tx_complete
When a function is using functions from mac80211 to free an skb then it
should do it consistently and not switch to the generic dev_kfree_skb_any
(or similar functions). Otherwise (like in the error handlers), mac80211
will will not be aware of the freed skb and thus not clean up related
information in its internal data structures.

Not doing so lead in the past to filled up structure which then prevented
new clients to connect.

Fixes: d5c65159f2 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Fixes: 6257c70226 ("wifi: ath11k: fix tx status reporting in encap offload mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230802-ath11k-ack_status_leak-v2-2-c0af729d6229@narfation.org
2023-08-23 17:02:47 +03:00
Sven Eckelmann
400ece6c7f wifi: ath11k: Don't drop tx_status when peer cannot be found
When a station idles for a long time, hostapd will try to send a QoS Null
frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
purpose. And the skb will be added to ack_status_frame - waiting for a
completion via ieee80211_report_ack_skb().

But when the peer was already removed before the tx_complete arrives, the
peer will be missing. And when using dev_kfree_skb_any (instead of going
through mac80211), the entry will stay inside ack_status_frames. This IDR
will therefore run full after 8K request were generated for such clients.
At this point, the access point will then just stall and not allow any new
clients because idr_alloc() for ack_status_frame will fail.

ieee80211_free_txskb() on the other hand will (when required) call
ieee80211_report_ack_skb() and make sure that (when required) remove the
entry from the ack_status_frame.

Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1

Fixes: 6257c70226 ("wifi: ath11k: fix tx status reporting in encap offload mode")
Fixes: 94739d45c3 ("ath11k: switch to using ieee80211_tx_status_ext()")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230802-ath11k-ack_status_leak-v2-1-c0af729d6229@narfation.org
2023-08-23 17:02:47 +03:00
Yue Haibing
c4125bf883 wifi: wilc1000: Remove unused declarations
Commit 8399918f30 ("staging: wilc1000: use RCU list to maintain vif interfaces list")
removed wilc_get_interface() but not its declaration.
Commit 9bc061e880 ("staging: wilc1000: added support to dynamically add/remove interfaces")
declared but never implemented wilc_cfg_alloc() and wilc_netdev_interface().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230819102100.13720-1-yuehaibing@huawei.com
2023-08-23 14:12:17 +03:00
Dmitry Antipov
35a7a1ce7c wifi: mwifiex: avoid possible NULL skb pointer dereference
In 'mwifiex_handle_uap_rx_forward()', always check the value
returned by 'skb_copy()' to avoid potential NULL pointer
dereference in 'mwifiex_uap_queue_bridged_pkt()', and drop
original skb in case of copying failure.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 838e4f4492 ("mwifiex: improve uAP RX handling")
Acked-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230814095041.16416-1-dmantipov@yandex.ru
2023-08-23 14:11:42 +03:00
Shiji Yang
821b5192c9 wifi: rt2x00: limit MT7620 TX power based on eeprom calibration
In the vendor driver, the current channel power is queried from
EEPROM_TXPOWER_BG1 and EEPROM_TXPOWER_BG2. And then the mixed value
will be written into the low half-word of the TX_ALC_CFG_0 register.
The high half-word of the TX_ALC_CFG_0 is a fixed value 0x2f2f.

We can't get the accurate TX power. Based on my tests and the new
MediaTek mt76 driver source code, the real TX power is approximately
equal to channel_power + (max) rate_power. Usually max rate_power is
the gain of the OFDM 6M rate, which can be readed from the offset
EEPROM_TXPOWER_BYRATE +1.

Based on these eeprom values, this patch adds basic TX power control
for the MT7620 and limits its maximum TX power. This can avoid the
link speed decrease caused by chip overheating. rt2800_config_alc()
function has also been renamed to rt2800_config_alc_rt6352() because
it's only used by RT6352 (MT7620).

Notice:
It's still need some work to sync the max channel power to the user
interface. This part is missing from the rt2x00 driver framework. If
we set the power exceed the calibration value, it won't take effect.

Signed-off-by: Shiji Yang <yangshiji66@outlook.com>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/TYAP286MB03159090ED14044215E59FD6BC10A@TYAP286MB0315.JPNP286.PROD.OUTLOOK.COM
2023-08-23 14:09:48 +03:00
Li Zetao
eaa8023e9b wifi: wfx: Use devm_kmemdup to replace devm_kmalloc + memcpy
Use the helper function devm_kmemdup() rather than duplicating its
implementation, which helps to enhance code readability.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230810114939.2104013-1-lizetao1@huawei.com
2023-08-23 14:09:23 +03:00
Wu Yunchuan
7d8473c799 wifi: rsi: rsi_91x_usb_ops: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_91x_usbdev *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073606.3667062-1-yunchuan@nfschina.com
2023-08-23 14:07:16 +03:00
Wu Yunchuan
f543235c39 wifi: rsi: rsi_91x_usb: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_91x_usbdev *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073558.3666936-1-yunchuan@nfschina.com
2023-08-23 14:07:15 +03:00
Wu Yunchuan
361beddbfb wifi: rsi: rsi_91x_sdio_ops: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_91x_sdiodev *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073550.3666829-1-yunchuan@nfschina.com
2023-08-23 14:07:15 +03:00
Wu Yunchuan
f9bf6e729f wifi: rsi: rsi_91x_sdio: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_91x_sdiodev *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073539.3666735-1-yunchuan@nfschina.com
2023-08-23 14:07:15 +03:00
Wu Yunchuan
db2be1a01f wifi: rsi: rsi_91x_main: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_common *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073529.3666653-1-yunchuan@nfschina.com
2023-08-23 14:07:15 +03:00
Wu Yunchuan
6d5d2dbd00 wifi: rsi: rsi_91x_mac80211: Remove unnecessary conversions
No need cast (struct rsi_hw *) to (struct rsi_hw *),
or cast (struct rsi_common *) to (struct rsi_common *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073517.3666559-1-yunchuan@nfschina.com
2023-08-23 14:07:14 +03:00
Wu Yunchuan
52424e0c49 wifi: rsi: rsi_91x_hal: Remove unnecessary conversions
No need cast (struct rsi_hw *) to (struct rsi_hw *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073507.3666471-1-yunchuan@nfschina.com
2023-08-23 14:07:14 +03:00
Wu Yunchuan
148924e537 wifi: rsi: rsi_91x_debugfs: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_91x_sdiodev *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073455.3666306-1-yunchuan@nfschina.com
2023-08-23 14:07:14 +03:00
Wu Yunchuan
5f48e91624 wifi: rsi: rsi_91x_coex: Remove unnecessary (void*) conversions
No need cast (void*) to (struct rsi_coex_ctrl_block *) or
(struct rsi_common *).

Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230803073440.3666204-1-yunchuan@nfschina.com
2023-08-23 14:07:14 +03:00
Michael Ellerman
bfedba3b2c ibmveth: Use dcbf rather than dcbfl
When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.

dcbfl RA, RB is equivalent to dcbf RA, RB, 1.

Switch to "dcbf" to avoid the build error.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 11:51:16 +01:00
Andrii Staikov
9525a3c38a i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.

Fixes: c87c938f62 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 11:48:04 +01:00
Jakub Kicinski
e3b3a87967 bnxt: use the NAPI skb allocation cache
All callers of build_skb() (*) in bnxt are in NAPI context.
The budget checking is somewhat convoluted because in the shared
completion queue cases Rx packets are discarded by netpoll
by forcing an error (E). But that happens before skb allocation.
Only a call chain starting at __bnxt_poll_work() can lead to
an skb allocation and it checks budget (b).

* bnxt_rx_multi_page_skb
* bnxt_rx_skb
  ` bp->rx_skb_func
  * bnxt_tpa_end
    ` bnxt_rx_pkt
      E bnxt_force_rx_discard
      E bnxt_poll_nitroa0
      b __bnxt_poll_work

Use napi_build_skb() to take advantage of the skb cache.
In iperf tests with HW-GRO enabled it barely makes a difference
but in cases where HW-GRO is not as effective (or disabled) it
can give even a >10% boost (20.7Gbps -> 23.1Gbps).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 11:45:54 +01:00
Jamal Hadi Salim
da71714e35 net/sched: fix a qdisc modification with ambiguous command request
When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change  i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.

While at it, fix the code and comments to improve readability.

While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.

v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:44:48 +01:00
Alexis Lothoré
2e0c8ee2b5 net: dsa: rzn1-a5psw: remove redundant logs
Remove debug logs in port vlan management, since there are already multiple
tracepoints defined for those operations in DSA

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:43:32 +01:00
Jordan Rife
0bdf399342 net: Avoid address overwrite in kernel_connect
BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.

A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.

Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:42:05 +01:00
Feng Liu
dae64749db virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting
The virtio_net driver currently deals with different versions and types
of virtio net headers, such as virtio_net_hdr_mrg_rxbuf,
virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies
on multiple type casts to convert memory between different structures,
potentially leading to bugs when there are changes in these structures.

Introduces the "struct skb_vnet_common_hdr" as a unifying header
structure using a union. With this approach, various virtio net header
structures can be converted by accessing different members of this
structure, thus eliminating the need for type casting and reducing the
risk of potential bugs.

For example following code:
static struct sk_buff *page_to_skb(struct virtnet_info *vi,
		struct receive_queue *rq,
		struct page *page, unsigned int offset,
		unsigned int len, unsigned int truesize,
		unsigned int headroom)
{
[...]
	struct virtio_net_hdr_mrg_rxbuf *hdr;
[...]
	hdr_len = vi->hdr_len;
[...]
ok:
	hdr = skb_vnet_hdr(skb);
	memcpy(hdr, hdr_p, hdr_len);
[...]
}

When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20
But the sizeof(*hdr) is 12,
memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr,
which make a potential risk of bug. And this risk can be avoided by
introducing struct skb_vnet_common_hdr.

Change log
v1->v2
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
feedback from Simon Horman <horms@kernel.org>
1. change to use net-next tree.
2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header.

v2->v3
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
1. fix typo in commit message.
2. add original struct virtio_net_hdr into union
3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf;

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:40:18 +01:00
Jinjie Ruan
45f9cb6bd9 dp83640: Use list_for_each_entry() helper
Convert list_for_each() to list_for_each_entry() where applicable.

No functional changed.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 09:39:15 +01:00
David S. Miller
5c42b66d01 Merge branch 'mlx4-aux-bus'
Petr Pavlu says:

====================
Convert mlx4 to use auxiliary bus

This series converts the mlx4 drivers to use auxiliary bus, similarly to
how mlx5 was converted [1]. The first 6 patches are preparatory changes,
the remaining 4 are the final conversion.

Initial motivation for this change was to address a problem related to
loading mlx4_en/mlx4_ib by mlx4_core using request_module_nowait(). When
doing such a load in initrd, the operation is asynchronous to any init
control and can get unexpectedly affected/interrupted by an eventual
root switch. Using an auxiliary bus leaves these module loads to udevd
which better integrates with systemd processing. [2]

General benefit is to get rid of custom interface logic and instead use
a common facility available for this task. An obvious risk is that some
new bug is introduced by the conversion.

Leon Romanovsky was kind enough to check for me that the series passes
their verification tests.

Changes since v2 [3]:
* Use 'void *' as the event param of mlx4_dispatch_event().

Changes since v1 [4]:
* Fix a missing definition of the err variable in mlx4_en_add().
* Remove not needed comments about the event type in mlx4_en_event()
  and mlx4_ib_event().

[1] https://lore.kernel.org/netdev/20201101201542.2027568-1-leon@kernel.org/
[2] https://lore.kernel.org/netdev/0a361ac2-c6bd-2b18-4841-b1b991f0635e@suse.com/
[3] https://lore.kernel.org/netdev/20230813145127.10653-1-petr.pavlu@suse.com/
[4] https://lore.kernel.org/netdev/20230804150527.6117-1-petr.pavlu@suse.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
c138cdb89a mlx4: Delete custom device management logic
After the conversion to use the auxiliary bus, the custom device
management is not needed anymore and can be deleted.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
7d22b1cb9d mlx4: Connect the infiniband part to the auxiliary bus
Use the auxiliary bus to perform device management of the infiniband
part of the mlx4 driver.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
eb93ae495a mlx4: Connect the ethernet part to the auxiliary bus
Use the auxiliary bus to perform device management of the ethernet part
of the mlx4 driver.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
8c2d2b8771 mlx4: Register mlx4 devices to an auxiliary virtual bus
Add an auxiliary virtual bus to model the mlx4 driver structure. The
code is added along the current custom device management logic.
Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and
the old interface is then removed.

Structure mlx4_priv gains a new adev dynamic array to keep track of its
auxiliary devices. Access to the array is protected by the global
mlx4_intf mutex.

Functions mlx4_register_device() and mlx4_unregister_device() are
updated to expose auxiliary devices on the bus in order to load mlx4_en
and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and
mlx4_unregister_auxiliary_driver() are added to substitute
mlx4_register_interface() and mlx4_unregister_interface(), respectively.
Function mlx4_do_bond() is adjusted to walk over the adev array and
re-adds a specific auxiliary device if its driver sets the
MLX4_INTFF_BONDING flag.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
c9452b8fd2 mlx4: Avoid resetting MLX4_INTFF_BONDING per driver
The mlx4_core driver has a logic that allows a sub-driver to set the
MLX4_INTFF_BONDING flag which then causes that function mlx4_do_bond()
asks the sub-driver to fully re-probe a device when its bonding
configuration changes.

Performing this operation is disallowed in mlx4_register_interface()
when it is detected that any mlx4 device is multifunction (SRIOV). The
code then resets MLX4_INTFF_BONDING in the driver flags.

Move this check directly into mlx4_do_bond(). It provides a better
separation as mlx4_core no longer directly modifies the sub-driver flags
and it will allow to get rid of explicitly keeping track of all mlx4
devices by the intf.c code when it is switched to an auxiliary bus.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
e2fb47d4eb mlx4: Move the bond work to the core driver
Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond
reconfiguration. It gathers data about a new port map setting, takes
a reference on the netdev that triggered the change and queues a work
object on mlx4_en_priv.mdev.workqueue to perform the operation. The
scheduled work is mlx4_en_bond_work() which calls
mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond().

At the same time, function mlx4_change_port_types() in mlx4_core might
be invoked to change the port type configuration. As part of its logic,
it re-registers the whole device by calling mlx4_unregister_device(),
followed by mlx4_register_device().

The two operations can result in concurrent access to the data about
currently active interfaces on the device.

Functions mlx4_register_device() and mlx4_unregister_device() lock the
intf_mutex to gain exclusive access to this data. The current
implementation of mlx4_do_bond() doesn't do that which could result in
an unexpected behavior. An updated version of mlx4_do_bond() for use
with an auxiliary bus goes and locks the intf_mutex when accessing a new
auxiliary device array.

However, doing so can then result in the following deadlock:
* A two-port mlx4 device is configured as an Ethernet bond.
* One of the ports is changed from eth to ib, for instance, by writing
  into a mlx4_port<x> sysfs attribute file.
* mlx4_change_port_types() is called to update port types. It invokes
  mlx4_unregister_device() to unregister the device which locks the
  intf_mutex and starts removing all associated interfaces.
* Function mlx4_en_remove() gets invoked and starts destroying its first
  netdev. This triggers mlx4_en_netdev_event() which recognizes that the
  configured bond is broken. It runs mlx4_en_queue_bond_work() which
  takes a reference on the netdev. Removing the netdev now cannot
  proceed until the work is completed.
* Work function mlx4_en_bond_work() gets scheduled. It calls
  mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the
  intf_mutex but that is not possible because it is held already by
  mlx4_unregister_device().

This particular case could be possibly solved by unregistering the
mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it
seems better to decouple mlx4_en more and break this reference order.

Avoid then this scenario by recognizing that the bond reconfiguration
operates only on a mlx4_dev. The logic to queue and execute the bond
work can be moved into the mlx4_core driver. Only a reference on the
respective mlx4_dev object is needed to be taken during the work's
lifetime. This removes a call from mlx4_en that can directly result in
needing to lock the intf_mutex, it remains a privilege of the core
driver.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
13f857111c mlx4: Get rid of the mlx4_interface.activate callback
The mlx4_interface.activate callback was introduced in commit
79857cd31f ("net/mlx4: Postpone the registration of net_device"). It
dealt with a situation when a netdev notifier received a NETDEV_REGISTER
event for a new net_device created by mlx4_en but the same device was
not yet visible to mlx4_get_protocol_dev(). The callback can be removed
now that mlx4_get_protocol_dev() is gone.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
73d68002a0 mlx4: Replace the mlx4_interface.event callback with a notifier
Use a notifier to implement mlx4_dispatch_event() in preparation to
switch mlx4_en and mlx4_ib to be an auxiliary device.

A problem is that if the mlx4_interface.event callback was replaced with
something as mlx4_adrv.event then the implementation of
mlx4_dispatch_event() would need to acquire a lock on a given device
before executing this callback. That is necessary because otherwise
there is no guarantee that the associated driver cannot get unbound when
the callback is running. However, taking this lock is not possible
because mlx4_dispatch_event() can be invoked from the hardirq context.
Using an atomic notifier allows the driver to accurately record when it
wants to receive these events and solves this problem.

A handler registration is done by both mlx4_en and mlx4_ib at the end of
their mlx4_interface.add callback. This matches the current situation
when mlx4_add_device() would enable events for a given device
immediately after this callback, by adding the device on the
mlx4_priv.list.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
7ba189ac52 mlx4: Use 'void *' as the event param of mlx4_dispatch_event()
Function mlx4_dispatch_event() takes an 'unsigned long' as its event
parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR),
a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit
integer (remaining events).

In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device,
the mlx4_interface.event callback is replaced with a notifier and
function mlx4_dispatch_event() gets updated to invoke
atomic_notifier_call_chain(). This requires forwarding the input 'param'
value from the former function to the latter. A problem is that the
notifier call takes 'void *' as its 'param' value, compared to
'unsigned long' used by mlx4_dispatch_event(). Re-passing the value
would need either punning it to 'void *' or passing down the address of
the input 'param'. Both approaches create a number of unnecessary casts.

Change instead the input 'param' of mlx4_dispatch_event() from
'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly,
callers using an int value are adjusted to pass its address.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
ef5617e343 mlx4: Rename member mlx4_en_dev.nb to netdev_nb
Rename the mlx4_en_dev.nb notifier_block member to netdev_nb in
preparation to add a mlx4 core notifier_block.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
71ab55a9af mlx4: Get rid of the mlx4_interface.get_dev callback
Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev()
and the associated mlx4_interface.get_dev callbacks. This is done in
preparation to use an auxiliary bus to model the mlx4 driver structure.

The change is motivated by the following situation:
* The mlx4_en interface is being initialized by mlx4_en_add() and
  mlx4_en_activate().
* The latter activate function calls mlx4_en_init_netdev() ->
  register_netdev() to register a new net_device.
* A netdev event NETDEV_REGISTER is raised for the device.
* The netdev notififier mlx4_ib_netdev_event() is called and it invokes
  mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() ->
  mlx4_en_get_netdev() [via mlx4_interface.get_dev].

This chain creates a problem when mlx4_en gets switched to be an
auxiliary driver. It contains two device calls which would both need to
take a respective device lock.

Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer
call mlx4_get_protocol_dev() but instead to utilize the information
passed in net_device.parent and net_device.dev_port. This data is
sufficient to determine that an updated port is one that the mlx4_ib
driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to
date.

Following that, update mlx4_ib_get_netdev() to also not call
mlx4_get_protocol_dev() and instead scan all current netdevs to find
find a matching one. Note that mlx4_ib_get_netdev() is called early from
ib_register_device() and cannot use data tracked in
mlx4_ib_dev.iboe.netdevs which is not at that point yet set.

Finally, remove function mlx4_get_protocol_dev() and the
mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they
became unused.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Yue Haibing
eb6603246a qed/qede: Remove unused declarations
Commit 8cd160a294 ("qede: convert to new udp_tunnel_nic infra")
removed qede_udp_tunnel_{add,del}() but not the declarations.
Commit 0ebcebbef1 ("qed: Read device port count from the shmem")
removed qed_device_num_engines() but not its declaration.
Commit 1e128c8129 ("qed: Add support for hardware offloaded FCoE.")
declared but never implemented qed_fcoe_set_pf_params().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:21:37 +01:00
Sai Krishna
bdf79b1286 octeontx2-pf: Use PTP HW timestamp counter atomic update feature
Some of the newer silicon versions in CN10K series supports a feature
where in the current PTP timestamp in HW can be updated atomically
without losing any cpu cycles unlike read/modify/write register.
This patch uses this feature so that PTP accuracy can be improved
while adjusting the master offset in HW. There is no need for SW
timecounter when using this feature. So removed references to SW
timecounter wherever appropriate.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:20:50 +01:00
Leon Romanovsky
b8c697e177 net/mlx5e: Support IPsec upper TCP protocol selector
Support TCP as protocol selector for policy and state in IPsec
packet offload mode.

Example of state configuration is as follows:
  ip xfrm state add src 192.168.25.3 dst 192.168.25.1 \
	proto esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
	0x54a7588d36873b031e4bd46301be5a86b3a53879 128 mode transport \
	offload packet dev re0 dir in sel src 192.168.25.3 dst 192.168.25.1 \
	proto tcp dport 9003

Acked-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:18 -07:00
Emeel Hakim
c338325f7a net/mlx5e: Support IPsec upper protocol selector field offload for RX
Support RX policy/state upper protocol selector field offload,
to enable selecting RX traffic for IPsec operation based on l4
protocol UDP with specific source/destination port.

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:18 -07:00
Jiri Pirko
7d8335200c net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops
Instead of using internal devlink_port->index to perform vport lookup in
every devlink port op, store the vport pointer to the container struct
mlx5_devlink_port and use it directly in port ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:18 -07:00
Jiri Pirko
eb555e34f0 net/mlx5: Check vhca_resource_manager capability in each op and add extack msg
Since the follow-up patch is going to remove
mlx5_devlink_port_fn_get_vport() entirely, move the vhca_resource_manager
capability checking to individual ops. Add proper extack message
on the way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:18 -07:00
Jiri Pirko
5c632cc352 net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking
If called from port ops, it is not needed to perform the checks in
mlx5_devlink_eswitch_get(). The reason is devlink port would not be
registered if the checks are not true. Introduce relaxed version
mlx5_devlink_eswitch_nocheck_get() and use it in port ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:17 -07:00
Jiri Pirko
c0ae009292 net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly
Instead of initializing "err" variable, just return "-EOPNOTSUPP"
directly where it is needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:17 -07:00
Jiri Pirko
2caa2a3911 net/mlx5: Reduce number of vport lookups passing vport pointer instead of index
During devlink port init/cleanup and register/unregister calls, there
are many lookups of vport. Instead of passing vport_num as argument to
functions, pass the vport struct pointer directly and avoid repeated
lookups.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:17 -07:00
Jiri Pirko
2c5f33f6b9 net/mlx5: Embed struct devlink_port into driver structure
Struct devlink_port is usually embedded in a driver-specific struct
which allows to carry driver context to devlink port ops.

Introduce a container struct to include devlink_port struct
in preparation to also include driver context for devlink port ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:17 -07:00
Jiri Pirko
13f878a22c net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops
Currently each PF/VF/SF devlink port op called into mlx5 code calls
is_port_function_supported() to check if the port is either
PF, VF or SF. So make sure that the ops are registered with devlink
port only for those and avoid the is_port_function_supported() checks
in ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:17 -07:00
Jiri Pirko
b940ec4b25 net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable()
Since the previous patch removed the only users of these functions,
remove them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:16 -07:00
Jiri Pirko
e855afd715 net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code
Similar to the PF/VF helpers, introduce a set of load/unload helpers
for SF vports. From there, call mlx5_eswitch_load/unload_vport() which
are common for PFs/VFs and newly introduced SF helpers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:16 -07:00
Jiri Pirko
382fe5747b net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFs
Currently there is a separate set of functions used to
register/unregister the SF. The only difference is currently the ops
struct. Move the struct up and use it for SFs in
mlx5_esw_offloads_devlink_port_register().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:16 -07:00
Jiri Pirko
d9833bcfe8 net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()
In order to prepare for
mlx5_esw_offloads_devlink_port_register/unregister() to be used
for SFs as well, push out the PF/VF specific init/cleanup calls outside.
Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from
there. Use these new helpers of PF/VF loading and make
mlx5_eswitch_local/unload_vport() reusable for SFs.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22 21:34:16 -07:00