Commit Graph

119545 Commits

Author SHA1 Message Date
Jakub Kicinski
4a0822608e Merge tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2023-07-26

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Unregister devlink params in case interface is down
  net/mlx5: DR, Fix peer domain namespace setting
  net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
  net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
  net/mlx5: Bridge, set debugfs access right to root-only
  net/mlx5e: xsk: Fix crash on regular rq reactivation
  net/mlx5e: xsk: Fix invalid buffer access for legacy rq
  net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
  net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
  net/mlx5e: Don't hold encap tbl lock if there is no encap action
  net/mlx5: Honor user input for migratable port fn attr
  net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
  net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
  net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
  net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
====================

Link: https://lore.kernel.org/r/20230726213206.47022-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:03:41 -07:00
Yuanjun Gong
dadc5b86cc net: dsa: fix value check in bcm_sf2_sw_probe()
in bcm_sf2_sw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.

Fixes: e9ec5c3bd2 ("net: dsa: bcm_sf2: request and handle clocks")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:02:54 -07:00
Yuanjun Gong
5c85f70657 benet: fix return value check in be_lancer_xmit_workarounds()
in be_lancer_xmit_workarounds(), it should go to label 'tx_drop'
if an unexpected value is returned by pskb_trim().

Fixes: 93040ae5cc ("be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230725032726.15002-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27 10:31:38 +02:00
Jason Wang
25266128fe virtio-net: fix race between set queues and probe
A race were found where set_channels could be called after registering
but before virtnet_set_queues() in virtnet_probe(). Fixing this by
moving the virtnet_set_queues() before netdevice registering. While at
it, use _virtnet_set_queues() to avoid holding rtnl as the device is
not even registered at that time.

Cc: stable@vger.kernel.org
Fixes: a220871be6 ("virtio-net: correctly enable multiqueue")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230725072049.617289-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26 22:09:42 -07:00
Wei Fang
15cec633fc net: fec: tx processing does not call XDP APIs if budget is 0
According to the clarification [1] in the latest napi.rst, the tx
processing cannot call any XDP (or page pool) APIs if the "budget"
is 0. Because NAPI is called with the budget of 0 (such as netpoll)
indicates we may be in an IRQ context, however, we cannot use the
page pool from IRQ context.

[1] https://lore.kernel.org/all/20230720161323.2025379-1-kuba@kernel.org/

Fixes: 20f7973990 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230725074148.2936402-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26 21:12:12 -07:00
Shay Drory
53d737dfd3 net/mlx5: Unregister devlink params in case interface is down
Currently, in case an interface is down, mlx5 driver doesn't
unregister its devlink params, which leads to this WARN[1].
Fix it by unregistering devlink params in that case as well.

[1]
[  295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc
[  295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S         OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61
[  295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun  6 2023
[  295.543096 ] pc : devlink_free+0x174/0x1fc
[  295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core]
[  295.561816 ] sp : ffff80000809b850
[  295.711155 ] Call trace:
[  295.716030 ]  devlink_free+0x174/0x1fc
[  295.723346 ]  mlx5_devlink_free+0x18/0x2c [mlx5_core]
[  295.733351 ]  mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core]
[  295.743534 ]  auxiliary_bus_remove+0x2c/0x50
[  295.751893 ]  __device_release_driver+0x19c/0x280
[  295.761120 ]  device_release_driver+0x34/0x50
[  295.769649 ]  bus_remove_device+0xdc/0x170
[  295.777656 ]  device_del+0x17c/0x3a4
[  295.784620 ]  mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core]
[  295.794800 ]  mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core]
[  295.806375 ]  mlx5_unload+0x34/0xd0 [mlx5_core]
[  295.815339 ]  mlx5_unload_one+0x70/0xe4 [mlx5_core]
[  295.824998 ]  shutdown+0xb0/0xd8 [mlx5_core]
[  295.833439 ]  pci_device_shutdown+0x3c/0xa0
[  295.841651 ]  device_shutdown+0x170/0x340
[  295.849486 ]  __do_sys_reboot+0x1f4/0x2a0
[  295.857322 ]  __arm64_sys_reboot+0x2c/0x40
[  295.865329 ]  invoke_syscall+0x78/0x100
[  295.872817 ]  el0_svc_common.constprop.0+0x54/0x184
[  295.882392 ]  do_el0_svc+0x30/0xac
[  295.889008 ]  el0_svc+0x48/0x160
[  295.895278 ]  el0t_64_sync_handler+0xa4/0x130
[  295.903807 ]  el0t_64_sync+0x1a4/0x1a8
[  295.911120 ] ---[ end trace 4f1d2381d00d9dce  ]---

Fixes: fe578cbb2f ("net/mlx5: Move devlink registration before mlx5_load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:05 -07:00
Shay Drory
62752c0bc6 net/mlx5: DR, Fix peer domain namespace setting
The offending patch is based on the assumption that for PFs,
mlx5_get_dev_index() is the same as vhca_id. However, this assumption
is wrong in case of DPU (ECPF).
Fix it by using vhca_id directly, and switch the array of peers to
xarray.

Fixes: 6d5b7321d8 ("net/mlx5: DR, handle more than one peer domain")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Chris Mi
61eab651f6 net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
The cited commit sets ft prio to fs_base_prio. But if
ignore_flow_level it not supported, ft prio must be set based on
tc filter prio. Otherwise, all the ft prio are the same on the same
chain. It is invalid if ignore_flow_level is not supported.

Fix it by setting ft prio based on tc filter prio and setting
fs_base_prio to 0 for fdb.

Fixes: 8e80e56480 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Jianbo Liu
3e4cf1dd2c net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
There are DEK objects cached in DEK pool after kTLS is used, and they
are freed only in mlx5e_ktls_cleanup().

mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
free mdev resources, including protection domain (PD). However, PD is
still referenced by the cached DEK objects in this case, because
profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
after mlx5e_suspend() during devlink reload. So the following FW
syndrome is generated:

 mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
    status bad resource state(0x9), syndrome (0xef0c8a), err(-22)

To avoid this syndrome, move DEK pool destruction to
mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
move pool creation to mlx5e_ktls_init_tx() for symmetry.

Fixes: f741db1a51 ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Vlad Buslov
eb02b93aad net/mlx5: Bridge, set debugfs access right to root-only
As suggested during code review set the access rights for bridge 'fdb'
debugfs file to root-only.

Fixes: 791eb78285 ("net/mlx5: Bridge, expose FDB state via debugfs")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20230619120515.5045132a@kernel.org/
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Dragos Tatulea
39646d9bcd net/mlx5e: xsk: Fix crash on regular rq reactivation
When the regular rq is reactivated after the XSK socket is closed
it could be reading stale cqes which eventually corrupts the rq.
This leads to no more traffic being received on the regular rq and a
crash on the next close or deactivation of the rq.

Kal Cuttler Conely reported this issue as a crash on the release
path when the xdpsock sample program is stopped (killed) and restarted
in sequence while traffic is running.

This patch flushes all cqes when during the rq flush. The cqe flushing
is done in the reset state of the rq. mlx5e_rq_to_ready code is moved
into the flush function to allow for this.

Fixes: 082a9edf12 ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Dragos Tatulea
e0f52298fe net/mlx5e: xsk: Fix invalid buffer access for legacy rq
The below crash can be encountered when using xdpsock in rx mode for
legacy rq: the buffer gets released in the XDP_REDIRECT path, and then
once again in the driver. This fix sets the flag to avoid releasing on
the driver side.

XSK handling of buffers for legacy rq was relying on the caller to set
the skip release flag. But the referenced fix started using fragment
counts for pages instead of the skip flag.

Crash log:
 general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP
 CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28
 Code:  ...
 RSP: 0018:ffff88810082fc98 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc
 RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901
 RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006
 R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000
 R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00
 FS:  0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? die_addr+0x32/0x80
  ? exc_general_protection+0x192/0x390
  ? asm_exc_general_protection+0x22/0x30
  ? 0xffffffffa000b514
  ? bpf_prog_03b13f331978c78c+0xf/0x28
  mlx5e_xdp_handle+0x48/0x670 [mlx5_core]
  ? dev_gro_receive+0x3b5/0x6e0
  mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core]
  mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core]
  mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
  mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core]
  __napi_poll+0x25/0x1a0
  net_rx_action+0x28a/0x300
  __do_softirq+0xcd/0x279
  ? sort_range+0x20/0x20
  run_ksoftirqd+0x1a/0x20
  smpboot_thread_fn+0xa2/0x130
  kthread+0xc9/0xf0
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
 Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
 ---[ end trace 0000000000000000 ]---

Fixes: 7abd955a58 ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:04 -07:00
Jianbo Liu
d03b6e6f31 net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as
the flow is duplicated to the peer eswitch, the related neighbour
information on the peer uplink representor is created as well.

In the cited commit, eswitch devcom unpair is moved to uplink unload
API, specifically the profile->cleanup_tx. If there is a encap rule
offloaded in ECMP mode, when one eswitch does unpair (because of
unloading the driver, for instance), and the peer rule from the peer
eswitch is going to be deleted, the use-after-free error is triggered
while accessing neigh info, as it is already cleaned up in uplink's
profile->disable, which is before its profile->cleanup_tx.

To fix this issue, move the neigh cleanup to profile's cleanup_tx
callback, and after mlx5e_cleanup_uplink_rep_tx is called. The neigh
init is moved to init_tx for symmeter.

[ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496

[ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G    B              6.4.0-rc7+ #15
[ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2453.384335] Call Trace:
[ 2453.384625]  <TASK>
[ 2453.384891]  dump_stack_lvl+0x33/0x50
[ 2453.385285]  print_report+0xc2/0x610
[ 2453.385667]  ? __virt_addr_valid+0xb1/0x130
[ 2453.386091]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.386757]  kasan_report+0xae/0xe0
[ 2453.387123]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.387798]  mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.388465]  mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core]
[ 2453.389111]  mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core]
[ 2453.389706]  mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core]
[ 2453.390361]  mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core]
[ 2453.391015]  ? complete_all+0x43/0xd0
[ 2453.391398]  ? free_flow_post_acts+0x38/0x120 [mlx5_core]
[ 2453.392004]  mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core]
[ 2453.392618]  mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core]
[ 2453.393276]  mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core]
[ 2453.393925]  mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core]
[ 2453.394546]  ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core]
[ 2453.395268]  ? down_write+0xaa/0x100
[ 2453.395652]  mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core]
[ 2453.396317]  mlx5_devcom_send_event+0xbb/0x190 [mlx5_core]
[ 2453.396917]  mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core]
[ 2453.397582]  mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core]
[ 2453.398182]  mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core]
[ 2453.398768]  mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core]
[ 2453.399367]  mlx5e_detach_netdev+0xee/0x120 [mlx5_core]
[ 2453.399957]  mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core]
[ 2453.400598]  mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core]
[ 2453.403781]  mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core]
[ 2453.404479]  ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core]
[ 2453.405170]  ? up_write+0x39/0x60
[ 2453.405529]  ? kernfs_remove_by_name_ns+0xb7/0xe0
[ 2453.405985]  auxiliary_bus_remove+0x2e/0x40
[ 2453.406405]  device_release_driver_internal+0x243/0x2d0
[ 2453.406900]  ? kobject_put+0x42/0x2d0
[ 2453.407284]  bus_remove_device+0x128/0x1d0
[ 2453.407687]  device_del+0x240/0x550
[ 2453.408053]  ? waiting_for_supplier_show+0xe0/0xe0
[ 2453.408511]  ? kobject_put+0xfa/0x2d0
[ 2453.408889]  ? __kmem_cache_free+0x14d/0x280
[ 2453.409310]  mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core]
[ 2453.409973]  mlx5_unregister_device+0x40/0x50 [mlx5_core]
[ 2453.410561]  mlx5_uninit_one+0x3d/0x110 [mlx5_core]
[ 2453.411111]  remove_one+0x89/0x130 [mlx5_core]
[ 2453.411628]  pci_device_remove+0x59/0xf0
[ 2453.412026]  device_release_driver_internal+0x243/0x2d0
[ 2453.412511]  ? parse_option_str+0x14/0x90
[ 2453.412915]  driver_detach+0x7b/0xf0
[ 2453.413289]  bus_remove_driver+0xb5/0x160
[ 2453.413685]  pci_unregister_driver+0x3f/0xf0
[ 2453.414104]  mlx5_cleanup+0xc/0x20 [mlx5_core]

Fixes: 2be5bd42a5 ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:03 -07:00
Amir Tzin
3ec43c1b08 net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
Moving to switchdev mode with ntuple offload on causes the kernel to
crash since fs->arfs is freed during nic profile cleanup flow.

Ntuple offload is not supported in switchdev mode and it is already
unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is
valid before disabling it.

trace:
[] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
[] arfs_del_rules+0x44/0x1a0 [mlx5_core]
[] mlx5e_arfs_disable+0xe/0x20 [mlx5_core]
[] mlx5e_handle_feature+0x3d/0xb0 [mlx5_core]
[] ? __rtnl_unlock+0x25/0x50
[] mlx5e_set_features+0xfe/0x160 [mlx5_core]
[] __netdev_update_features+0x278/0xa50
[] ? netdev_run_todo+0x5e/0x2a0
[] netdev_update_features+0x22/0x70
[] ? _cond_resched+0x15/0x30
[] mlx5e_attach_netdev+0x12a/0x1e0 [mlx5_core]
[] mlx5e_netdev_attach_profile+0xa1/0xc0 [mlx5_core]
[] mlx5e_netdev_change_profile+0x77/0xe0 [mlx5_core]
[] mlx5e_vport_rep_load+0x1ed/0x290 [mlx5_core]
[] mlx5_esw_offloads_rep_load+0x88/0xd0 [mlx5_core]
[] esw_offloads_load_rep.part.38+0x31/0x50 [mlx5_core]
[] esw_offloads_enable+0x6c5/0x710 [mlx5_core]
[] mlx5_eswitch_enable_locked+0x1bb/0x290 [mlx5_core]
[] mlx5_devlink_eswitch_mode_set+0x14f/0x320 [mlx5_core]
[] devlink_nl_cmd_eswitch_set_doit+0x94/0x120
[] genl_family_rcv_msg_doit.isra.17+0x113/0x150
[] genl_family_rcv_msg+0xb7/0x170
[] ? devlink_nl_cmd_port_split_doit+0x100/0x100
[] genl_rcv_msg+0x47/0xa0
[] ? genl_family_rcv_msg+0x170/0x170
[] netlink_rcv_skb+0x4c/0x130
[] genl_rcv+0x24/0x40
[] netlink_unicast+0x19a/0x230
[] netlink_sendmsg+0x204/0x3d0
[] sock_sendmsg+0x50/0x60

Fixes: 90b22b9bcd ("net/mlx5e: Disable Rx ntuple offload for uplink representor")
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:03 -07:00
Chris Mi
93a331939d net/mlx5e: Don't hold encap tbl lock if there is no encap action
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

 PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
  #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
  #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
  #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
  #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
  #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
  #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
  #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
  #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
  #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
  #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
 #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
 #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
 #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
 #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
 #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
 #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143]  wait_for_completion+0x24/0x30
[1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885]  process_one_work+0x24e/0x510

Actually no need to hold encap tbl lock if there is no encap action.
Fix it by checking if encap action exists or not before holding
encap tbl lock.

Fixes: 37c3b9fa7c ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:03 -07:00
Shay Drory
0507f2c8be net/mlx5: Honor user input for migratable port fn attr
Currently, whenever a user is setting migratable port fn attr, the
driver is always turn migratable capability on.
Fix it by honor the user input

Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:03 -07:00
Yuanjun Gong
e5bcb7564d net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
mlx5e_ipsec_remove_trailer() should return an error code if function
pskb_trim() returns an unexpected value.

Fixes: 2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:03 -07:00
Zhengchao Shao
c6cf0b6097 net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
The memory pointed to by the priv->rx_res pointer is not freed in the error
path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
the memory in the error path, thereby making the error path identical to
mlx5e_cleanup_rep_rx().

Fixes: af8bbf7300 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:02 -07:00
Zhengchao Shao
5dd77585dd net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory
pointed by 'in' is not released, which will cause memory leak. Move memory
release after mlx5_cmd_exec.

Fixes: 1d9186476e ("net/mlx5: DR, Add direct rule command utilities")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:02 -07:00
Zhengchao Shao
aeb660171b net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
memory is successfully allocated but the 'in' memory fails to be
allocated, the memory pointed to by ft->g is released once. And in function
macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
memory pointed to by ft->g again. This will cause double free problem.

Fixes: e467b283ff ("net/mlx5e: Add MACsec TX steering rules")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-26 14:31:02 -07:00
Muhammad Husaini Zulkifli
d4a7ce6421 igc: Fix Kernel Panic during ndo_tx_timeout callback
The Xeon validation group has been carrying out some loaded tests
with various HW configurations, and they have seen some transmit
queue time out happening during the test. This will cause the
reset adapter function to be called by igc_tx_timeout().
Similar race conditions may arise when the interface is being brought
down and up in igc_reinit_locked(), an interrupt being generated, and
igc_clean_tx_irq() being called to complete the TX.

When the igc_tx_timeout() function is invoked, this patch will turn
off all TX ring HW queues during igc_down() process. TX ring HW queues
will be activated again during the igc_configure_tx_ring() process
when performing the igc_up() procedure later.

This patch also moved existing igc_disable_tx_ring_hw() to avoid using
forward declaration.

Kernel trace:
[ 7678.747813] ------------[ cut here ]------------
[ 7678.757914] NETDEV WATCHDOG: enp1s0 (igc): transmit queue 2 timed out
[ 7678.770117] WARNING: CPU: 0 PID: 13 at net/sched/sch_generic.c:525 dev_watchdog+0x1ae/0x1f0
[ 7678.784459] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO)
vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO)
sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO)
dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO)
svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO)
fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO)
regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel
snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 7678.784496]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight
configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid
mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a
usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore
crct10dif_generic ptp crct10dif_common usb_common pps_core
[ 7679.200403] RIP: 0010:dev_watchdog+0x1ae/0x1f0
[ 7679.210201] Code: 28 e9 53 ff ff ff 4c 89 e7 c6 05 06 42 b9 00 01 e8 17 d1 fb ff 44 89 e9 4c
89 e6 48 c7 c7 40 ad fb 81 48 89 c2 e8 52 62 82 ff <0f> 0b e9 72 ff ff ff 65 8b 05 80 7d 7c 7e
89 c0 48 0f a3 05 0a c1
[ 7679.245438] RSP: 0018:ffa00000001f7d90 EFLAGS: 00010282
[ 7679.256021] RAX: 0000000000000000 RBX: ff11000109938440 RCX: 0000000000000000
[ 7679.268710] RDX: ff11000361e26cd8 RSI: ff11000361e1b880 RDI: ff11000361e1b880
[ 7679.281314] RBP: ffa00000001f7da8 R08: ff1100035f8fffe8 R09: 0000000000027ffb
[ 7679.293840] R10: 0000000000001f0a R11: ff1100035f840000 R12: ff11000109938000
[ 7679.306276] R13: 0000000000000002 R14: dead000000000122 R15: ffa00000001f7e18
[ 7679.318648] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 7679.332064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7679.342757] CR2: 00007ffff7fca168 CR3: 000000013b08a006 CR4: 0000000000471ef8
[ 7679.354984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7679.367207] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 7679.379370] PKRU: 55555554
[ 7679.386446] Call Trace:
[ 7679.393152]  <TASK>
[ 7679.399363]  ? __pfx_dev_watchdog+0x10/0x10
[ 7679.407870]  call_timer_fn+0x31/0x110
[ 7679.415698]  expire_timers+0xb2/0x120
[ 7679.423403]  run_timer_softirq+0x179/0x1e0
[ 7679.431532]  ? __schedule+0x2b1/0x820
[ 7679.439078]  __do_softirq+0xd1/0x295
[ 7679.446426]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 7679.454867]  run_ksoftirqd+0x22/0x30
[ 7679.462058]  smpboot_thread_fn+0xb7/0x160
[ 7679.469670]  kthread+0xcd/0xf0
[ 7679.476097]  ? __pfx_kthread+0x10/0x10
[ 7679.483211]  ret_from_fork+0x29/0x50
[ 7679.490047]  </TASK>
[ 7679.495204] ---[ end trace 0000000000000000 ]---
[ 7679.503179] igc 0000:01:00.0 enp1s0: Register Dump
[ 7679.511230] igc 0000:01:00.0 enp1s0: Register Name   Value
[ 7679.519892] igc 0000:01:00.0 enp1s0: CTRL            181c0641
[ 7679.528782] igc 0000:01:00.0 enp1s0: STATUS          40280683
[ 7679.537551] igc 0000:01:00.0 enp1s0: CTRL_EXT        10000040
[ 7679.546284] igc 0000:01:00.0 enp1s0: MDIC            180a3800
[ 7679.554942] igc 0000:01:00.0 enp1s0: ICR             00000081
[ 7679.563503] igc 0000:01:00.0 enp1s0: RCTL            04408022
[ 7679.571963] igc 0000:01:00.0 enp1s0: RDLEN[0-3]      00001000 00001000 00001000 00001000
[ 7679.583075] igc 0000:01:00.0 enp1s0: RDH[0-3]        00000068 000000b6 0000000f 00000031
[ 7679.594162] igc 0000:01:00.0 enp1s0: RDT[0-3]        00000066 000000b2 0000000e 00000030
[ 7679.605174] igc 0000:01:00.0 enp1s0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
[ 7679.616196] igc 0000:01:00.0 enp1s0: RDBAL[0-3]      1bb7c000 1bb7f000 1bb82000 0ef33000
[ 7679.627242] igc 0000:01:00.0 enp1s0: RDBAH[0-3]      00000001 00000001 00000001 00000001
[ 7679.638256] igc 0000:01:00.0 enp1s0: TCTL            a503f0fa
[ 7679.646607] igc 0000:01:00.0 enp1s0: TDBAL[0-3]      2ba4a000 1bb6f000 1bb74000 1bb79000
[ 7679.657609] igc 0000:01:00.0 enp1s0: TDBAH[0-3]      00000001 00000001 00000001 00000001
[ 7679.668551] igc 0000:01:00.0 enp1s0: TDLEN[0-3]      00001000 00001000 00001000 00001000
[ 7679.679470] igc 0000:01:00.0 enp1s0: TDH[0-3]        000000a7 0000002d 000000bf 000000d9
[ 7679.690406] igc 0000:01:00.0 enp1s0: TDT[0-3]        000000a7 0000002d 000000bf 000000d9
[ 7679.701264] igc 0000:01:00.0 enp1s0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
[ 7679.712123] igc 0000:01:00.0 enp1s0: Reset adapter
[ 7683.085967] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 8086.945561] ------------[ cut here ]------------
Entering kdb (current=0xffffffff8220b200, pid 0) on processor 0
Oops: (null) due to oops @ 0xffffffff81573888
RIP: 0010:dql_completed+0x148/0x160
Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95
c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? igc_poll+0x1a9/0x14d0 [igc]
 __napi_poll+0x2e/0x1b0
 net_rx_action+0x126/0x250
 __do_softirq+0xd1/0x295
 irq_exit_rcu+0xc5/0xf0
 common_interrupt+0x86/0xa0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x27/0x40
RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b
de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf
4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
 cpuidle_enter+0x2e/0x50
 call_cpuidle+0x23/0x40
 do_idle+0x1be/0x220
 cpu_startup_entry+0x20/0x30
 rest_init+0xb5/0xc0
 arch_call_rest_init+0xe/0x30
 start_kernel+0x448/0x760
 x86_64_start_kernel+0x109/0x150
 secondary_startup_64_no_verify+0xe0/0xeb
 </TASK>
more>
[0]kdb>

[0]kdb>
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to
continue
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, attempting to continue
[ 8086.955689] refcount_t: underflow; use-after-free.
[ 8086.955697] WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110
[ 8086.955706] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.955751]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[ 8086.955784] RIP: 0010:refcount_warn_saturate+0xc2/0x110
[ 8086.955788] Code: 01 e8 82 e7 b4 ff 0f 0b 5d c3 cc cc cc cc 80 3d 68 c6 eb 00 00 75 81
48 c7 c7 a0 87 f6 81 c6 05 58 c6 eb 00 01 e8 5e e7 b4 ff <0f> 0b 5d c3 cc cc cc cc 80 3d
42 c6 eb 00 00 0f 85 59 ff ff ff 48
[ 8086.955790] RSP: 0018:ffa0000000003da0 EFLAGS: 00010286
[ 8086.955793] RAX: 0000000000000000 RBX: ff1100011da40ee0 RCX: ff11000361e1b888
[ 8086.955794] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ff11000361e1b880
[ 8086.955795] RBP: ffa0000000003da0 R08: 80000000ffff9f45 R09: ffa0000000003d28
[ 8086.955796] R10: ff1100035f840000 R11: 0000000000000028 R12: ff11000319ff8000
[ 8086.955797] R13: ff1100011bb79d60 R14: 00000000ffffffd6 R15: ff1100037039cb00
[ 8086.955798] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955801] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955803] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955804] PKRU: 55555554
[ 8086.955805] Call Trace:
[ 8086.955806]  <IRQ>
[ 8086.955808]  tcp_wfree+0x112/0x130
[ 8086.955814]  skb_release_head_state+0x24/0xa0
[ 8086.955818]  napi_consume_skb+0x9c/0x160
[ 8086.955821]  igc_poll+0x5d8/0x14d0 [igc]
[ 8086.955835]  __napi_poll+0x2e/0x1b0
[ 8086.955839]  net_rx_action+0x126/0x250
[ 8086.955843]  __do_softirq+0xd1/0x295
[ 8086.955846]  irq_exit_rcu+0xc5/0xf0
[ 8086.955851]  common_interrupt+0x86/0xa0
[ 8086.955857]  </IRQ>
[ 8086.955857]  <TASK>
[ 8086.955858]  asm_common_interrupt+0x27/0x40
[ 8086.955862] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955866] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8
1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75
c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955867] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955869] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955870] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955871] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955872] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955873] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955875]  cpuidle_enter+0x2e/0x50
[ 8086.955880]  call_cpuidle+0x23/0x40
[ 8086.955884]  do_idle+0x1be/0x220
[ 8086.955887]  cpu_startup_entry+0x20/0x30
[ 8086.955889]  rest_init+0xb5/0xc0
[ 8086.955892]  arch_call_rest_init+0xe/0x30
[ 8086.955895]  start_kernel+0x448/0x760
[ 8086.955898]  x86_64_start_kernel+0x109/0x150
[ 8086.955900]  secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955904]  </TASK>
[ 8086.955904] ---[ end trace 0000000000000000 ]---
[ 8086.955912] ------------[ cut here ]------------
[ 8086.955913] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 8086.955918] invalid opcode: 0000 [#1] SMP
[ 8086.955922] RIP: 0010:dql_completed+0x148/0x160
[ 8086.955925] Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db
0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
[ 8086.955927] RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
[ 8086.955928] RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
[ 8086.955929] RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
[ 8086.955930] RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
[ 8086.955931] R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
[ 8086.955932] R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
[ 8086.955933] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
[ 8086.955934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8086.955935] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
[ 8086.955936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8086.955937] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8086.955938] PKRU: 55555554
[ 8086.955939] Call Trace:
[ 8086.955939]  <IRQ>
[ 8086.955940]  ? igc_poll+0x1a9/0x14d0 [igc]
[ 8086.955949]  __napi_poll+0x2e/0x1b0
[ 8086.955952]  net_rx_action+0x126/0x250
[ 8086.955956]  __do_softirq+0xd1/0x295
[ 8086.955958]  irq_exit_rcu+0xc5/0xf0
[ 8086.955961]  common_interrupt+0x86/0xa0
[ 8086.955964]  </IRQ>
[ 8086.955965]  <TASK>
[ 8086.955965]  asm_common_interrupt+0x27/0x40
[ 8086.955968] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
[ 8086.955971] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00
31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00
49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
[ 8086.955972] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
[ 8086.955973] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
[ 8086.955974] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
[ 8086.955974] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
[ 8086.955975] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
[ 8086.955976] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
[ 8086.955978]  cpuidle_enter+0x2e/0x50
[ 8086.955981]  call_cpuidle+0x23/0x40
[ 8086.955984]  do_idle+0x1be/0x220
[ 8086.955985]  cpu_startup_entry+0x20/0x30
[ 8086.955987]  rest_init+0xb5/0xc0
[ 8086.955990]  arch_call_rest_init+0xe/0x30
[ 8086.955992]  start_kernel+0x448/0x760
[ 8086.955994]  x86_64_start_kernel+0x109/0x150
[ 8086.955996]  secondary_startup_64_no_verify+0xe0/0xeb
[ 8086.955998]  </TASK>
[ 8086.955999] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype
nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO)
rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
[ 8086.956029]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
usb_common pps_core
[16762.543675] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.593 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.595 msecs
[16762.543673] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.495 msecs
[16762.543679] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.598 msecs
[16762.543690] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.605 msecs
[16762.543684] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
[16762.543693] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.613 msecs
[16762.543784] ---[ end trace 0000000000000000 ]---
[16762.849099] RIP: 0010:dql_completed+0x148/0x160
PANIC: Fatal exception in interrupt

Fixes: 9b27517627 ("igc: Add ndo_tx_timeout support")
Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-26 09:54:40 +01:00
Christian Marangi
dfd739f182 net: dsa: qca8k: fix mdb add/del case with 0 VID
The qca8k switch doesn't support using 0 as VID and require a default
VID to be always set. MDB add/del function doesn't currently handle
this and are currently setting the default VID.

Fix this by correctly handling this corner case and internally use the
default VID for VID 0 case.

Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-26 08:50:10 +01:00
Christian Marangi
ae70dcb9d9 net: dsa: qca8k: fix broken search_and_del
On deleting an MDB entry for a port, fdb_search_and_del is used.
An FDB entry can't be modified so it needs to be deleted and readded
again with the new portmap (and the port deleted as requested)

We use the SEARCH operator to search the entry to edit by vid and mac
address and then we check the aging if we actually found an entry.

Currently the code suffer from a bug where the searched fdb entry is
never read again with the found values (if found) resulting in the code
always returning -EINVAL as aging was always 0.

Fix this by correctly read the fdb entry after it was searched.

Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-26 08:50:10 +01:00
Christian Marangi
80248d4160 net: dsa: qca8k: fix search_and_insert wrong handling of new rule
On inserting a mdb entry, fdb_search_and_insert is used to add a port to
the qca8k target entry in the FDB db.

A FDB entry can't be modified so it needs to be removed and insert again
with the new values.

To detect if an entry already exist, the SEARCH operation is used and we
check the aging of the entry. If the entry is not 0, the entry exist and
we proceed to delete it.

Current code have 2 main problem:
- The condition to check if the FDB entry exist is wrong and should be
  the opposite.
- When a FDB entry doesn't exist, aging was never actually set to the
  STATIC value resulting in allocating an invalid entry.

Fix both problem by adding aging support to the function, calling the
function with STATIC as aging by default and finally by correct the
condition to check if the entry actually exist.

Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-26 08:50:10 +01:00
Christian Marangi
2c39dd025d net: dsa: qca8k: enable use_single_write for qca8xxx
The qca8xxx switch supports 2 way to write reg values, a slow way using
mdio and a fast way by sending specially crafted mgmt packet to
read/write reg.

The fast way can support up to 32 bytes of data as eth packet are used
to send/receive.

This correctly works for almost the entire regmap of the switch but with
the use of some kernel selftests for dsa drivers it was found a funny
and interesting hw defect/limitation.

For some specific reg, bulk write won't work and will result in writing
only part of the requested regs resulting in half data written. This was
especially hard to track and discover due to the total strangeness of
the problem and also by the specific regs where this occurs.

This occurs in the specific regs of the ATU table, where multiple entry
needs to be written to compose the entire entry.
It was discovered that with a bulk write of 12 bytes on
QCA8K_REG_ATU_DATA0 only QCA8K_REG_ATU_DATA0 and QCA8K_REG_ATU_DATA2
were written, but QCA8K_REG_ATU_DATA1 was always zero.
Tcpdump was used to make sure the specially crafted packet was correct
and this was confirmed.

The problem was hard to track as the lack of QCA8K_REG_ATU_DATA1
resulted in an entry somehow possible as the first bytes of the mac
address are set in QCA8K_REG_ATU_DATA0 and the entry type is set in
QCA8K_REG_ATU_DATA2.

Funlly enough writing QCA8K_REG_ATU_DATA1 results in the same problem
with QCA8K_REG_ATU_DATA2 empty and QCA8K_REG_ATU_DATA1 and
QCA8K_REG_ATU_FUNC correctly written.
A speculation on the problem might be that there are some kind of
indirection internally when accessing these regs and they can't be
accessed all together, due to the fact that it's really a table mapped
somewhere in the switch SRAM.

Even more funny is the fact that every other reg was tested with all
kind of combination and they are not affected by this problem. Read
operation was also tested and always worked so it's not affected by this
problem.

The problem is not present if we limit writing a single reg at times.

To handle this hardware defect, enable use_single_write so that bulk
api can correctly split the write in multiple different operation
effectively reverting to a non-bulk write.

Cc: Mark Brown <broonie@kernel.org>
Fixes: c766e077d9 ("net: dsa: qca8k: convert to regmap read/write API")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-26 08:50:10 +01:00
Kalle Valo
a1ce186db7 Revert "wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12"
This reverts commit bd1d129daa.

The dangling-pointer warnings were disabled kernel-wide by commit 49beadbd47
("gcc-12: disable '-Wdangling-pointer' warning for now") for v5.19. So this
hack in ath6kl is not needed anymore.

Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230724100823.2948804-1-kvalo@kernel.org
2023-07-26 10:09:28 +03:00
Kalle Valo
d265ebe41c Revert "wifi: ath11k: Enable threaded NAPI"
This reverts commit 13aa2fb692.

This commit broke QCN9074 initialisation:

[  358.960477] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 36866
[  358.960481] ath11k_pci 0000:04:00.0: failed to send WMI_STA_POWERSAVE_PARAM_CMDID
[  358.960484] ath11k_pci 0000:04:00.0: could not set uapsd params -105

As there's no fix available let's just revert it to get QCN9074 working again.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217536
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230720151444.2016637-1-kvalo@kernel.org
2023-07-26 09:56:53 +03:00
Alex Elder
e11ec2b868 net: ipa: only reset hashed tables when supported
Last year, the code that manages GSI channel transactions switched
from using spinlock-protected linked lists to using indexes into the
ring buffer used for a channel.  Recently, Google reported seeing
transaction reference count underflows occasionally during shutdown.

Doug Anderson found a way to reproduce the issue reliably, and
bisected the issue to the commit that eliminated the linked lists
and the lock.  The root cause was ultimately determined to be
related to unused transactions being committed as part of the modem
shutdown cleanup activity.  Unused transactions are not normally
expected (except in error cases).

The modem uses some ranges of IPA-resident memory, and whenever it
shuts down we zero those ranges.  In ipa_filter_reset_table() a
transaction is allocated to zero modem filter table entries.  If
hashing is not supported, hashed table memory should not be zeroed.
But currently nothing prevents that, and the result is an unused
transaction.  Something similar occurs when we zero routing table
entries for the modem.

By preventing any attempt to clear hashed tables when hashing is not
supported, the reference count underflow is avoided in this case.

Note that there likely remains an issue with properly freeing unused
transactions (if they occur due to errors).  This patch addresses
only the underflows that Google originally reported.

Cc: <stable@vger.kernel.org> # 6.1.x
Fixes: d338ae28d8 ("net: ipa: kill all other transaction lists")
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230724224055.1688854-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-25 20:34:20 -07:00
Lin Ma
55cef78c24 macvlan: add forgotten nla_policy for IFLA_MACVLAN_BC_CUTOFF
The previous commit 954d1fa1ac ("macvlan: Add netlink attribute for
broadcast cutoff") added one additional attribute named
IFLA_MACVLAN_BC_CUTOFF to allow broadcast cutfoff.

However, it forgot to describe the nla_policy at macvlan_policy
(drivers/net/macvlan.c). Hence, this suppose NLA_S32 (4 bytes) integer
can be faked as empty (0 bytes) by a malicious user, which could leads
to OOB in heap just like CVE-2023-3773.

To fix it, this commit just completes the nla_policy description for
IFLA_MACVLAN_BC_CUTOFF. This enforces the length check and avoids the
potential OOB read.

Fixes: 954d1fa1ac ("macvlan: Add netlink attribute for broadcast cutoff")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230723080205.3715164-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-25 19:48:50 -07:00
Vincent Whitchurch
284779dbf4 net: stmmac: Apply redundant write work around on 4.xx too
commit a3a57bf07d ("net: stmmac: work
around sporadic tx issue on link-up") worked around a problem with TX
sometimes not working after a link-up by avoiding a redundant write to
MAC_CTRL_REG (aka GMAC_CONFIG), since the IP appeared to have problems
with handling multiple writes to that register in some cases.

That commit however only added the work around to dwmac_lib.c (apart
from the common code in stmmac_main.c), but my systems with version
4.21a of the IP exhibit the same problem, so add the work around to
dwmac4_lib.c too.

Fixes: a3a57bf07d ("net: stmmac: work around sporadic tx issue on link-up")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721-stmmac-tx-workaround-v1-1-9411cbd5ee07@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 11:03:55 +02:00
Suman Ghosh
4e62c99d71 octeontx2-af: Fix hash extraction enable configuration
As of today, hash extraction support is enabled for all the silicons.
Because of which we are facing initialization issues when the silicon
does not support hash extraction. During creation of the hardware
parsing table for IPv6 address, we need to consider if hash extraction
is enabled then extract only 32 bit, otherwise 128 bit needs to be
extracted. This patch fixes the issue and configures the hardware parser
based on the availability of the feature.

Fixes: a95ab93550 ("octeontx2-af: Use hashed field in MCAM key")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721061222.2632521-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 10:12:26 +02:00
Hangbin Liu
fa532bee17 team: reset team's flags when down link is P2P device
When adding a point to point downlink to team device, we neglected to reset
the team's flags, which were still using flags like BROADCAST and
MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink
interfaces, such as when adding a GRE device to team device. Fix this by
remove multicast/broadcast flags and add p2p and noarp flags.

After removing the none ethernet interface and adding an ethernet interface
to team, we need to reset team interface flags. Unlike bonding interface,
team do not need restore IFF_MASTER, IFF_SLAVE flags.

Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438
Fixes: 1d76efe157 ("team: add support for non-ethernet devices")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 09:16:17 +02:00
Hangbin Liu
da19a2b967 bonding: reset bond's flags when down link is P2P device
When adding a point to point downlink to the bond, we neglected to reset
the bond's flags, which were still using flags like BROADCAST and
MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink
interfaces, such as when adding a GRE device to the bonding.

To address this issue, let's reset the bond's flags for P2P interfaces.

Before fix:
7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default qlen 1000
    link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr 167f:18:f188::
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/gre6 2006:70:10::1 brd 2006:70:10::2
    inet6 fe80::200:ff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever

After fix:
7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond2 state UNKNOWN group default qlen 1000
    link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr c29e:557a:e9d9::
8: bond0: <POINTOPOINT,NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/gre6 2006:70:10::1 peer 2006:70:10::2
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever

Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438
Fixes: 872254dd6b ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 09:16:17 +02:00
Jakub Kicinski
f0291103d2 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-07-21 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Wang Ming corrects an error check on i40e.

Jake unlocks crit_lock on allocation failure to prevent deadlock and
stops re-enabling of interrupts when it's not intended for iavf.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED
  iavf: fix potential deadlock on allocation failure
  i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
====================

Link: https://lore.kernel.org/r/20230721155812.1292752-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 17:12:06 -07:00
Jakub Kicinski
ac2a7b1317 Merge tag 'linux-can-fixes-for-6.5-20230724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2023-07-24

The first patch is by me and adds a missing set of CAN state to
CAN_STATE_STOPPED on close in the gs_usb driver.

The last patch is by Eric Dumazet and fixes a lockdep issue in the CAN
raw protocol.

* tag 'linux-can-fixes-for-6.5-20230724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: raw: fix lockdep issue in raw_release()
  can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
====================

Link: https://lore.kernel.org/r/20230724150141.766047-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 17:10:10 -07:00
Jedrzej Jagielski
a333605650 ice: Fix memory management in ice_ethtool_fdir.c
Fix ethtool FDIR logic to not use memory after its release.
In the ice_ethtool_fdir.c file there are 2 spots where code can
refer to pointers which may be missing.

In the ice_cfg_fdir_xtrct_seq() function seg may be freed but
even then may be still used by memcpy(&tun_seg[1], seg, sizeof(*seg)).

In the ice_add_fdir_ethtool() function struct ice_fdir_fltr *input
may first fail to be added via ice_fdir_update_list_entry() but then
may be deleted by ice_fdir_update_list_entry.

Terminate in both cases when the returned value of the previous
operation is other than 0, free memory and don't use it anymore.

Reported-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2208423
Fixes: cac2a27cd9 ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230721155854.1292805-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 17:07:51 -07:00
Wei Fang
bb7a015636 net: fec: avoid tx queue timeout when XDP is enabled
According to the implementation of XDP of FEC driver, the XDP path
shares the transmit queues with the kernel network stack, so it is
possible to lead to a tx timeout event when XDP uses the tx queue
pretty much exclusively. And this event will cause the reset of the
FEC hardware.
To avoid timeout in this case, we use the txq_trans_cond_update()
interface to update txq->trans_start to jiffies so that watchdog
won't generate a transmit timeout warning.

Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230721083559.2857312-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 16:45:29 -07:00
Yuanjun Gong
69a184f7a3 ethernet: atheros: fix return value check in atl1e_tso_csum()
in atl1e_tso_csum, it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().

Fixes: a6a5325239 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230720144219.39285-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 15:43:02 -07:00
Yuanjun Gong
ed96824b71 atheros: fix return value check in atl1_tso()
in atl1_tso(), it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().

Fixes: 401c0aabec ("atl1: simplify tx packet descriptor")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230722142511.12448-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24 15:37:56 -07:00
Paul Fertser
421033deb9 wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
On DBDC devices the first (internal) phy is only capable of using
2.4 GHz band, and the 5 GHz band is exposed via a separate phy object,
so avoid the false advertising.

Reported-by: Rani Hod <rani.hod@gmail.com>
Closes: https://github.com/openwrt/openwrt/pull/12361
Fixes: 7660a1bd0c ("mt76: mt7615: register ext_phy if DBDC is detected")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230605073408.8699-1-fercerpav@gmail.com
2023-07-24 16:32:14 +03:00
Jiri Benc
b0b672c4d0 vxlan: fix GRO with VXLAN-GPE
In VXLAN-GPE, there may not be an Ethernet header following the VXLAN
header. But in GRO, the vxlan driver calls eth_gro_receive
unconditionally, which means the following header is incorrectly parsed
as Ethernet.

Introduce GPE specific GRO handling.

For better performance, do not check for GPE during GRO but rather
install a different set of functions at setup time.

Fixes: e1e5314de0 ("vxlan: implement GPE")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 10:47:09 +01:00
Jiri Benc
17a0a64448 vxlan: generalize vxlan_parse_gpe_hdr and remove unused args
The vxlan_parse_gpe_hdr function extracts the next protocol value from
the GPE header and marks GPE bits as parsed.

In order to be used in the next patch, split the function into protocol
extraction and bit marking. The bit marking is meaningful only in
vxlan_rcv; move it directly there.

Rename the function to vxlan_parse_gpe_proto to reflect what it now
does. Remove unused arguments skb and vxflags. Move the function earlier
in the file to allow it to be called from more places in the next patch.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 10:47:09 +01:00
Yuanjun Gong
8d01da0a1d ethernet: atheros: fix return value check in atl1c_tso_csum()
in atl1c_tso_csum, it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().

Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 10:39:11 +01:00
Jiri Benc
94d166c531 vxlan: calculate correct header length for GPE
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.

This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached.

In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.

So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.

However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.

The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.

Fixes: e1e5314de0 ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:37:32 +01:00
Jijie Shao
882481b1c5 net: hns3: fix wrong bw weight of disabled tc issue
In dwrr mode, the default bandwidth weight of disabled tc is set to 0.
If the bandwidth weight is 0, the mode will change to sp.
Therefore, disabled tc default bandwidth weight need changed to 1,
and 0 is returned when query the bandwidth weight of disabled tc.
In addition, driver need stop configure bandwidth weight if tc is disabled.

Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:36:23 +01:00
Jijie Shao
116d9f732e net: hns3: fix wrong tc bandwidth weight data issue
Currently, the weight saved by the driver is used as the query result,
which may be different from the actual weight in the register.
Therefore, the register value read from the firmware is used
as the query result

Fixes: 0e32038dc8 ("net: hns3: refactor dump tc of debugfs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:36:23 +01:00
Hao Lan
6d2336120a net: hns3: add tm flush when setting tm
When the tm module is configured with traffic, traffic
may be abnormal. This patch fixes this problem.
Before the tm module is configured, traffic processing
should be stopped. After the tm module is configured,
traffic processing is enabled.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:36:23 +01:00
Hao Lan
b27d0232e8 net: hns3: fix the imp capability bit cannot exceed 32 bits issue
Current only the first 32 bits of the capability flag bit are considered.
When the matching capability flag bit is greater than 31 bits,
it will get an error bit.This patch use bitmap to solve this issue.
It can handle each capability bit whitout bit width limit.

Fixes: da77aef9cc ("net: hns3: create common cmdq resource allocate/free/query APIs")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:36:22 +01:00
Jiawen Wu
c7b75bea85 net: phy: marvell10g: fix 88x3310 power up
Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
it sometimes does not take effect immediately. And a read of this
register causes the bit not to clear. This will cause mv3310_reset()
to time out, which will fail the config initialization. So add a delay
before the next access.

Fixes: c9cc1c815d ("net: phy: marvell10g: place in powersave mode at probe")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-23 11:47:07 +01:00
Jacob Keller
91896c8acc iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED
In iavf_adminq_task(), if the function can't acquire the
adapter->crit_lock, it checks if the driver is removing. If so, it simply
exits without re-enabling the interrupt. This is done to ensure that the
task stops processing as soon as possible once the driver is being removed.

However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this
before attempting to acquire the lock. In this case, the function exits
early and re-enables the interrupt. This will happen even if the driver is
already removing.

Avoid this, by moving the check to after the adapter->crit_lock is
acquired. This way, if the driver is removing, we will not re-enable the
interrupt.

Fixes: fc2e6b3b13 ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21 08:49:37 -07:00