Commit Graph

1152832 Commits

Author SHA1 Message Date
David Arinzon
a8ee104f98 net: ena: Update NUMA TPH hint register upon NUMA node update
The device supports a PCIe optimization hint, which indicates on
which NUMA the queue is currently processed. This hint is utilized
by PCIe in order to reduce its access time by accessing the
correct NUMA resources and maintaining cache coherence.

The driver calls the register update for the hint (called TPH -
TLP Processing Hint) during the NAPI loop.

Though the update is expected upon a NUMA change (when a queue
is moved from one NUMA to the other), the current logic performs
a register update when the queue is moved to a different CPU,
but the CPU is not necessarily in a different NUMA.

The changes include:
1. Performing the TPH update only when the queue has switched
a NUMA node.
2. Moving the TPH update call to be triggered only when NAPI was
scheduled from interrupt context, as opposed to a busy-polling loop.
This is due to the fact that during busy-polling, the frequency
of CPU switches for a particular queue is significantly higher,
thus, the likelihood to switch NUMA is much higher. Therefore,
providing the frequent updates to the device upon a NUMA update
are unlikely to be beneficial.

Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:44 +00:00
David Arinzon
e712f3e492 net: ena: Set default value for RX interrupt moderation
RX ring can be NULL in XDP use cases where only TX queues
are configured. In this scenario, the RX interrupt moderation
value sent to the device remains in its default value of 0.

In this change, setting the default value of the RX interrupt
moderation to be the same as of the TX.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:44 +00:00
David Arinzon
c7062aaee0 net: ena: Fix rx_copybreak value update
Make the upper bound on rx_copybreak tighter, by
making sure it is smaller than the minimum of mtu and
ENA_PAGE_SIZE. With the current upper bound of mtu,
rx_copybreak can be larger than a page. Such large
rx_copybreak will not bring any performance benefit to
the user and therefore makes no sense.

In addition, the value update was only reflected in
the adapter structure, but not applied for each ring,
causing it to not take effect.

Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:44 +00:00
David Arinzon
59811faa2c net: ena: Use bitmask to indicate packet redirection
Redirecting packets with XDP Redirect is done in two phases:
1. A packet is passed by the driver to the kernel using
   xdp_do_redirect().
2. After finishing polling for new packets the driver lets the kernel
   know that it can now process the redirected packet using
   xdp_do_flush_map().
   The packets' redirection is handled in the napi context of the
   queue that called xdp_do_redirect()

To avoid calling xdp_do_flush_map() each time the driver first checks
whether any packets were redirected, using
	xdp_flags |= xdp_verdict;
and
	if (xdp_flags & XDP_REDIRECT)
	    xdp_do_flush_map()

essentially treating XDP instructions as a bitmask, which isn't the case:
    enum xdp_action {
	    XDP_ABORTED = 0,
	    XDP_DROP,
	    XDP_PASS,
	    XDP_TX,
	    XDP_REDIRECT,
    };

Given the current possible values of xdp_action, the current design
doesn't have a bug (since XDP_REDIRECT = 100b), but it is still
flawed.

This patch makes the driver use a bitmask instead, to avoid future
issues.

Fixes: a318c70ad1 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:44 +00:00
David Arinzon
c7f5e34d90 net: ena: Account for the number of processed bytes in XDP
The size of packets that were forwarded or dropped by XDP wasn't added
to the total processed bytes statistic.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:43 +00:00
David Arinzon
9c9e539956 net: ena: Don't register memory info on XDP exchange
Since the queues aren't destroyed when we only exchange XDP programs,
there's no need to re-register them again.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:43 +00:00
David Arinzon
332b49ff63 net: ena: Fix toeplitz initial hash value
On driver initialization, RSS hash initial value is set to zero,
instead of the default value. This happens because we pass NULL as
the RSS key parameter, which caused us to never initialize
the RSS hash value.

This patch fixes it by making sure the initial value is set, no matter
what the value of the RSS key is.

Fixes: 91a65b7d3e ("net: ena: fix potential crash when rxfh key is NULL")
Signed-off-by: Nati Koler <nkoler@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:43:43 +00:00
Po-Hsu Lin
1573c68820 selftests: net: fix cmsg_so_mark.sh test hang
This cmsg_so_mark.sh test will hang on non-amd64 systems because of the
infinity loop for argument parsing in cmsg_sender.

Variable "o" in cs_parse_args() for taking getopt() should be an int,
otherwise it will be 255 when getopt() returns -1 on non-amd64 system
and thus causing infinity loop.

Link: https://lore.kernel.org/lkml/CA+G9fYsM2k7mrF7W4V_TrZ-qDauWM394=8yEJ=-t1oUg8_40YA@mail.gmail.com/t/
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:37:26 +00:00
David S. Miller
a512807c24 Merge tag 'mlx5-fixes-2022-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-fixes-2022-12-28
2022-12-30 07:33:55 +00:00
Jiguang Xiao
d530ece70f net: amd-xgbe: add missed tasklet_kill
The driver does not call tasklet_kill in several places.
Add the calls to fix it.

Fixes: 85b85c8534 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:33:05 +00:00
Jian Shen
fec7352117 net: hns3: refine the handling for VF heartbeat
Currently, the PF check the VF alive by the KEEP_ALVE
mailbox from VF. VF keep sending the mailbox per 2
seconds. Once PF lost the mailbox for more than 8
seconds, it will regards the VF is abnormal, and stop
notifying the state change to VF, include link state,
vf mac, reset, even though it receives the KEEP_ALIVE
mailbox again. It's inreasonable.

This patch fixes it. PF will record the state change which
need to notify VF when lost the VF's KEEP_ALIVE mailbox.
And notify VF when receive the mailbox again. Introduce a
new flag HCLGE_VPORT_STATE_INITED, used to distinguish the
case whether VF driver loaded or not. For VF will query
these states when initializing, so it's unnecessary to
notify it in this case.

Fixes: aa5c4f175b ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:32:00 +00:00
Uwe Kleine-König
af691c94d0 net: ethernet: freescale: enetc: Drop empty platform remove function
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:28:49 +00:00
Uwe Kleine-König
6b57bffa5f net: ethernet: broadcom: bcm63xx_enet: Drop empty platform remove function
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:28:49 +00:00
David S. Miller
0798311cfd Merge branch 'tcp-bhash2-fixes'
Kuniyuki Iwashima says:

===================
tcp: Fix bhash2 and TIME_WAIT regression.

We forgot to add twsk to bhash2.  Therefore TIME_WAIT sockets cannot
prevent bind() to the same local address and port.

Changes:
  v1:
    * Patch 1:
      * Add tw_bind2_node in inet_timewait_sock instead of
        moving sk_bind2_node from struct sock to struct
	sock_common.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:25:53 +00:00
Kuniyuki Iwashima
2c042e8e54 tcp: Add selftest for bind() and TIME_WAIT.
bhash2 split the bind() validation logic into wildcard and non-wildcard
cases.  Let's add a test to catch future regression.

Before the previous patch:

  # ./bind_timewait
  TAP version 13
  1..2
  # Starting 2 tests from 3 test cases.
  #  RUN           bind_timewait.localhost.1 ...
  # bind_timewait.c:87:1:Expected ret (0) == -1 (-1)
  # 1: Test terminated by assertion
  #          FAIL  bind_timewait.localhost.1
  not ok 1 bind_timewait.localhost.1
  #  RUN           bind_timewait.addrany.1 ...
  #            OK  bind_timewait.addrany.1
  ok 2 bind_timewait.addrany.1
  # FAILED: 1 / 2 tests passed.
  # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0

After:

  # ./bind_timewait
  TAP version 13
  1..2
  # Starting 2 tests from 3 test cases.
  #  RUN           bind_timewait.localhost.1 ...
  #            OK  bind_timewait.localhost.1
  ok 1 bind_timewait.localhost.1
  #  RUN           bind_timewait.addrany.1 ...
  #            OK  bind_timewait.addrany.1
  ok 2 bind_timewait.addrany.1
  # PASSED: 2 / 2 tests passed.
  # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:25:53 +00:00
Kuniyuki Iwashima
936a192f97 tcp: Add TIME_WAIT sockets in bhash2.
Jiri Slaby reported regression of bind() with a simple repro. [0]

The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port.  Before commit 28044fc1d4 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.

The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.

The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1]  However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.

To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().

[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/

Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-30 07:25:52 +00:00
Eli Cohen
4d1c1379d7 net/mlx5: Lag, fix failure to cancel delayed bond work
Commit 0d4e8ed139 ("net/mlx5: Lag, avoid lockdep warnings")
accidentally removed a call to cancel delayed bond work thus it may
cause queued delay to expire and fall on an already destroyed work
queue.

Fix by restoring the call cancel_delayed_work_sync() before
destroying the workqueue.

This prevents call trace such as this:

[  329.230417] BUG: kernel NULL pointer dereference, address: 0000000000000000
 [  329.231444] #PF: supervisor write access in kernel mode
 [  329.232233] #PF: error_code(0x0002) - not-present page
 [  329.233007] PGD 0 P4D 0
 [  329.233476] Oops: 0002 [#1] SMP
 [  329.234012] CPU: 5 PID: 145 Comm: kworker/u20:4 Tainted: G OE      6.0.0-rc5_mlnx #1
 [  329.235282] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [  329.236868] Workqueue: mlx5_cmd_0000:08:00.1 cmd_work_handler [mlx5_core]
 [  329.237886] RIP: 0010:_raw_spin_lock+0xc/0x20
 [  329.238585] Code: f0 0f b1 17 75 02 f3 c3 89 c6 e9 6f 3c 5f ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 f3 c3 89 c6 e9 45 3c 5f ff 0f 1f 44 00 00 0f 1f
 [  329.241156] RSP: 0018:ffffc900001b0e98 EFLAGS: 00010046
 [  329.241940] RAX: 0000000000000000 RBX: ffffffff82374ae0 RCX: 0000000000000000
 [  329.242954] RDX: 0000000000000001 RSI: 0000000000000014 RDI: 0000000000000000
 [  329.243974] RBP: ffff888106ccf000 R08: ffff8881004000c8 R09: ffff888100400000
 [  329.244990] R10: 0000000000000000 R11: ffffffff826669f8 R12: 0000000000002000
 [  329.246009] R13: 0000000000000005 R14: ffff888100aa7ce0 R15: ffff88852ca80000
 [  329.247030] FS:  0000000000000000(0000) GS:ffff88852ca80000(0000) knlGS:0000000000000000
 [  329.248260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  329.249111] CR2: 0000000000000000 CR3: 000000016d675001 CR4: 0000000000770ee0
 [  329.250133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  329.251152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  329.252176] PKRU: 55555554

Fixes: 0d4e8ed139 ("net/mlx5: Lag, avoid lockdep warnings")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Maor Dickman
e54638a838 net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option
The cited patch added support of matching on geneve option by setting
geneve_tlv_option_0_data mask and key but didn't set geneve_tlv_option_0_exist
bit which is required on some HWs when matching geneve_tlv_option_0_data parameter,
this may cause in some cases for packets to wrongly match on rules with different
geneve option.

Example of such case is packet with geneve_tlv_object class=789 and data=456
will wrongly match on rule with match geneve_tlv_object class=123 and data=456.

Fix it by setting geneve_tlv_option_0_exist bit when supported by the HW when matching
on geneve_tlv_option_0_data parameter.

Fixes: 9272e3df30 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Adham Faris
1e267ab88d net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
Current xdp xmit functions logic (mlx5e_xmit_xdp_frame_mpwqe or
mlx5e_xmit_xdp_frame), validates xdp packet length by comparing it to
hw mtu (configured at xdp sq allocation) before xmiting it. This check
does not account for ethernet fcs length (calculated and filled by the
nic). Hence, when we try sending packets with length > (hw-mtu -
ethernet-fcs-size), the device port drops it and tx_errors_phy is
incremented. Desired behavior is to catch these packets and drop them
by the driver.

Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting ethernet FCS header size (4 Bytes) from current hw mtu
value, since ethernet FCS is calculated and written to ethernet frames
by the nic.

Fixes: d8bec2b29a ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Chris Mi
2951b2e142 net/mlx5e: Always clear dest encap in neigh-update-del
The cited commit introduced a bug for multiple encapsulations flow.
If one dest encap becomes invalid, the flow is set slow path flag.
But when other dests encap become invalid, they are not cleared due
to slow path flag of the flow. When neigh-update-add is running, it
will use invalid encap.

Fix it by checking slow path flag after clearing dest encap.

Fixes: 9a5f9cc794 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Chris Mi
849190e3e4 net/mlx5e: CT: Fix ct debugfs folder name
Need to use sprintf to build a string instead of sscanf. Otherwise
dirname is null and both "ct_nic" and "ct_fdb" won't be created.
But its redundant anyway as driver could be in switchdev mode but
still add nic rules. So use "ct" as folder name.

Fixes: 77422a8f6f ("net/mlx5e: CT: Add ct driver counters")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Tariq Toukan
f8c18a5749 net/mlx5e: Fix RX reporter for XSK RQs
RX reporter mistakenly reads from the regular (inactive) RQ
when XSK RQ is active. Fix it here.

Fixes: 3db4c85cde ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Dragos Tatulea
b12d581e83 net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.

Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.

Fixes: b797a684b0 ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
c4ad5f2bda net/mlx5: Fix RoCE setting at HCA level
mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is
marked as unsupported on those VFs/SFs.
The cited patch added an option for disable (and enable) RoCE at HCA
level. However, that commit didn't check whether RoCE is supported on
the HCA and enabled user to try and set RoCE to on.
Fix it by checking whether the HCA supports RoCE.

Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
9078e843ef net/mlx5: Avoid recovery in probe flows
Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before device have finished probed
successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
using functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery without mlx5_init_one() finished
successfully.

Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.

Fixes: 51d138c261 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
44aee8ea15 net/mlx5: Fix io_eq_size and event_eq_size params validation
io_eq_size and event_eq_size params are of param type
DEVLINK_PARAM_TYPE_U32. But, the validation callback is addressing them
as DEVLINK_PARAM_TYPE_U16.

This cause mismatch in validation in big-endian systems, in which
values in range were rejected while 268500991 was accepted.
Fix it by checking the U32 value in the validation callback.

Fixes: 0844fa5f7b ("net/mlx5: Let user configure io_eq_size param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Jiri Pirko
2a35b2c2e6 net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
There are two cleanup calls missing in mlx5_init_once() error path.
Add them making the error path flow to be the same as
mlx5_cleanup_once().

Fixes: 52ec462eca ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Moshe Shemesh
1f0ae22ab4 net/mlx5: E-Switch, properly handle ingress tagged packets on VST
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Pedro Tammela
40cab44b90 net/sched: fix retpoline wrapper compilation on configs without tc filters
Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.

Fixes: 7f0e810220 ("net/sched: add retpoline wrapper for tc")

Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 12:11:32 +00:00
Xuezhi Zhang
c2052189f1 s390/qeth: convert sysfs snprintf to sysfs_emit
Follow the advice of the Documentation/filesystems/sysfs.rst
and show() should only use sysfs_emit() or sysfs_emit_at()
when formatting the value to be returned to user space.

Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 12:10:42 +00:00
David S. Miller
0e3d18359a Merge branch 'r8169-fixes'
Chunhao Lin says:

====================
r8169: fix dmar pte write access is not set error

This series fixes dmar pte write access is not set error.

Chunhao Lin (2):
  r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
  r8169: fix dmar pte write access is not set error

v2:
-update commit message
-adjust the code according to current kernel code
v3:
-update title and commit message
-split the patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:58:08 +00:00
Chunhao Lin
bb41c13c05 r8169: fix dmar pte write access is not set error
When close device, if wol is enabled, rx will be enabled. When open
device it will cause rx packet to be dma to the wrong memory address
after pci_set_master() and system log will show blow messages.

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [02:00.0] PASID ffffffff fault addr
ffdd4000 [fault reason 05] PTE Write access is not set

In this patch, driver disable tx/rx when close device. If wol is
enabled, only enable rx filter and disable rxdv_gate(if support) to
let hardware only receive packet to fifo but not to dma it.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:58:08 +00:00
Chunhao Lin
ad425666a1 r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
There is no functional change. Moving these two functions for following
patch "r8169: fix dmar pte write access is not set error".

Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:58:07 +00:00
David S. Miller
e71460d446 Merge branch 'ethtool_gert_phy_stats-fixes'
Daniil Tatianin says:

====================
net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers

This series fixes a potential NULL dereference in ethtool_get_phy_stats
while also attempting to refactor/split said function into multiple
helpers so that it's easier to reason about what's going on.

I've taken Andrew Lunn's suggestions on the previous version of this
patch and added a bit of my own.

Changes since v1:
- Remove an extra newline in the first patch
- Move WARN_ON_ONCE into the if check as it already returns the
  result of the comparison
- Actually split ethtool_get_phy_stats instead of attempting to
  refactor it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Daniil Tatianin
201ed315f9 net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers
So that it's easier to follow and make sense of the branching and
various conditions.

Stats retrieval has been split into two separate functions
ethtool_get_phy_stats_phydev & ethtool_get_phy_stats_ethtool.
The former attempts to retrieve the stats using phydev & phy_ops, while
the latter uses ethtool_ops.

Actual n_stats validation & array allocation has been moved into a new
ethtool_vzalloc_stats_array helper.

This also fixes a potential NULL dereference of
ops->get_ethtool_phy_stats where it was getting called in an else branch
unconditionally without making sure it was actually present.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Daniil Tatianin
fd4778581d net/ethtool/ioctl: remove if n_stats checks from ethtool_get_phy_stats
Now that we always early return if we don't have any stats we can remove
these checks as they're no longer necessary.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Daniil Tatianin
9deb1e9fb8 net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
It's not very useful to copy back an empty ethtool_stats struct and
return 0 if we didn't actually have any stats. This also allows for
further simplification of this function in the future commits.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
David S. Miller
8ac718cc0e Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes

This series fixes a devlink bug and several XDP related bugs.  The
devlink bug causes a kernel crash on VF devices.  The XDP driver
patches fix and clean up the RX XDP path and re-enable header-data
split that was disabled by mistake when adding the XDP multi-buffer
support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:57 +00:00
Michael Chan
a056ebcc30 bnxt_en: Fix HDS and jumbo thresholds for RX packets
The recent XDP multi-buffer feature has introduced regressions in the
setting of HDS and jumbo thresholds.  HDS was accidentally disabled in
the nornmal mode without XDP.  This patch restores jumbo HDS placement
when not in XDP mode.  In XDP multi-buffer mode, HDS should be disabled
and the jumbo threshold should be set to the usable page size in the
first page buffer.

Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:57 +00:00
Michael Chan
1abeacc197 bnxt_en: Fix first buffer size calculations for XDP multi-buffer
The size of the first buffer is always page size, and the useable
space is the page size minus the offset and the skb_shared_info size.
Make sure SKB and XDP buf sizes match so that the skb_shared_info
is at the same offset seen from the SKB and XDP_BUF.

build_skb() should be passed PAGE_SIZE.  xdp_init_buff() should
be passed PAGE_SIZE as well.  xdp_get_shared_info_from_buff() will
automatically deduct the skb_shared_info size if the XDP buffer
has frags.  There is no need to keep bp->xdp_has_frags.

Change BNXT_PAGE_MODE_BUF_SIZE to BNXT_MAX_PAGE_MODE_MTU_SBUF
since this constant is really the MTU with ethernet header size
subtracted.

Also fix the BNXT_MAX_PAGE_MODE_MTU macro with proper parentheses.

Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:57 +00:00
Michael Chan
9b3e607871 bnxt_en: Fix XDP RX path
The XDP program can change the starting address of the RX data buffer and
this information needs to be passed back from bnxt_rx_xdp() to
bnxt_rx_pkt() for the XDP_PASS case so that the SKB can point correctly
to the modified buffer address.  Add back the data_ptr parameter to
bnxt_rx_xdp() to make this work.

Fixes: b231c3f341 ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:57 +00:00
Michael Chan
bbfc17e50b bnxt_en: Simplify bnxt_xdp_buff_init()
bnxt_xdp_buff_init() does not modify the data_ptr or the len parameters,
so no need to pass in the addresses of these parameters.

Fixes: b231c3f341 ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:57 +00:00
Vikas Gupta
0020ae2a4a bnxt_en: fix devlink port registration to netdev
We don't register a devlink port in case of a VF so
avoid setting the devlink pointer to netdev.
Also, SET_NETDEV_DEVLINK_PORT has to be moved
so that we determine whether the device is PF/VF first.

This fixes the NULL pointer dereference of devlink_port->devlink
when creating VFs:

BUG: kernel NULL pointer dereference, address: 0000000000000160
PGD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 14 PID: 388 Comm: kworker/14:1 Kdump: loaded Not tainted 6.1.0-rc8 #5
Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021
Workqueue: events work_for_cpu_fn
RIP: 0010:devlink_nl_port_handle_size+0xb/0x50
Code: 83 c4 10 5b 5d c3 cc cc cc cc b8 a6 ff ff ff eb de e8 c9 59 21 00 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 48 8b 47 20 <48> 8b a8 60 01 00 00 48 8b 45 60 48 8b 38 e8 92 90 1a 00 48 8b 7d
RSP: 0018:ff4fe5394846fcd8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000794 RCX: 0000000000000000
RDX: ff1f129683a30a40 RSI: 0000000000000008 RDI: ff1f1296bb496188
RBP: 0000000000000334 R08: 0000000000000cc0 R09: 0000000000000000
R10: ff1f1296bb494298 R11: ffffffffffffffc0 R12: 0000000000000000
R13: 0000000000000000 R14: ff1f1296bb494000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ff1f129e5fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000160 CR3: 000000131f610006 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 if_nlmsg_size+0x14a/0x220
 rtmsg_ifinfo_build_skb+0x3c/0x100
 rtmsg_ifinfo+0x9c/0xc0
 register_netdevice+0x59d/0x670
 register_netdev+0x1c/0x40
 bnxt_init_one+0x674/0xa60 [bnxt_en]
 local_pci_probe+0x42/0x80
 work_for_cpu_fn+0x13/0x20
 process_one_work+0x1e2/0x3b0
 ? rescuer_thread+0x390/0x390
 worker_thread+0x1c4/0x3a0
 ? rescuer_thread+0x390/0x390
 kthread+0xd6/0x100
 ? kthread_complete_and_exit+0x20/0x20

Fixes: ac73d4bf2c ("net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port")
Cc: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:16:56 +00:00
David S. Miller
3ec3ebec7c Merge branch 'rswitch-fixes'
Yoshihiro Shimoda says:

====================
net: ethernet: renesas: rswitch: Fix minor issues

This patch series is based on v6.2-rc2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:09:50 +00:00
Yoshihiro Shimoda
bd2adfe3b3 net: ethernet: renesas: rswitch: Fix getting mac address from device tree
To get mac address from device tree which is from each ethernet-port,
fix the first argument of of_get_ethdev_address().

Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:09:50 +00:00
Yoshihiro Shimoda
8e6a8d7a3d net: ethernet: renesas: rswitch: Fix error path in renesas_eth_sw_probe()
If rswitch_init() returns non-zero and this driver is re-probed,
the following error happens:

    renesas_eth_sw e6880000.ethernet: Unbalanced pm_runtime_enable!

So, fix error path in renesas_eth_sw_probe().

Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:09:49 +00:00
David S. Miller
81852018f2 Merge branch 'netdev-doc-defaq'
Jakub Kicinski says:

====================
netdev doc de-FAQization

We have outgrown the FAQ format for our process doc.
I often find myself struggling to locate information in this doc,
because the questions do not serve well as section headers.
Reformat the document.

v2: update the headers
v1: https://lore.kernel.org/all/20221221184007.1170384-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:06:06 +00:00
Jakub Kicinski
ff249be5cc docs: netdev: convert to a non-FAQ document
The netdev-FAQ document has grown over the years to the point
where finding information in it is somewhat challenging.
The length of the questions prevents readers from locating
content that's relevant at a glance.

Convert to a more standard documentation format with sections
and sub-sections rather than questions and answers.

The content edits are limited to what's necessary to change
the format, and very minor clarifications.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:06:06 +00:00
Jakub Kicinski
f4ef681115 docs: netdev: reshuffle sections in prep for de-FAQization
Subsequent changes will reformat the doc away from FAQ.
To make that more readable perform the pure section moves now.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 10:06:06 +00:00
David Howells
0e50d99990 rxrpc: Fix a couple of potential use-after-frees
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.

Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.

Fixes: 849979051c ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 09:59:23 +00:00