Commit Graph

1249026 Commits

Author SHA1 Message Date
Christian Marangi
355c6dc37e dt-bindings: net: phy: Document LED inactive high impedance mode
Document LED inactive high impedance mode to set the LED to require high
impedance configuration to be turned OFF.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240125203702.4552-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:03:31 -08:00
Christian Marangi
c94d178313 dt-bindings: net: phy: Make LED active-low property common
Move LED active-low property to common.yaml. This property is currently
defined multiple times by bcm LEDs. This property will now be supported
in a generic way for PHY LEDs with the use of a generic function.

With active-low bool property not defined, active-high is always
assumed.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240125203702.4552-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:03:31 -08:00
Kees Cook
5642c82b94 bnx2x: Fix firmware version string character counts
A potential string truncation was reported in bnx2x_fill_fw_str(),
when a long bp->fw_ver and a long phy_fw_ver might coexist, but seems
unlikely with real-world hardware.

Use scnprintf() to indicate the intent that truncations are tolerated.

While reading this code, I found a collection of various buffer size
counting issues. None looked like they might lead to a buffer overflow
with current code (the small buffers are 20 bytes and might only ever
consume 10 bytes twice with a trailing %NUL). However, early truncation
(due to a %NUL in the middle of the string) might be happening under
likely rare conditions. Regardless fix the formatters and related
functions:

- Switch from a separate strscpy() to just adding an additional "%s" to
  the format string that immediately follows it in bnx2x_fill_fw_str().
- Use sizeof() universally instead of using unbound defines.
- Fix bnx2x_7101_format_ver() and bnx2x_null_format_ver() to report the
  number of characters written, not including the trailing %NUL (as
  already done with the other firmware formatting functions).
- Require space for at least 1 byte in bnx2x_get_ext_phy_fw_version()
  for the trailing %NUL.
- Correct the needed buffer size in bnx2x_3_seq_format_ver().

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401260858.jZN6vD1k-lkp@intel.com/
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240126041044.work.220-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:01:30 -08:00
Li Zhijian
8d0293302d drivers/ptp: Convert snprintf to sysfs_emit
Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.

coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

> ./drivers/ptp/ptp_sysfs.c:27:8-16: WARNING: please use sysfs_emit

No functional change intended

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20240125015329.123023-1-lizhijian@fujitsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:35:17 -08:00
Jakub Kicinski
1cf05e2508 Merge branch 'af_unix-random-improvements-for-gc'
Kuniyuki Iwashima says:

====================
af_unix: Random improvements for GC.

If more than 16000 inflight AF_UNIX sockets exist on a host, each
sendmsg() will be forced to wait for unix_gc() even if a process
is not sending any FD.

This series tries not to impose such a penalty on sane users who
do not send AF_UNIX FDs or do not have inflight sockets more than
SCM_MAX_FD * 8.

The first patch can be backported to -stable.

Cleanup patches for commit 69db702c83 ("io_uring/af_unix: disable
sending io_uring over sockets") and large refactoring of GC will
be followed later.

v4: https://lore.kernel.org/netdev/20231219030102.27509-1-kuniyu@amazon.com/
v3: https://lore.kernel.org/netdev/20231218075020.60826-1-kuniyu@amazon.com/
v2: https://lore.kernel.org/netdev/20231123014747.66063-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20231122013629.28554-1-kuniyu@amazon.com/
====================

Link: https://lore.kernel.org/r/20240123170856.41348-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:28 -08:00
Kuniyuki Iwashima
d9f21b3613 af_unix: Try to run GC async.
If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are the GC candidate.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

However, if a process sends data with no AF_UNIX FD, the sendmsg() call
does not need to wait for GC.  After this change, only the process that
meets the condition below will be blocked under such a situation.

  1) cmsg contains AF_UNIX socket
  2) more than 32 AF_UNIX sent by the same user are still inflight

Note that even a sendmsg() call that does not meet the condition but has
AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock,
but we allow that as a bonus for sane users.

The results below are the time spent in unix_dgram_sendmsg() sending 1
byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX
sockets exist.

Without series: the sane sendmsg() needs to wait gc unreasonably.

  $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
      524288 -> 1048575    : 0        |                                        |
     1048576 -> 2097151    : 3881     |****************************************|
     2097152 -> 4194303    : 214      |**                                      |
     4194304 -> 8388607    : 1        |                                        |

  avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096

With series: the sane sendmsg() can finish much faster.

  $ sudo /usr/share/bcc/tools/funclatency -p 8702  unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
         128 -> 255        : 0        |                                        |
         256 -> 511        : 4092     |****************************************|
         512 -> 1023       : 2        |                                        |
        1024 -> 2047       : 0        |                                        |
        2048 -> 4095       : 0        |                                        |
        4096 -> 8191       : 1        |                                        |
        8192 -> 16383      : 1        |                                        |

  avg = 410 nsecs, total: 1680510 nsecs, count: 4096

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:25 -08:00
Kuniyuki Iwashima
8b90a9f819 af_unix: Run GC on only one CPU.
If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are the GC candidate.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

There is a small window to invoke multiple unix_gc() instances, which
will then be blocked by the same spinlock except for one.

Let's convert unix_gc() to use struct work so that it will not consume
CPUs unnecessarily.

Note WRITE_ONCE(gc_in_progress, true) is moved before running GC.
If we leave the WRITE_ONCE() as is and use the following test to
call flush_work(), a process might not call it.

    CPU 0                                     CPU 1
    ---                                       ---
                                              start work and call __unix_gc()
    if (work_pending(&unix_gc_work) ||        <-- false
        READ_ONCE(gc_in_progress))            <-- false
            flush_work();                     <-- missed!
	                                      WRITE_ONCE(gc_in_progress, true)

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:25 -08:00
Kuniyuki Iwashima
5b17307bd0 af_unix: Return struct unix_sock from unix_get_socket().
Currently, unix_get_socket() returns struct sock, but after calling
it, we always cast it to unix_sk().

Let's return struct unix_sock from unix_get_socket().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:25 -08:00
Kuniyuki Iwashima
97af84a6bb af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).

Let's convert unix_sk(sk)->inflight to the normal unsigned long.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:24 -08:00
Kuniyuki Iwashima
31e0320711 af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc().
gc_in_progress is changed under spin_lock(&unix_gc_lock),
but wait_for_unix_gc() reads it locklessly.

Let's use READ_ONCE().

Fixes: 5f23b73496 ("net: Fix soft lockups/OOM issues w/ unix garbage collector")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:34:24 -08:00
Arınç ÜNAL
fb4bb62aaa net: dsa: mt7530: select MEDIATEK_GE_PHY for NET_DSA_MT7530_MDIO
Quoting from commit 4223f86512 ("net: dsa: mt7530: make NET_DSA_MT7530
select MEDIATEK_GE_PHY"):

Make MediaTek MT753x DSA driver enable MediaTek Gigabit PHYs driver to
properly control MT7530 and MT7531 switch PHYs.

A noticeable change is that the behaviour of switchport interfaces going
up-down-up-down is no longer there.

Now, the switch can be used without the PHYs but, at the moment, every
hardware design out there that I have seen uses them. For that, it would
make the most sense to force the selection of MEDIATEK_GE_PHY for the MDIO
interface which currently controls the MT7530 and MT7531 switches.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122053451.8004-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 20:31:52 -08:00
Gerhard Engleder
5f76499fb5 tsnep: Add link down PHY loopback support
PHY loopback turns off link state change signalling. Therefore, the
loopback only works if the link is already up before the PHY loopback is
activated.

Ensure that PHY loopback works even if the link is not already up during
activation by calling netif_carrier_on() explicitly.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20240123200151.60848-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:33:51 -08:00
Ankit Garg
3df1841626 gve: Modify rx_buf_alloc_fail counter centrally and closer to failure
Previously, each caller of gve_rx_alloc_buffer had to increase counter
 and as a result one caller was not tracking those failure. Increasing
 counters at a common location now so callers don't have to duplicate
 code or miss counter management.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Link: https://lore.kernel.org/r/20240124205435.1021490-1-nktgrg@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:18:56 -08:00
Jakub Kicinski
5535fcc59a Merge branch 'selftests-updates-to-fcnal-test-for-autoamted-environment'
David Ahern says:

====================
selftests: Updates to fcnal-test for autoamted environment

The first patch updates the PATH for fcnal-test.sh to find the nettest
binary when invoked at the top-level directory via
   make -C tools/testing/selftests TARGETS=net run_tests

Second patch fixes a bug setting the ping_group; it has a compound value
and that value is not traversing the various helper functions in tact.
Fix by creating a helper specific to setting it.

Third patch adds more output when a test fails - e.g., to catch a change
in the return code of some test.

With these 3 patches, the entire suite completes successfully when
run on Ubuntu 23.10 with 6.5 kernel - 914 tests pass, 0 fail.
====================

Link: https://lore.kernel.org/r/20240124214117.24687-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:14:25 -08:00
David Ahern
70863c902d selftest: Show expected and actual return codes for test failures in fcnal-test
Capture expected and actual return codes for a test that fails in
the fcnal-test suite.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-4-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:14:11 -08:00
David Ahern
79bf0d4a07 selftest: Fix set of ping_group_range in fcnal-test
ping_group_range sysctl has a compound value which does not go
through the various function layers in tact. Create a helper
function to bypass the layers and correctly set the value.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-3-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:14:11 -08:00
David Ahern
ad9b701aed selftest: Update PATH for nettest in fcnal-test
Allow fcnal-test.sh to be run from top level directory in the
kernel repo as well as from tools/testing/selftests/net by
setting the PATH to find the in-tree nettest.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-2-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 17:14:11 -08:00
Jakub Kicinski
b54846da45 Merge tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:

====================
wireless-next patches for v6.9

The first "new features" pull request for v6.9. We have only driver
changes this time and most of them are for Realtek drivers. Really
nice to see activity in Broadcom drivers again.

Major changes:

rtwl8xxxu
 * RTL8188F: concurrent interface support
 * Channel Switch Announcement (CSA) support in AP mode

brcmfmac
 * per-vendor feature support
 * per-vendor SAE password setup

rtlwifi
 * speed up USB firmware initialisation

* tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
  wifi: iwlegacy: Use kcalloc() instead of kzalloc()
  wifi: rtw89: fix disabling concurrent mode TX hang issue
  wifi: rtw89: fix HW scan timeout due to TSF sync issue
  wifi: rtw89: add wait/completion for abort scan
  wifi: rtw89: fix null pointer access when abort scan
  wifi: rtw89: disable RTS when broadcast/multicast
  wifi: rtw89: Set default CQM config if not present
  wifi: rtw89: refine hardware scan C2H events
  wifi: rtw89: refine add_chan H2C command to encode_bits
  wifi: rtw89: 8922a: add BTG functions to assist BT coexistence to control TX/RX
  wifi: rtw89: 8922a: add TX power related ops
  wifi: rtw89: 8922a: add register definitions of H2C, C2H, page, RRSR and EDCCA
  wifi: rtw89: 8922a: add chip_ops related to BB init
  wifi: rtw89: 8922a: add chip_ops::{enable,disable}_bb_rf
  wifi: rtw89: add mlo_dbcc_mode for WiFi 7 chips
  wifi: rtlwifi: Speed up firmware loading for USB
  wifi: rtl8xxxu: add missing number of sec cam entries for all variants
  wifi: brcmfmac: allow per-vendor event handling
  wifi: brcmfmac: avoid invalid list operation when vendor attach fails
  wifi: brcmfmac: Demote vendor-specific attach/detach messages to info
  ...
====================

Link: https://lore.kernel.org/r/20240125104030.B6CA6C433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:49:55 -08:00
Arseniy Krasnov
767ec326f9 vsock/test: print type for SOCK_SEQPACKET
SOCK_SEQPACKET is supported for virtio transport, so do not interpret
such type of socket as unknown.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240124193255.3417803-1-avkrasnov@salutedevices.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:39:21 -08:00
Jakub Kicinski
85da9d9ff2 Merge branch 'selftests-tc-testing-misc-changes-for-tdc'
Pedro Tammela says:

====================
selftests: tc-testing: misc changes for tdc

Patches 1 and 3 are fixes for tdc that were discovered when running it
using defconfig + tc-testing config and against the latest iproute2.

Patch 2 improves the taprio tests.

Patch 4 enables all tdc tests.

Patch 5 fixes the return code of tdc for when a test fails
setup/teardown.
====================

Link: https://lore.kernel.org/r/20240124181933.75724-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:19 -08:00
Pedro Tammela
8981a85e1b selftests: tc-testing: return fail if a test fails in setup/teardown
As of today tests throwing exceptions in setup/teardown phase are
treated as skipped but they should really be failures.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240124181933.75724-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:16 -08:00
Pedro Tammela
d17d0e3337 selftests: tc-testing: enable all tdc tests
For the longest time tdc ran only actions and qdiscs tests.
It's time to enable all the remaining tests so every user visible
piece of TC is tested by the downstream CIs.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240124181933.75724-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:16 -08:00
Pedro Tammela
3007d8712c selftests: tc-testing: adjust fq test to latest iproute2
Adjust the fq verify regex to the latest iproute2

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240124181933.75724-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:16 -08:00
Pedro Tammela
4f4d384121 selftests: tc-testing: check if 'jq' is available in taprio tests
If 'jq' is not available the taprio tests might enter an infinite loop,
use the "dependsOn" feature from tdc to check if jq is present. If it's
not the test is skipped.

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240124181933.75724-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:16 -08:00
Pedro Tammela
14a12e6c0b selftests: tc-testing: add missing netfilter config
On a default config + tc-testing config build, tdc will miss
all the netfilter related tests because it's missing:
   CONFIG_NETFILTER=y

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240124181933.75724-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 16:38:16 -08:00
Jakub Kicinski
06f609b311 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25 14:20:08 -08:00
Linus Torvalds
ecb1b8288d Merge tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, netfilter and WiFi.

  Jakub is doing a lot of work to include the self-tests in our CI, as a
  result a significant amount of self-tests related fixes is flowing in
  (and will likely continue in the next few weeks).

  Current release - regressions:

   - bpf: fix a kernel crash for the riscv 64 JIT

   - bnxt_en: fix memory leak in bnxt_hwrm_get_rings()

   - revert "net: macsec: use skb_ensure_writable_head_tail to expand
     the skb"

  Previous releases - regressions:

   - core: fix removing a namespace with conflicting altnames

   - tc/flower: fix chain template offload memory leak

   - tcp:
      - make sure init the accept_queue's spinlocks once
      - fix autocork on CPUs with weak memory model

   - udp: fix busy polling

   - mlx5e:
      - fix out-of-bound read in port timestamping
      - fix peer flow lists corruption

   - iwlwifi: fix a memory corruption

  Previous releases - always broken:

   - netfilter:
      - nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
        basechain
      - nft_limit: reject configurations that cause integer overflow

   - bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
     NULL pointer dereference upon shrinking

   - llc: make llc_ui_sendmsg() more robust against bonding changes

   - smc: fix illegal rmb_desc access in SMC-D connection dump

   - dpll: fix pin dump crash for rebound module

   - bnxt_en: fix possible crash after creating sw mqprio TCs

   - hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB

  Misc:

   - several self-tests fixes for better integration with the netdev CI

   - added several missing modules descriptions"

* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
  tsnep: Remove FCS for XDP data path
  net: fec: fix the unhandled context fault from smmu
  selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
  fjes: fix memleaks in fjes_hw_setup
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  net: fill in MODULE_DESCRIPTION()s for rvu_mbox
  net: fill in MODULE_DESCRIPTION()s for litex
  net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
  net: fill in MODULE_DESCRIPTION()s for fec
  ...
2024-01-25 10:58:35 -08:00
Linus Torvalds
bdc010200e Merge tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fix from Amir Goldstein:
 "Change the on-disk format for the new "xwhiteouts" feature introduced
  in v6.7

  The change reduces unneeded overhead of an extra getxattr per readdir.
  The only user of the "xwhiteout" feature is the external composefs
  tool, which has been updated to support the new on-disk format.

  This change is also designated for 6.7.y"

* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: mark xwhiteouts directory with overlay.opaque='x'
2024-01-25 10:52:30 -08:00
Linus Torvalds
a658e0e986 Merge tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs fixes from Christian Brauner:
 "This contains various fixes for the netfs work merged earlier this
  cycle:

  afs:
   - Fix locking imbalance in afs_proc_addr_prefs_show()
   - Remove afs_dynroot_d_revalidate() which is redundant
   - Fix error handling during lookup
   - Hide sillyrenames from userspace. This fixes a race between
     silly-rename files being created/removed and userspace iterating
     over directory entries
   - Don't use unnecessary folio_*() functions

  cifs:
   - Don't use unnecessary folio_*() functions

  cachefiles:
   - erofs: Fix Null dereference when cachefiles are not doing
     ondemand-mode
   - Update mailing list

  netfs library:
   - Add Jeff Layton as reviewer
   - Update mailing list
   - Fix a error checking in netfs_perform_write()
   - fscache: Check error before dereferencing
   - Don't use unnecessary folio_*() functions"

* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix missing/incorrect unlocking of RCU read lock
  afs: Remove afs_dynroot_d_revalidate() as it is redundant
  afs: Fix error handling with lookup via FS.InlineBulkStatus
  afs: Hide silly-rename files from userspace
  cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
  netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
  netfs, fscache: Prevent Oops in fscache_put_cache()
  cifs: Don't use certain unnecessary folio_*() functions
  afs: Don't use certain unnecessary folio_*() functions
  netfs: Don't use certain unnecessary folio_*() functions
  netfs: Add Jeff Layton as reviewer
  netfs, cachefiles: Change mailing list
2024-01-25 10:41:29 -08:00
Linus Torvalds
b9fa4cbd84 Merge tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Fix in-kernel RPC UDP transport

 - Fix NFSv4.0 RELEASE_LOCKOWNER

* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix RELEASE_LOCKOWNER
  SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
2024-01-25 10:26:52 -08:00
Linus Torvalds
3cb9871f81 Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux
Pull RCU fix from Neeraj Upadhyay:
 "This fixes RCU grace period stalls, which are observed when an
  outgoing CPU's quiescent state reporting results in wakeup of one of
  the grace period kthreads, to complete the grace period.

  If those kthreads have SCHED_FIFO policy, the wake up can indirectly
  arm the RT bandwith timer to the local offline CPU.

  Earlier migration of the hrtimers from the CPU introduced in commit
  5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU
  earlier") results in this timer getting ignored.

  If the RCU grace period kthreads are waiting for RT bandwidth to be
  available, they may never be actually scheduled, resulting in RCU
  stall warnings"

* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
  rcu: Defer RCU kthreads wakeup when CPU is dying
2024-01-25 10:21:21 -08:00
Arınç ÜNAL
91374ba537 net: dsa: mt7530: support OF-based registration of switch MDIO bus
Currently the MDIO bus of the switches the MT7530 DSA subdriver controls
can only be registered as non-OF-based. Bring support for registering the
bus OF-based.

The subdrivers that control switches [with MDIO bus] probed on OF must
follow this logic to support all cases properly:

No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if "interrupt-controller" is defined at
the switch node. This case should only be covered for the switches which
their dt-bindings documentation didn't document the MDIO bus from the
start. This is to keep supporting the device trees that do not describe the
MDIO bus on the device tree but the MDIO bus is being used nonetheless.

Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if ["interrupt-controller" is defined at
the switch node and "interrupts" is defined at the PHY nodes under the
switch MDIO bus node].

Switch MDIO bus defined but explicitly disabled: If the device tree says
status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all.
Instead, just exit as early as possible and do not call any MDIO API.

The use of ds->user_mii_bus is inappropriate when the MDIO bus of the
switch is described on the device tree [1], which is why we don't populate
ds->user_mii_bus in that case.

Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/ [1]
Suggested-by: David Bauer <mail@david-bauer.net>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122053431.7751-1-arinc.unal@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 15:39:28 +01:00
Paolo Abeni
0a5bd0ffe7 Merge branch 'tsnep-xdp-fixes'
Gerhard Engleder says:

====================
tsnep: XDP fixes

Found two driver specific problems during XDP and XSK testing.
====================

Link: https://lore.kernel.org/r/20240123200918.61219-1-gerhard@engleder-embedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:59:44 +01:00
Gerhard Engleder
9a91c05f4b tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
The fill ring of the XDP socket may contain not enough buffers to
completey fill the RX queue during socket creation. In this case the
flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX
queue is not completely filled during polling.

Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled
during XDP socket creation.

Fixes: 3fc2333933 ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:59:42 +01:00
Gerhard Engleder
50bad6f797 tsnep: Remove FCS for XDP data path
The RX data buffer includes the FCS. The FCS is already stripped for the
normal data path. But for the XDP data path the FCS is included and
acts like additional/useless data.

Remove the FCS from the RX data buffer also for XDP.

Fixes: 65b28c8100 ("tsnep: Add XDP RX support")
Fixes: 3fc2333933 ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:59:42 +01:00
Paolo Abeni
5da4597163 Merge tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2024-01-24

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

* tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: fix a potential double-free in fs_any_create_groups
  net/mlx5e: fix a double-free in arfs_create_groups
  net/mlx5e: Ignore IPsec replay window values on sender side
  net/mlx5e: Allow software parsing when IPsec crypto is enabled
  net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO
  net/mlx5: DR, Can't go to uplink vport on RX rule
  net/mlx5: DR, Use the right GVMI number for drop action
  net/mlx5: Bridge, fix multicast packets sent to uplink
  net/mlx5: Fix a WARN upon a callback command failure
  net/mlx5e: Fix peer flow lists handling
  net/mlx5e: Fix inconsistent hairpin RQT sizes
  net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
  net/mlx5: Fix query of sd_group field
  net/mlx5e: Use the correct lag ports number when creating TISes
====================

Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:42:27 +01:00
Paolo Abeni
fdf8e6d18c Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2024-01-25

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 2 day(s) which contain
a total of 13 files changed, 190 insertions(+), 91 deletions(-).

The main changes are:

1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which
   support XDP multi-buffer. The former triggered a NULL pointer
   dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar.

2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and
   epilogue for struct_ops programs, from Pu Lehui.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops
====================

Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:30:31 +01:00
Shenwei Wang
5e34480773 net: fec: fix the unhandled context fault from smmu
When repeatedly changing the interface link speed using the command below:

ethtool -s eth0 speed 100 duplex full
ethtool -s eth0 speed 1000 duplex full

The following errors may sometimes be reported by the ARM SMMU driver:

[ 5395.035364] fec 5b040000.ethernet eth0: Link is Down
[ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault:
fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
[ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control off

It is identified that the FEC driver does not properly stop the TX queue
during the link speed transitions, and this results in the invalid virtual
I/O address translations from the SMMU and causes the context faults.

Fixes: dbc64a8ea2 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 11:22:27 +01:00
Hangbin Liu
a2933a8759 selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
The prio_arp/ns tests hard code the mode to active-backup. At the same
time, The balance-alb/tlb modes do not support arp/ns target. So remove
the prio_arp/ns tests from the loop and only test active-backup mode.

Fixes: 481b56e039 ("selftests: bonding: re-format bond option tests")
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240123075917.1576360-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25 09:50:54 +01:00
Jakub Kicinski
a717932db1 Merge tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Update nf_tables kdoc to keep it in sync with the code, from George Guo.

2) Handle NETDEV_UNREGISTER event for inet/ingress basechain.

3) Reject configuration that cause nft_limit to overflow,
   from Florian Westphal.

4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal.

5) Disallow to encode queue number and error in verdicts. This reverts
   a patch which seems to have introduced an early attempt to support for
   nfqueue maps, which is these days supported via nft_queue expression.

6) Sanitize family via .validate for expressions that explicitly refer
   to NF_INET_* hooks.

* tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: validate NFPROTO_* family
  netfilter: nf_tables: reject QUEUE/DROP verdict parameters
  netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
  netfilter: nft_limit: reject configurations that cause integer overflow
  netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
  netfilter: nf_tables: cleanup documentation
====================

Link: https://lore.kernel.org/r/20240124191248.75463-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24 21:03:17 -08:00
Zhipeng Lu
f6cc4b6a3a fjes: fix memleaks in fjes_hw_setup
In fjes_hw_setup, it allocates several memory and delay the deallocation
to the fjes_hw_exit in fjes_probe through the following call chain:

fjes_probe
  |-> fjes_hw_init
        |-> fjes_hw_setup
  |-> fjes_hw_exit

However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
all the resources allocated in fjes_hw_setup will be leaked. In this
patch, we free those resources in fjes_hw_setup and prevents such leaks.

Fixes: 2fcbca6877 ("fjes: platform_driver's .probe and .remove routine")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24 18:03:53 -08:00
Randy Dunlap
5ca1a5153a tipc: node: remove Excess struct member kernel-doc warnings
Remove 2 kernel-doc descriptions to squelch warnings:

node.c:150: warning: Excess struct member 'inputq' description in 'tipc_node'
node.c:150: warning: Excess struct member 'namedq' description in 'tipc_node'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123051152.23684-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24 17:48:29 -08:00
Randy Dunlap
88bf1b8f3c tipc: socket: remove Excess struct member kernel-doc warning
Remove a kernel-doc description to squelch a warning:

socket.c:143: warning: Excess struct member 'blocking_link' description in 'tipc_sock'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123051201.24701-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24 17:48:09 -08:00
Arseniy Krasnov
e18c709230 vsock/test: add '--peer-port' input argument
Implement port for given CID as input argument instead of using
hardcoded value '1234'. This allows to run different test instances
on a single CID. Port argument is not required parameter and if it is
not set, then default value will be '1234' - thus we preserve previous
behaviour.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240123072750.4084181-1-avkrasnov@salutedevices.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24 17:47:35 -08:00
Linus Torvalds
6098d87eaf Merge tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A fix to avoid triggering an assert in some cases where RBD exclusive
  mappings are involved and a deprecated API cleanup"

* tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client:
  rbd: don't move requests to the running list on errors
  rbd: remove usage of the deprecated ida_simple_*() API
2024-01-24 16:59:52 -08:00
Linus Torvalds
f22face166 Merge tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fix from Mimi Zohar:
 "Revert patch that required user-provided key data, since keys can be
  created from kernel-generated random numbers"

* tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  Revert "KEYS: encrypted: Add check for strsep"
2024-01-24 16:51:59 -08:00
Alexei Starovoitov
9d71bc833f Merge branch 'net-bpf_xdp_adjust_tail-and-intel-mbuf-fixes'
Maciej Fijalkowski says:

====================
net: bpf_xdp_adjust_tail() and Intel mbuf fixes

Hey,

after a break followed by dealing with sickness, here is a v6 that makes
bpf_xdp_adjust_tail() actually usable for ZC drivers that support XDP
multi-buffer. Since v4 I tried also using bpf_xdp_adjust_tail() with
positive offset which exposed yet another issues, which can be observed
by increased commit count when compared to v3.

John, in the end I think we should remove handling
MEM_TYPE_XSK_BUFF_POOL from __xdp_return(), but it is out of the scope
for fixes set, IMHO.

Thanks,
Maciej

v6:
- add acks [Magnus]
- fix spelling mistakes [Magnus]
- avoid touching xdp_buff in xp_alloc_{reused,new_from_fq}() [Magnus]
- s/shrink_data/bpf_xdp_shrink_data [Jakub]
- remove __shrink_data() [Jakub]
- check retvals from __xdp_rxq_info_reg() [Magnus]

v5:
- pick correct version of patch 5 [Simon]
- elaborate a bit more on what patch 2 fixes

v4:
- do not clear frags flag when deleting tail; xsk_buff_pool now does
  that
- skip some NULL tests for xsk_buff_get_tail [Martin, John]
- address problems around registering xdp_rxq_info
- fix bpf_xdp_frags_increase_tail() for ZC mbuf

v3:
- add acks
- s/xsk_buff_tail_del/xsk_buff_del_tail
- address i40e as well (thanks Tirthendu)

v2:
- fix !CONFIG_XDP_SOCKETS builds
- add reviewed-by tag to patch 3
====================

Link: https://lore.kernel.org/r/20240124191602.566724-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
0cbb08707c i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.

Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
a045d2f2d0 i40e: set xdp_rxq_info::frag_size
i40e support XDP multi-buffer so it is supposed to use
__xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the
frag_size. It can not be simply converted at existing callsite because
rx_buf_len could be un-initialized, so let us register xdp_rxq_info
within i40e_configure_rx_ring(), which happen to be called with already
initialized rx_buf_len value.

Commit 5180ff1364 ("i40e: use int for i40e_status") converted 'err' to
int, so two variables to deal with return codes are not needed within
i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status
from xdp_rxq_info registration.

Fixes: e213ced19b ("i40e: add support for XDP multi-buffer Rx")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
fbadd83a61 xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
XSK ZC Rx path calculates the size of data that will be posted to XSK Rx
queue via subtracting xdp_buff::data_end from xdp_buff::data.

In bpf_xdp_frags_increase_tail(), when underlying memory type of
xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail
fragment, so that later on user space will be able to take into account
the amount of bytes added by XDP program.

Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-10-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00