Commit Graph

1294708 Commits

Author SHA1 Message Date
Lorenzo Bianconi
9304640f2f net: airoha: Link the gdm port to the selected qdma controller
Link the running gdm port to the qdma controller used to connect with
the CPU. Moreover, load all QDMA controllers available on EN7581 SoC.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/95b515df34ba4727f7ae5b14a1d0462cceec84ff.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:11 -07:00
Lorenzo Bianconi
160231e34b net: airoha: Start all qdma NAPIs in airoha_probe()
This is a preliminary patch to support multi-QDMA controllers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/b51cf69c94d8cbc81e0a0b35587f024d01e6d9c0.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:11 -07:00
Lorenzo Bianconi
e618447cf4 net: airoha: Allow mapping IO region for multiple qdma controllers
Map MMIO regions of both qdma controllers available on EN7581 SoC.
Run airoha_hw_cleanup routine for both QDMA controllers available on
EN7581 SoC removing airoha_eth module or in airoha_probe error path.
This is a preliminary patch to support multi-QDMA controllers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/a734ae608da14b67ae749b375d880dbbc70868ea.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Lorenzo Bianconi
e3d6bfdfc0 net: airoha: Use qdma pointer as private structure in airoha_irq_handler routine
This is a preliminary patch to support multi-QDMA controllers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/1e40c3cb973881c0eb3c3c247c78550da62054ab.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Lorenzo Bianconi
9a2500ab22 net: airoha: Add airoha_qdma pointer in airoha_tx_irq_queue/airoha_queue structures
Move airoha_eth pointer in airoha_qdma structure from
airoha_tx_irq_queue/airoha_queue ones. This is a preliminary patch to
introduce support for multi-QDMA controllers available on EN7581.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/074565b82fd0ceefe66e186f21133d825dbd48eb.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Lorenzo Bianconi
19e47fc2ae net: airoha: Move irq_mask in airoha_qdma structure
QDMA controllers have independent irq lines, so move irqmask in
airoha_qdma structure. This is a preliminary patch to support multiple
QDMA controllers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/1c8a06e8be605278a7b2f3cd8ac06e74bf5ebf2b.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Lorenzo Bianconi
245c7bc86b net: airoha: Move airoha_queues in airoha_qdma
QDMA controllers available in EN7581 SoC have independent tx/rx hw queues
so move them in airoha_queues structure.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/795fc4797bffbf7f0a1351308aa9bf0e65b5126e.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Lorenzo Bianconi
16874d1cf3 net: airoha: Introduce airoha_qdma struct
Introduce airoha_qdma struct and move qdma IO register mapping in
airoha_qdma. This is a preliminary patch to enable both QDMA controllers
available on EN7581 SoC.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/7df163bdc72ee29c3d27a0cbf54522ffeeafe53c.1722522582.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:31:10 -07:00
Simon Horman
9a95b7a89d eth: fbnic: select DEVLINK and PAGE_POOL
Build bot reports undefined references to devlink functions.
And local testing revealed undefined references to page_pool functions.

Based on a patch by Jakub Kicinski <kuba@kernel.org>

Fixes: 1a9d48892e ("eth: fbnic: Allocate core device specific structures and devlink interface")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408011219.hiPmwwAs-lkp@intel.com/
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802-fbnic-select-v2-1-41f82a3e0178@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:23:59 -07:00
Jakub Kicinski
46619175f1 selftests: net: ksft: print more of the stack for checks
Print more stack frames and the failing line when check fails.
This helps when tests use helpers to do the checks.

Before:

  # At ./ksft/drivers/net/hw/rss_ctx.py line 92:
  # Check failed 1037698 >= 396893.0 traffic on other queues:[344612, 462380, 233020, 449174, 342298]
  not ok 8 rss_ctx.test_rss_context_queue_reconfigure

After:

  # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 387, in test_rss_context_queue_reconfigure:
  # Check|     test_rss_queue_reconfigure(cfg, main_ctx=False)
  # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 230, in test_rss_queue_reconfigure:
  # Check|     _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3),
  # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 92, in _send_traffic_check:
  # Check|     ksft_lt(sum(cnts[i] for i in params['noise']), directed / 2,
  # Check failed 1045235 >= 405823.5 traffic on other queues (context 1)':[460068, 351995, 565970, 351579, 127270]
  not ok 8 rss_ctx.test_rss_context_queue_reconfigure

Link: https://patch.msgid.link/20240801232317.545577-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:09:47 -07:00
Stanislav Fomichev
a48395f22b selftests: net: ksft: replace 95 with errno.EOPNOTSUPP
Petr suggested to use errno.EOPNOTSUPP instead of hard-coded 95
in the new test case. Adjust existing ones to match this style.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20240802000309.2368-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:09:27 -07:00
Stanislav Fomichev
f879306834 selftests: net: ksft: support marking tests as disruptive
Add new @ksft_disruptive decorator to mark the tests that might
be disruptive to the system. Depending on how well the previous
test works in the CI we might want to disable disruptive tests
by default and only let the developers run them manually.

KSFT framework runs disruptive tests by default. DISRUPTIVE=False
environment (or config file) can be used to disable these tests.
ksft_setup should be called by the test cases that want to use
new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).

In the future we can add similar decorators to, for example, avoid
running slow tests all the time. And/or have some option to run
only 'fast' tests for some sort of smoke test scenario.

  $ DISRUPTIVE=False ./stats.py
  KTAP version 1
  1..5
  ok 1 stats.check_pause
  ok 2 stats.check_fec
  ok 3 stats.pkt_byte_sum
  ok 4 stats.qstat_by_ifindex
  ok 5 stats.check_down # SKIP marked as disruptive
  # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0

v3:
- parse yes and properly treat non-zero nums as true (Petr)

v2:
- convert from cli argument to env variable (Jakub)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20240802000309.2368-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:09:27 -07:00
Stanislav Fomichev
ab1000976c selftests: net-drv: exercise queue stats when the device is down
Verify that total device stats don't decrease after it has been turned down.
Also make sure the device doesn't crash when we access per-queue stats
when it's down (in case it tries to access some pointers that are NULL).

  KTAP version 1
  1..5
  ok 1 stats.check_pause
  ok 2 stats.check_fec
  ok 3 stats.pkt_byte_sum
  ok 4 stats.qstat_by_ifindex
  ok 5 stats.check_down
  # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0

v3:
- use errno.EOPNOTSUPP (Petr)
- move qstat[0] under try (Petr)

v2:
- KTAP output formatting (Jakub)
- defer instead of try/finally (Jakub)
- disappearing stats is an error (Jakub)
- ksft_ge instead of open coding (Jakub)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20240802000309.2368-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:09:27 -07:00
Jakub Kicinski
49675f5bdf net: remove IFF_* re-definition
We re-define values of enum netdev_priv_flags as preprocessor
macros with the same name. I guess this was done to avoid breaking
out of tree modules which may use #ifdef X for kernel compatibility?
Commit 7aa98047df ("net: move net_device priv_flags out from UAPI")
which added the enum doesn't say. In any case, the flags with defines
are quite old now, and defines for new flags don't get added.
OOT drivers have to resort to code greps for compat detection, anyway.
Let's delete these defines, save LoC, help LXR link to the right place.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240801163401.378723-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02 16:03:51 -07:00
David S. Miller
ce21e520fd Merge branch 'axienet-coding-style' into main
Radhey Shyam Pandey says:

====================
net: axienet: Fix coding style issues

This patchset replace all occurences of (1<<x) by BIT(x) to get rid
of checkpatch.pl "CHECK" output "Prefer using the BIT macro".

It also removes unnecessary ftrace-like logging, add missing blank line
after declaration and remove unnecessary parentheses around 'ndev->mtu
<= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'.

Changes for v2:
- Split each coding style change into separate patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 10:25:23 +01:00
Radhey Shyam Pandey
48ba8a1d04 net: axienet: remove unnecessary parentheses
Remove unnecessary parentheses around 'ndev->mtu
<= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'. Reported
by checkpatch.

CHECK: Unnecessary parentheses around 'ndev->mtu > XAE_MTU'
+       if ((ndev->mtu > XAE_MTU) &&
+           (ndev->mtu <= XAE_JUMBO_MTU)) {

CHECK: Unnecessary parentheses around 'ndev->mtu <= XAE_JUMBO_MTU'
+       if ((ndev->mtu > XAE_MTU) &&
+           (ndev->mtu <= XAE_JUMBO_MTU)) {

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 10:25:22 +01:00
Radhey Shyam Pandey
f83828a052 net: axienet: remove unnecessary ftrace-like logging
remove unnecessary ftrace-like logging. Also fixes below
checkpatch WARNING.

WARNING: Unnecessary ftrace-like logging - prefer using ftrace
+       dev_dbg(&ndev->dev, "%s\n", __func__);

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 10:25:22 +01:00
Radhey Shyam Pandey
f7061a3e04 net: axienet: add missing blank line after declaration
Add missing blank line after declaration. Fixes below
checkpatch warnings.

WARNING: Missing a blank line after declarations
+       struct sockaddr *addr = p;
+       axienet_set_mac_address(ndev, addr->sa_data);

WARNING: Missing a blank line after declarations
+       struct axienet_local *lp = netdev_priv(ndev);
+       disable_irq(lp->tx_irq);

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 10:25:22 +01:00
Appana Durga Kedareswara Rao
3ff578c91c net: axienet: Replace the occurrences of (1<<x) by BIT(x)
Replace all occurences of (1<<x) by BIT(x) to get rid of checkpatch.pl
"CHECK" output "Prefer using the BIT macro".

Signed-off-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 10:25:22 +01:00
David S. Miller
3361a6eae5 Merge branch 'vsock-virtio' into main
Luigi Leonardi says:

====================
vsock: avoid queuing on intermediate queue if possible

This series introduces an optimization for vsock/virtio to reduce latency
and increase the throughput: When the guest sends a packet to the host,
and the intermediate queue (send_pkt_queue) is empty, if there is enough
space, the packet is put directly in the virtqueue.

v3->v4
While running experiments on fio with 64B payload, I realized that there
was a mistake in my fio configuration, so I re-ran all the experiments
and now the latency numbers are indeed lower with the patch applied.
I also noticed that I was kicking the host without the lock.

- Fixed a configuration mistake on fio and re-ran all experiments.
- Fio latency measurement using 64B payload.
- virtio_transport_send_skb_fast_path sends kick with the tx_lock acquired
- Addressed all minor style changes requested by maintainer.
- Rebased on latest net-next
- Link to v3: https://lore.kernel.org/r/20240711-pinna-v3-0-697d4164fe80@outlook.com

v2->v3
- Performed more experiments using iperf3 using multiple streams
- Handling of reply packets removed from virtio_transport_send_skb,
  as is needed just by the worker.
- Removed atomic_inc/atomic_sub when queuing directly to the vq.
- Introduced virtio_transport_send_skb_fast_path that handles the
  steps for sending on the vq.
- Fixed a missing mutex_unlock in error path.
- Changed authorship of the second commit
- Rebased on latest net-next

v1->v2
In this v2 I replaced a mutex_lock with a mutex_trylock because it was
insidea RCU critical section. I also added a check on tx_run, so if the
module is being removed the packet is not queued. I'd like to thank Stefano
for reporting the tx_run issue.

Applied all Stefano's suggestions:
    - Minor code style changes
    - Minor commit text rewrite
Performed more experiments:
     - Check if all the packets go directly to the vq (Matias' suggestion)
     - Used iperf3 to see if there is any improvement in overall throughput
      from guest to host
     - Pinned the vhost process to a pCPU.
     - Run fio using 512B payload
Rebased on latest net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 09:20:29 +01:00
Luigi Leonardi
18ee44ce97 test/vsock: add ioctl unsent bytes test
Introduce two tests, one for SOCK_STREAM and one for SOCK_SEQPACKET,
which use SIOCOUTQ ioctl to check that the number of unsent bytes is
zero after delivering a packet.

vsock_connect and vsock_accept are no longer static: this is to
create more generic tests, allowing code to be reused for SEQPACKET
and STREAM.

Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 09:20:28 +01:00
Luigi Leonardi
e6ab450057 vsock/virtio: add SIOCOUTQ support for all virtio based transports
Introduce support for virtio_transport_unsent_bytes
ioctl for virtio_transport, vhost_vsock and vsock_loopback.

For all transports the unsent bytes counter is incremented
in virtio_transport_get_credit.

In virtio_transport (G2H) and in vhost-vsock (H2G) the counter
is decremented when the skbuff is consumed. In vsock_loopback the
same skbuff is passed from the transmitter to the receiver, so
the counter is decremented before queuing the skbuff to the
receiver.

Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 09:20:28 +01:00
Luigi Leonardi
744500d81f vsock: add support for SIOCOUTQ ioctl
Add support for ioctl(s) in AF_VSOCK.
The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number
of unsent bytes in the socket. This information is transport-specific
and is delegated to them using a callback.

Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02 09:20:28 +01:00
Rob Herring (Arm)
5fe164fb0e net: Use of_property_read_bool()
Use of_property_read_bool() to read boolean properties rather than
of_find_property(). This is part of a larger effort to remove callers
of of_find_property() and similar functions. of_find_property() leaks
the DT struct property and data pointers which is a problem for
dynamically allocated nodes which may be freed.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20240731191601.1714639-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 18:15:18 -07:00
Rob Herring (Arm)
0b0e9cdb3d net: mdio: Use of_property_count_u32_elems() to get property length
Replace of_get_property() with the type specific
of_property_count_u32_elems() to get the property length.

This is part of a larger effort to remove callers of of_get_property()
and similar functions. of_get_property() leaks the DT property data
pointer which is a problem for dynamically allocated nodes which may
be freed.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240731201514.1839974-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 18:13:15 -07:00
Rob Herring (Arm)
46e6acfe35 net: phy: qca807x: Drop unnecessary and broken DT validation
The check for "leds" and "gpio-controller" both being present is never
true because "leds" is a node, not a property. This could be fixed
with a check for child node, but there's really no need for the kernel
to validate a DT. Just continue ignoring the LEDs if GPIOs are present.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240731201703.1842022-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 18:12:42 -07:00
John Wang
5fcf0801ef net: mctp: Consistent peer address handling in ioctl tag allocation
When executing ioctl to allocate tags, if the peer address is 0,
mctp_alloc_local_tag now replaces it with 0xff. However, during tag
dropping, this replacement is not performed, potentially causing the key
not to be dropped as expected.

Signed-off-by: John Wang <wangzhiqiang02@ieisystem.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20240730084636.184140-1-wangzhiqiang02@ieisystem.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 18:04:12 -07:00
Jakub Kicinski
5fa35bd39c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Link: https://patch.msgid.link/20240801131917.34494-1-pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 10:45:23 -07:00
Linus Torvalds
183d46ff42 Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless, bleutooth, BPF and netfilter.

  Current release - regressions:

   - core: drop bad gso csum_start and offset in virtio_net_hdr

   - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove

   - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()

   - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and
     aqr115c

  Current release - new code bugs:

   - smc: prevent UAF in inet_create()

   - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend

   - eth: bnxt: reject unsupported hash functions

  Previous releases - regressions:

   - sched: act_ct: take care of padding in struct zones_ht_key

   - netfilter: fix null-ptr-deref in iptable_nat_table_init().

   - tcp: adjust clamping window for applications specifying SO_RCVBUF

  Previous releases - always broken:

   - ethtool: rss: small fixes to spec and GET

   - mptcp:
      - fix signal endpoint re-add
      - pm: fix backup support in signal endpoints

   - wifi: ath12k: fix soft lockup on suspend

   - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()

   - eth: ice: fix AF_XDP ZC timeout and concurrency issues

   - eth: mlx5:
      - fix missing lock on sync reset reload
      - fix error handling in irq_pool_request_irq"

* tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
  mptcp: fix duplicate data handling
  mptcp: fix bad RCVPRUNED mib accounting
  ipv6: fix ndisc_is_useropt() handling for PIO
  igc: Fix double reset adapter triggered from a single taprio cmd
  net: MAINTAINERS: Demote Qualcomm IPA to "maintained"
  net: wan: fsl_qmc_hdlc: Discard received CRC
  net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex
  net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys
  net/mlx5e: Fix CT entry update leaks of modify header context
  net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
  net/mlx5: Fix missing lock on sync reset reload
  net/mlx5: Lag, don't use the hardcoded value of the first port
  net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
  net/mlx5: Fix error handling in irq_pool_request_irq
  net/mlx5: Always drain health in shutdown callback
  net: Add skbuff.h to MAINTAINERS
  r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
  netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
  netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
  net: drop bad gso csum_start and offset in virtio_net_hdr
  ...
2024-08-01 09:42:09 -07:00
Simon Horman
743ff02152 ethtool: Don't check for NULL info in prepare_data callbacks
Since commit f946270d05 ("ethtool: netlink: always pass genl_info to
.prepare_data") the info argument of prepare_data callbacks is never
NULL. Remove checks present in callback implementations.

Link: https://lore.kernel.org/netdev/20240703121237.3f8b9125@kernel.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240731-prepare_data-null-check-v1-1-627f2320678f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 09:03:52 -07:00
Yue Haibing
f9c141fc33 RDS: IB: Remove unused declarations
Commit f4f943c958 ("RDS: IB: ack more receive completions to improve performance")
removed rds_ib_recv_tasklet_fn() implementation but not the declaration.
And commit ec16227e14 ("RDS/IB: Infiniband transport") declared but never implemented
other functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20240731063630.3592046-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 09:03:28 -07:00
Daniel Golle
887b1d1adb net: ethernet: mtk_eth_soc: drop clocks unused by Ethernet driver
Clocks for SerDes and PHY are going to be handled by standalone drivers
for each of those hardware components. Drop them from the Ethernet driver.

The clocks which are being removed for this patch are responsible for
the for the SerDes PCS and PHYs used for the 2nd and 3rd MAC which are
anyway not yet supported. Hence backwards compatibility is not an issue.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/b5faaf69b5c6e3e155c64af03706c3c423c6a1c9.1722335682.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 08:58:13 -07:00
Anand Khoje
501c3005f0 net/mlx5: Reclaim max 50K pages at once
In non FLR context, at times CX-5 requests release of ~8 million FW pages.
This needs humongous number of cmd mailboxes, which to be released once
the pages are reclaimed. Release of humongous number of cmd mailboxes is
consuming cpu time running into many seconds. Which with non preemptible
kernels is leading to critical process starving on that cpu’s RQ.
On top of it, the FW does not use all the mailbox messages as it has a
limit of releasing 50K pages at once per MLX5_CMD_OP_MANAGE_PAGES +
MLX5_PAGES_TAKE device command. Hence, the allocation of these many
mailboxes is extra and adds unnecessary overhead.
To alleviate this, this change restricts the total number of pages
a worker will try to reclaim to maximum 50K pages in one go.

Our tests have shown significant benefit of this change in terms of
time consumed by dma_pool_free().
During a test where an event was raised by HCA
to release 1.3 Million pages, following observations were made:

- Without this change:
Number of mailbox messages allocated was around 20K, to accommodate
the DMA addresses of 1.3 million pages.
The average time spent by dma_pool_free() to free the DMA pool is between
16 usec to 32 usec.
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@                                        287
            1024 |@@@                                      1332
            2048 |@                                        656
            4096 |@@@@@                                    2599
            8192 |@@@@@@@@@@                               4755
           16384 |@@@@@@@@@@@@@@@                          7545
           32768 |@@@@@                                    2501
           65536 |                                         0

- With this change:
Number of mailbox messages allocated was around 800; this was to
accommodate DMA addresses of only 50K pages.
The average time spent by dma_pool_free() to free the DMA pool in this case
lies between 1 usec to 2 usec.
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@                       346
            1024 |@@@@@@@@@@@@@@@@@@@@@@                   435
            2048 |                                         0
            4096 |                                         0
            8192 |                                         1
           16384 |                                         0

Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240730073634.114407-1-anand.a.khoje@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01 08:52:37 -07:00
Paolo Abeni
25010bfdf8 Merge branch 'mptcp-fix-duplicate-data-handling'
Matthieu Baerts says:

====================
mptcp: fix duplicate data handling

In some cases, the subflow-level's copied_seq counter was incorrectly
increased, leading to an unexpected subflow reset.

Patch 1/2 fixes the RCVPRUNED MIB counter that was attached to the wrong
event since its introduction in v5.14, backported to v5.11.

Patch 2/2 fixes the copied_seq counter issues, is present since v5.10.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-dup-data-v1-0-bde833fa628a@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 12:30:16 +02:00
Paolo Abeni
68cc924729 mptcp: fix duplicate data handling
When a subflow receives and discards duplicate data, the mptcp
stack assumes that the consumed offset inside the current skb is
zero.

With multiple subflows receiving data simultaneously such assertion
does not held true. As a result the subflow-level copied_seq will
be incorrectly increased and later on the same subflow will observe
a bad mapping, leading to subflow reset.

Address the issue taking into account the skb consumed offset in
mptcp_subflow_discard_data().

Fixes: 04e4cd4f7c ("mptcp: cleanup mptcp_subflow_discard_data()")
Cc: stable@vger.kernel.org
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 12:30:13 +02:00
Paolo Abeni
0a567c2a10 mptcp: fix bad RCVPRUNED mib accounting
Since its introduction, the mentioned MIB accounted for the wrong
event: wake-up being skipped as not-needed on some edge condition
instead of incoming skb being dropped after landing in the (subflow)
receive queue.

Move the increment in the correct location.

Fixes: ce599c5163 ("mptcp: properly account bulk freed memory")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 12:30:13 +02:00
Paolo Abeni
2b4a32daa6 Merge tag 'nf-24-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Fix a possible null-ptr-deref sometimes triggered by iptables-restore at
boot time. Register iptables {ipv4,ipv6} nat table pernet in first place
to fix this issue. Patch #1 and #2 from Kuniyuki Iwashima.

netfilter pull request 24-07-31

* tag 'nf-24-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
  netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
====================

Link: https://patch.msgid.link/20240731213046.6194-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 12:08:28 +02:00
Maciej Żenczykowski
a46c68debf ipv6: fix ndisc_is_useropt() handling for PIO
The current logic only works if the PIO is between two
other ND user options.  This fixes it so that the PIO
can also be either before or after other ND user options
(for example the first or last option in the RA).

side note: there's actually Android tests verifying
a portion of the old broken behaviour, so:
  https://android-review.googlesource.com/c/kernel/tests/+/3196704
fixes those up.

Cc: Jen Linkova <furry@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Patrick Rohr <prohr@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Fixes: 048c796beb ("ipv6: adjust ndisc_is_useropt() to also return true for PIO")
Link: https://patch.msgid.link/20240730001748.147636-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 11:40:29 +02:00
Breno Leitao
c9c0ee5f20 net: skbuff: Skip early return in skb_unref when debugging
This patch modifies the skb_unref function to skip the early return
optimization when CONFIG_DEBUG_NET is enabled. The change ensures that
the reference count decrement always occurs in debug builds, allowing
for more thorough checking of SKB reference counting.

Previously, when the SKB's reference count was 1 and CONFIG_DEBUG_NET
was not set, the function would return early after a memory barrier
(smp_rmb()) without decrementing the reference count. This optimization
assumes it's safe to proceed with freeing the SKB without the overhead
of an atomic decrement from 1 to 0.

With this change:
- In non-debug builds (CONFIG_DEBUG_NET not set), behavior remains
  unchanged, preserving the performance optimization.
- In debug builds (CONFIG_DEBUG_NET set), the reference count is always
  decremented, even when it's 1, allowing for consistent behavior and
  potentially catching subtle SKB management bugs.

This modification enhances debugging capabilities for networking code
without impacting performance in production kernels. It helps kernel
developers identify and diagnose issues related to SKB management and
reference counting in the network stack.

Cc: Chris Mason <clm@fb.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240729104741.370327-1-leitao@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01 11:07:38 +02:00
Jakub Kicinski
8e0c0ec9b7 Merge branch 'ethernet-convert-from-tasklet-to-bh-workqueue'
Allen Pais says:

====================
ethernet: Convert from tasklet to BH workqueue [part]

The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts a few drivers in drivers/ethernet/* from tasklet
to BH workqueue. The next set will be sent out after the next -rc is
out.

v2: https://lore.kernel.org/20240621183947.4105278-1-allen.lkml@gmail.com
v1: https://lore.kernel.org/20240507190111.16710-2-apais@linux.microsoft.com
====================

Link: https://patch.msgid.link/20240730183403.4176544-1-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 19:05:20 -07:00
Allen Pais
c5092ba315 net: macb: Convert tasklet API to new bottom half workqueue mechanism
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the macb driver. This transition ensures compatibility
with the latest design and enhances performance.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://patch.msgid.link/20240730183403.4176544-5-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:59:46 -07:00
Allen Pais
8d3beb6bc7 net: cnic: Convert tasklet API to new bottom half workqueue mechanism
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the cnic driver. This transition ensures compatibility
with the latest design and enhances performance.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://patch.msgid.link/20240730183403.4176544-4-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:59:46 -07:00
Allen Pais
2d671dc6f0 net: xgbe: Convert tasklet API to new bottom half workqueue mechanism
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the xgbe driver. This transition ensures compatibility
with the latest design and enhances performance.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://patch.msgid.link/20240730183403.4176544-3-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:59:46 -07:00
Allen Pais
20a3bcfe93 net: alteon: Convert tasklet API to new bottom half workqueue mechanism
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the alteon driver. This transition ensures compatibility
with the latest design and enhances performance.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://patch.msgid.link/20240730183403.4176544-2-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:59:46 -07:00
Faizal Rahim
b9e7fc0aed igc: Fix double reset adapter triggered from a single taprio cmd
Following the implementation of "igc: Add TransmissionOverrun counter"
patch, when a taprio command is triggered by user, igc processes two
commands: TAPRIO_CMD_REPLACE followed by TAPRIO_CMD_STATS. However, both
commands unconditionally pass through igc_tsn_offload_apply() which
evaluates and triggers reset adapter. The double reset causes issues in
the calculation of adapter->qbv_count in igc.

TAPRIO_CMD_REPLACE command is expected to reset the adapter since it
activates qbv. It's unexpected for TAPRIO_CMD_STATS to do the same
because it doesn't configure any driver-specific TSN settings. So, the
evaluation in igc_tsn_offload_apply() isn't needed for TAPRIO_CMD_STATS.

To address this, commands parsing are relocated to
igc_tsn_enable_qbv_scheduling(). Commands that don't require an adapter
reset will exit after processing, thus avoiding igc_tsn_offload_apply().

Fixes: d3750076d4 ("igc: Add TransmissionOverrun counter")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240730173304.865479-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:56:23 -07:00
Jakub Kicinski
9bb3ec18d0 Merge branch 'mlxsw-core_thermal-small-cleanups'
Petr Machata says:

====================
mlxsw: core_thermal: Small cleanups

Ido Schimmel says:

Clean up various issues which I noticed while addressing feedback on a
different patchset.
====================

Link: https://patch.msgid.link/cover.1722345311.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:38:32 -07:00
Ido Schimmel
b0d2132114 mlxsw: core_thermal: Fix -Wformat-truncation warning
The name of a thermal zone device cannot be longer than 19 characters
('THERMAL_NAME_LENGTH - 1'). The format string 'mlxsw-lc%d-module%d' can
exceed this limitation if the maximum number of line cards cannot be
represented using a single digit and the maximum number of transceiver
modules cannot be represented using two digits.

This is not the case with current systems nor future ones. Therefore,
increase the size of the result buffer beyond 'THERMAL_NAME_LENGTH' and
suppress the following build warning [1].

If this limitation is ever exceeded, we will know about it since the
thermal core validates the thermal device's name during registration.

[1]
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c: In function ‘mlxsw_thermal_modules_init.part.0’:
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:418:70: error: ‘%d’ directive output may be truncated writing between 1 and 3 bytes into a region of size between 2 and 4 [-Werror=format-truncation=]
  418 |                 snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-module%d",
      |                                                                      ^~
In function ‘mlxsw_thermal_module_tz_init’,
    inlined from ‘mlxsw_thermal_module_init’ at drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:465:9,
    inlined from ‘mlxsw_thermal_modules_init.part.0’ at drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:500:9:
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:418:52: note: directive argument in the range [1, 256]
  418 |                 snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-module%d",
      |                                                    ^~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:418:17: note: ‘snprintf’ output between 18 and 22 bytes into a destination of size 20
  418 |                 snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-module%d",
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  419 |                          module_tz->slot_index, module_tz->module + 1);
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/583a70c6dbe75e6bf0c2c58abbb3470a860d2dc3.1722345311.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:38:29 -07:00
Ido Schimmel
ec672931d1 mlxsw: core_thermal: Remove unnecessary assignments
Setting both pointers to NULL is unnecessary since the code never checks
whether these pointers are NULL or not. Remove the assignments.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/ea646f5d7642fffd5393fa23650660ab8f77a511.1722345311.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:38:29 -07:00
Ido Schimmel
e7e3a450e5 mlxsw: core_thermal: Remove unnecessary checks
mlxsw_thermal_module_fini() cannot be invoked with a thermal module
which is NULL or which is not associated with a thermal zone, so remove
these checks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/8db5fe0a3a28ba09a15d4102cc03f7e8ca7675be.1722345311.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:38:28 -07:00
Ido Schimmel
e25f3040a6 mlxsw: core_thermal: Simplify rollback
During rollback, instead of calling mlxsw_thermal_module_fini() for all
the modules, only call it for modules that were successfully
initialized. This is not a bug fix since mlxsw_thermal_module_fini()
first checks that the module was initialized.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/905bebc45f6e246031f0c5c177bba8efe11e05f5.1722345311.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31 18:38:28 -07:00