Commit Graph

1371153 Commits

Author SHA1 Message Date
Gal Pressman
b4d52c6982 selftests: drv-net: Fix remote command checking in require_cmd()
The require_cmd() method was checking for command availability locally
even when remote=True was specified, due to a missing host parameter.

Fix by passing host=self.remote when checking remote command
availability, ensuring commands are verified on the correct host.

Fixes: f1e68a1a4a ("selftests: drv-net: add require_XYZ() helpers for validating env")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250723135454.649342-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:52:00 -07:00
Zhu Yanjun
4335012705 net/mlx5: Fix build -Wframe-larger-than warnings
When building, the following warnings will appear.
"
pci_irq.c: In function ‘mlx5_ctrl_irq_request’:
pci_irq.c:494:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]

pci_irq.c: In function ‘mlx5_irq_request_vector’:
pci_irq.c:561:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]

eq.c: In function ‘comp_irq_request_sf’:
eq.c:897:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]

irq_affinity.c: In function ‘irq_pool_request_irq’:
irq_affinity.c:74:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
"

These warnings indicate that the stack frame size exceeds 1024 bytes in
these functions.

To resolve this, instead of allocating large memory buffers on the stack,
it is better to use kvzalloc to allocate memory dynamically on the heap.
This approach reduces stack usage and eliminates these frame size warnings.

Acked-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250722212023.244296-1-yanjun.zhu@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:49:37 -07:00
Jakub Kicinski
89628a0ec7 Merge branch 'use-enum-to-represent-the-napi-threaded-state'
Samiullah Khawaja says:

====================
Use enum to represent the NAPI threaded state

Instead of using 0/1 to represent the NAPI threaded states use enum
(disabled/enabled) to represent the NAPI threaded states.

This patch series is a subset of patches from the following patch series:
https://lore.kernel.org/20250718232052.1266188-1-skhawaja@google.com

The first 3 patches are being sent separately as per the feedback to
replace the usage of 0/1 as NAPI threaded states with enum. See:
https://lore.kernel.org/20250721164856.1d2208e4@kernel.org
====================

Link: https://patch.msgid.link/20250723013031.2911384-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:34:59 -07:00
Samiullah Khawaja
8e7583a4f6 net: define an enum for the napi threaded state
Instead of using '0' and '1' for napi threaded state use an enum with
'disabled' and 'enabled' states.

Tested:
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:34:55 -07:00
Samiullah Khawaja
78afdadafe net: Use netif_threaded_enable instead of netif_set_threaded in drivers
Prepare for adding an enum type for NAPI threaded states by adding
netif_threaded_enable API. De-export the existing netif_set_threaded API
and only use it internally. Update existing drivers to use
netif_threaded_enable instead of the de-exported netif_set_threaded.

Note that dev_set_threaded used by mt76 debugfs file is unchanged.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:34:55 -07:00
Samiullah Khawaja
71c52411c5 net: Create separate gro_flush_normal function
Move multiple copies of same code snippet doing `gro_flush` and
`gro_normal_list` into separate helper function.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:34:55 -07:00
Jakub Kicinski
d2002ccb47 Merge tag 'for-net-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:

 - hci_sync: fix double free in 'hci_discovery_filter_clear()'
 - hci_event: Mask data status from LE ext adv reports
 - hci_devcd_dump: fix out-of-bounds via dev_coredumpv
 - ISO: add socket option to report packet seqnum via CMSG
 - hci_event: Add support for handling LE BIG Sync Lost event
 - ISO: Support SCM_TIMESTAMPING for ISO TS
 - hci_core: Add PA_LINK to distinguish BIG sync and PA sync connections
 - hci_sock: Reset cookie to zero in hci_sock_free_cookie()

drivers:

 - btusb: Add new VID/PID 0489/e14e for MT7925
 - btusb: Add a new VID/PID 2c7c/7009 for MT7925
 - btusb: Add RTL8852BE device 0x13d3:0x3618
 - btusb: Add support for variant of RTL8851BE (USB ID 13d3:3601)
 - btusb: Add USB ID 3625:010b for TP-LINK Archer TX10UB Nano
 - btusb: QCA: Support downloading custom-made firmwares
 - btusb: Add one more ID 0x28de:0x1401 for Qualcomm WCN6855
 - nxp: add support for supply and reset
 - btnxpuart: Add support for 4M baudrate
 - btnxpuart: Correct the Independent Reset handling after FW dump
 - btnxpuart: Add uevents for FW dump and FW download complete
 - btintel: Define a macro for Intel Reset vendor command
 - btintel_pcie: Support Function level reset
 - btintel_pcie: Add support for device 0x4d76
 - btintel_pcie: Make driver wait for alive interrupt
 - btintel_pcie: Fix Alive Context State Handling
 - hci_qca: Enable ISO data packet RX

* tag 'for-net-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (42 commits)
  Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections
  Bluetooth: hci_event: Mask data status from LE ext adv reports
  Bluetooth: btintel_pcie: Fix Alive Context State Handling
  Bluetooth: btintel_pcie: Make driver wait for alive interrupt
  Bluetooth: hci_devcd_dump: fix out-of-bounds via dev_coredumpv
  Bluetooth: hci_sync: fix double free in 'hci_discovery_filter_clear()'
  Bluetooth: btusb: Add one more ID 0x28de:0x1401 for Qualcomm WCN6855
  Bluetooth: btusb: Sort WCN6855 device IDs by VID and PID
  Bluetooth: btusb: QCA: Support downloading custom-made firmwares
  Bluetooth: btnxpuart: Add uevents for FW dump and FW download complete
  Bluetooth: btnxpuart: Correct the Independent Reset handling after FW dump
  Bluetooth: ISO: Support SCM_TIMESTAMPING for ISO TS
  Bluetooth: ISO: add socket option to report packet seqnum via CMSG
  Bluetooth: btintel: Define a macro for Intel Reset vendor command
  Bluetooth: Fix typos in comments
  Bluetooth: RFCOMM: Fix typos in comments
  Bluetooth: aosp: Fix typo in comment
  Bluetooth: hci_bcm4377: Fix typo in comment
  Bluetooth: btrtl: Fix typo in comment
  Bluetooth: btmtk: Fix typo in log string
  ...
====================

Link: https://patch.msgid.link/20250723190233.166823-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:12:54 -07:00
Jakub Kicinski
a4f5759b6f Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-24

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 4 files changed, 40 insertions(+), 15 deletions(-).

The main changes are:

1) Improved verifier error message for incorrect narrower load from
   pointer field in ctx, from Paul Chaignon.

2) Disabled migration in nf_hook_run_bpf to address a syzbot report,
   from Kuniyuki Iwashima.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test invalid narrower ctx load
  bpf: Reject narrower access to pointer ctx fields
  bpf: Disable migration in nf_hook_run_bpf().
====================

Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:02:24 -07:00
Jakub Kicinski
7dba0cc93c Merge branch 'tools-ynl-gen-print-setters-for-multi-val-attrs'
Jakub Kicinski says:

====================
tools: ynl-gen: print setters for multi-val attrs

ncdevmem seems to manually prepare the queue attributes.
This is not ideal, YNL should be providing helpers for this.
Make YNL output allocation and setter helpers for multi-val attrs.

v1: https://lore.kernel.org/20250722161927.3489203-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250723171046.4027470-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:52 -07:00
Jakub Kicinski
f70d9819c7 selftests: drv-net: devmem: use new mattr ynl helpers
Use the just-added YNL helpers instead of manually setting
"_present" bits in the queue attrs. Compile tested only.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250723171046.4027470-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:49 -07:00
Jakub Kicinski
8553fb7c55 tools: ynl-gen: print setters for multi-val attrs
For basic types we "flatten" setters. If a request "a" has a simple
nest "b" with value "val" we print helpers like:

 req_set_a_b(struct a *req, int val)
 {
   req->_present.a = 1;
   req->b._present.val = 1;
   req->b.val = ...
 }

This is not possible for multi-attr because they have to be allocated
dynamically by the user. Print "object level" setters so that user
preparing the object doesn't have to futz with the presence bits
and other YNL internals.

Add the ability to pass in the variable name to generated setters.
Using "req" here doesn't feel right, while the attr is part of a request
it's not the request itself, so it seems cleaner to call it "obj".

Example:

 static inline void
 netdev_queue_id_set_id(struct netdev_queue_id *obj, __u32 id)
 {
	obj->_present.id = 1;
	obj->id = id;
 }

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250723171046.4027470-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:49 -07:00
Jakub Kicinski
2c222dde61 tools: ynl-gen: print alloc helper for multi-val attrs
In general YNL provides allocation and free helpers for types.
For pure nested structs which are used as multi-attr (and therefore
have to be allocated dynamically) we already print a free helper
as it's needed by free of the containing struct.

Add printing of the alloc helper for consistency. The helper
takes the number of entries to allocate as an argument, e.g.:

  static inline struct netdev_queue_id *netdev_queue_id_alloc(unsigned int n)
  {
	return calloc(n, sizeof(struct netdev_queue_id));
  }

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250723171046.4027470-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:49 -07:00
Jakub Kicinski
cf58699777 tools: ynl-gen: move free printing to the print_type_full() helper
Just to avoid making the main function even more enormous,
before adding more things to print move the free printing
to a helper which already prints the type.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250723171046.4027470-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:49 -07:00
Jakub Kicinski
a8a9fd042e tools: ynl-gen: don't add suffix for pure types
Don't add _req to helper names for pure types. We don't currently
print those so it makes no difference to existing codegen.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250723171046.4027470-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:28:49 -07:00
Jakub Kicinski
126d85fb04 Merge tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:

====================
Another wireless update:
 - rtw89:
   - STA+P2P concurrency
   - support for USB devices RTL8851BU/RTL8852BU
 - ath9k: OF support
 - ath12k:
   - more EHT/Wi-Fi 7 features
   - encapsulation/decapsulation offload
 - iwlwifi: some FIPS interoperability
 - brcm80211: support SDIO 43751 device
 - rt2x00: better DT/OF support
 - cfg80211/mac80211:
   - improved S1G support
   - beacon monitor for MLO

* tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (199 commits)
  ssb: use new GPIO line value setter callbacks for the second GPIO chip
  wifi: Fix typos
  wifi: brcmsmac: Use str_true_false() helper
  wifi: brcmfmac: fix EXTSAE WPA3 connection failure due to AUTH TX failure
  wifi: brcm80211: Remove yet more unused functions
  wifi: brcm80211: Remove more unused functions
  wifi: brcm80211: Remove unused functions
  wifi: iwlwifi: Revert "wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions"
  wifi: iwlwifi: check validity of the FW API range
  wifi: iwlwifi: don't export symbols that we shouldn't
  wifi: iwlwifi: mld: use spec link id and not FW link id
  wifi: iwlwifi: mld: decode EOF bit for AMPDUs
  wifi: iwlwifi: Remove support for rx OMI bandwidth reduction
  wifi: iwlwifi: stop supporting iwl_omi_send_status_notif ver 1
  wifi: iwlwifi: remove SC2F firmware support
  wifi: iwlwifi: mvm: Remove NAN support
  wifi: iwlwifi: mld: avoid outdated reorder buffer head_sn
  wifi: iwlwifi: mvm: avoid outdated reorder buffer head_sn
  wifi: iwlwifi: disable certain features for fips_enabled
  wifi: iwlwifi: mld: support channel survey collection for ACS scans
  ...
====================

Link: https://patch.msgid.link/20250724100349.21564-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 17:25:42 -07:00
Jakub Kicinski
8b5a19b4ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc8).

Conflicts:

drivers/net/ethernet/microsoft/mana/gdma_main.c
  9669ddda18 ("net: mana: Fix warnings for missing export.h header inclusion")
  7553911210 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/ti/icssg/icssg_prueth.h
  6e86fb73de ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
  ffe8a49091 ("net: ti: icssg-prueth: Read firmware-names from device tree")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 11:10:46 -07:00
Linus Torvalds
407c114c98 Merge tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from can and xfrm.

  The TI regression notified last week is actually on our net-next tree,
  it does not affect 6.16.

  We are investigating a virtio regression which is quite hard to
  reproduce - currently only our CI sporadically hits it. Hopefully it
  should not be critical, and I'm not sure that an additional week would
  be enough to solve it.

  Current release - fix to a fix:

   - sched: sch_qfq: avoid sleeping in atomic context in qfq_delete_class

  Previous releases - regressions:

   - xfrm:
      - set transport header to fix UDP GRO handling
      - delete x->tunnel as we delete x

   - eth:
      - mlx5: fix memory leak in cmd_exec()
      - i40e: when removing VF MAC filters, avoid losing PF-set MAC
      - gve: fix stuck TX queue for DQ queue format

  Previous releases - always broken:

   - can: fix NULL pointer deref of struct can_priv::do_set_mode

   - eth:
      - ice: fix a null pointer dereference in ice_copy_and_init_pkg()
      - ism: fix concurrency management in ism_cmd()
      - dpaa2: fix device reference count leak in MAC endpoint handling
      - icssg-prueth: fix buffer allocation for ICSSG

  Misc:

   - selftests: mptcp: increase code coverage"

* tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  net: hns3: default enable tx bounce buffer when smmu enabled
  net: hns3: fixed vf get max channels bug
  net: hns3: disable interrupt when ptp init failed
  net: hns3: fix concurrent setting vlan filter issue
  s390/ism: fix concurrency management in ism_cmd()
  selftests: drv-net: wait for iperf client to stop sending
  MAINTAINERS: Add in6.h to MAINTAINERS
  selftests: netfilter: tone-down conntrack clash test
  can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode
  net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class
  gve: Fix stuck TX queue for DQ queue format
  net: appletalk: Fix use-after-free in AARP proxy probe
  net: bcmasp: Restore programming of TX map vector register
  selftests: mptcp: connect: also cover checksum
  selftests: mptcp: connect: also cover alt modes
  e1000e: ignore uninitialized checksum word on tgp
  e1000e: disregard NVM checksum on tgp when valid checksum bit is not set
  ice: Fix a null pointer dereference in ice_copy_and_init_pkg()
  i40e: When removing VF MAC filters, only check PF-set MAC
  i40e: report VF tx_dropped with tx_errors instead of tx_discards
  ...
2025-07-24 08:44:42 -07:00
Paolo Abeni
94619ea2d9 Merge tag 'ipsec-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-07-23

1) Optimize to hold device only for the asynchronous decryption,
   where it is really needed.
   From Jianbo Liu.

2) Align our inbund SA lookup to RFC 4301. Only SPI and protocol
   should be used for an inbound SA lookup.
   From Aakash Kumar S.

3) Skip redundant statistics update for xfrm crypto offload.
   From Jianbo Liu.

Please pull or let me know if there are problems.

* tag 'ipsec-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Skip redundant statistics update for crypto offload
  xfrm: Duplicate SPI Handling
  xfrm: hold device only for the asynchronous decryption
====================

Link: https://patch.msgid.link/20250723080402.3439619-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 15:13:20 +02:00
Paolo Abeni
291d5dc80e Merge tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2025-07-23

1) Premption fixes for xfrm_state_find.
   From Sabrina Dubroca.

2) Initialize offload path also for SW IPsec GRO. This fixes a
   performance regression on SW IPsec offload.
   From Leon Romanovsky.

3) Fix IPsec UDP GRO for IKE packets.
   From Tobias Brunner,

4) Fix transport header setting for IPcomp after decompressing.
   From Fernando Fernandez Mancera.

5)  Fix use-after-free when xfrmi_changelink tries to change
    collect_md for a xfrm interface.
    From Eyal Birger .

6) Delete the special IPcomp x->tunnel state along with the state x
   to avoid refcount problems.
   From Sabrina Dubroca.

Please pull or let me know if there are problems.

* tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  Revert "xfrm: destroy xfrm_state synchronously on net exit path"
  xfrm: delete x->tunnel as we delete x
  xfrm: interface: fix use-after-free after changing collect_md xfrm interface
  xfrm: ipcomp: adjust transport header after decompressing
  xfrm: Set transport header to fix UDP GRO handling
  xfrm: always initialize offload path
  xfrm: state: use a consistent pcpu_id in xfrm_state_find
  xfrm: state: initialize state_ptrs earlier in xfrm_state_find
====================

Link: https://patch.msgid.link/20250723075417.3432644-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 12:30:40 +02:00
Paolo Abeni
89fd905dab Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver

v1: https://lore.kernel.org/all/20250702130901.2879031-1-shaojijie@huawei.com/
====================

Link: https://patch.msgid.link/20250722125423.1270673-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:28 +02:00
Jijie Shao
49ade8630f net: hns3: default enable tx bounce buffer when smmu enabled
The SMMU engine on HIP09 chip has a hardware issue.
SMMU pagetable prefetch features may prefetch and use a invalid PTE
even the PTE is valid at that time. This will cause the device trigger
fake pagefaults. The solution is to avoid prefetching by adding a
SYNC command when smmu mapping a iova. But the performance of nic has a
sharp drop. Then we do this workaround, always enable tx bounce buffer,
avoid mapping/unmapping on TX path.

This issue only affects HNS3, so we always enable
tx bounce buffer when smmu enabled to improve performance.

Fixes: 295ba232a8 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-5-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:22 +02:00
Jian Shen
b3e75c0bcc net: hns3: fixed vf get max channels bug
Currently, the queried maximum of vf channels is the maximum of channels
supported by each TC. However, the actual maximum of channels is
the maximum of channels supported by the device.

Fixes: 849e460776 ("net: hns3: add ethtool_ops.get_channels support for VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-4-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Yonglong Liu
cde304655f net: hns3: disable interrupt when ptp init failed
When ptp init failed, we'd better disable the interrupt and clear the
flag, to avoid early report interrupt at next probe.

Fixes: 0bf5eb7885 ("net: hns3: add support for PTP")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Jian Shen
4555f8f8b6 net: hns3: fix concurrent setting vlan filter issue
The vport->req_vlan_fltr_en may be changed concurrently by function
hclge_sync_vlan_fltr_state() called in periodic work task and
function hclge_enable_vport_vlan_filter() called by user configuration.
It may cause the user configuration inoperative. Fixes it by protect
the vport->req_vlan_fltr by vport_lock.

Fixes: 2ba306627f ("net: hns3: add support for modify VLAN filter state")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Halil Pasic
897e8601b9 s390/ism: fix concurrency management in ism_cmd()
The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time.  Unfortunately as of today the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.

This problem was discovered based on Aliaksei's bug report which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2 as seen from the logs) after a while and as a consequence
connections handled by the respective function break, and for future
connection requests the ISM device is not considered -- given it is in a
dysfunctional state. During further debugging PEC 3A was observed as
well.

A kernel message like
[ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that a kernel message like
[ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.

On a technical level, without this synchronization, commands (inputs to
the FW) may be partially or fully overwritten (corrupted) by another CPU
trying to issue commands on the same function. There is hard evidence that
this can lead to DMB token values being used as DMB IOVAs, leading to
PEC 2 PCI events indicating invalid DMA. But this is only one of the
failure modes imaginable. In theory even completely losing one command
and executing another one twice and then trying to interpret the outputs
as if the command we intended to execute was actually executed and not
the other one is also possible.  Frankly, I don't feel confident about
providing an exhaustive list of possible consequences.

Fixes: 684b89bc39 ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 10:57:26 +02:00
Bartosz Golaszewski
55c172c137 ssb: use new GPIO line value setter callbacks for the second GPIO chip
Because the other chip is guarded in an unlikely ifdef, I missed it when
converting this driver. Fix it now.

Fixes: 757259db79 ("ssb: use new GPIO line value setter callbacks")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250723141257.51412-1-brgl@bgdev.pl
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-24 09:06:46 +02:00
Bjorn Helgaas
41469ff94c wifi: Fix typos
Fix typos in comments and error messages.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250723201741.2908456-1-helgaas@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-24 09:05:31 +02:00
Paul Chaignon
ba578b87fe selftests/bpf: Test invalid narrower ctx load
This patch adds selftests to cover invalid narrower loads on the
context. These used to cause kernel warnings before the previous patch.
To trigger the warning, the load had to be aligned, to read an affected
context field (ex., skb->sk), and not starting at the beginning of the
field.

The nine new cases all fail without the previous patch.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/44cd83ea9c6868079943f0a436c6efa850528cc1.1753194596.git.paul.chaignon@gmail.com
2025-07-23 19:35:56 -07:00
Paul Chaignon
e09299225d bpf: Reject narrower access to pointer ctx fields
The following BPF program, simplified from a syzkaller repro, causes a
kernel warning:

    r0 = *(u8 *)(r1 + 169);
    exit;

With pointer field sk being at offset 168 in __sk_buff. This access is
detected as a narrower read in bpf_skb_is_valid_access because it
doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed
and later proceeds to bpf_convert_ctx_access. Note that for the
"is_narrower_load" case in the convert_ctx_accesses(), the insn->off
is aligned, so the cnt may not be 0 because it matches the
offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However,
the target_size stays 0 and the verifier errors with a kernel warning:

    verifier bug: error during ctx access conversion(1)

This patch fixes that to return a proper "invalid bpf_context access
off=X size=Y" error on the load instruction.

The same issue affects multiple other fields in context structures that
allow narrow access. Some other non-affected fields (for sk_msg,
sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for
consistency.

Note this syzkaller crash was reported in the "Closes" link below, which
used to be about a different bug, fixed in
commit fce7bd8e38 ("bpf/verifier: Handle BPF_LOAD_ACQ instructions
in insn_def_regno()"). Because syzbot somehow confused the two bugs,
the new crash and repro didn't get reported to the mailing list.

Fixes: f96da09473 ("bpf: simplify narrower ctx access")
Fixes: 0df1a55afa ("bpf: Warn on internal verifier errors")
Reported-by: syzbot+0ef84a7bdf5301d4cbec@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0ef84a7bdf5301d4cbec
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/3b8dcee67ff4296903351a974ddd9c4dca768b64.1753194596.git.paul.chaignon@gmail.com
2025-07-23 19:33:49 -07:00
Kuniyuki Iwashima
17ce3e5949 bpf: Disable migration in nf_hook_run_bpf().
syzbot reported that the netfilter bpf prog can be called without
migration disabled in xmit path.

Then the assertion in __bpf_prog_run() fails, triggering the splat
below. [0]

Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf().

[0]:
BUG: assuming non migratable context at ./include/linux/filter.h:703
in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session
3 locks held by sshd-session/5829:
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline]
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241
CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
 __cant_migrate kernel/sched/core.c:8860 [inline]
 __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834
 __bpf_prog_run include/linux/filter.h:703 [inline]
 bpf_prog_run include/linux/filter.h:725 [inline]
 nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20
 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline]
 nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623
 nf_hook+0x370/0x680 include/linux/netfilter.h:272
 NF_HOOK_COND include/linux/netfilter.h:305 [inline]
 ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433
 dst_output include/net/dst.h:459 [inline]
 ip_local_out net/ipv4/ip_output.c:129 [inline]
 __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527
 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479
 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline]
 tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838
 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021
 tcp_push+0x225/0x700 net/ipv4/tcp.c:759
 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359
 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396
 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 sock_write_iter+0x4aa/0x5b0 net/socket.c:1131
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x6c7/0x1150 fs/read_write.c:686
 ksys_write+0x1f8/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe7d365d407
Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP:

Fixes: fd9c663b9a ("bpf: minimal support for programs hooked into netfilter framework")
Reported-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6879466d.a00a0220.3af5df.0022.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250722224041.112292-1-kuniyu@google.com
2025-07-23 18:58:03 -07:00
Linus Torvalds
25fae0b93d Merge tag 'drm-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This might just be part one, but I'm sending it a bit early as it has
  two sets of reverts for regressions, one is all the gem/dma-buf
  handling and another was a nouveau ioctl change.

  Otherwise there is an amdgpu fix, nouveau fix and a scheduler fix.

  If any other changes come in I'll follow up with another more usual
  Fri/Sat MR.

  gem:
   - revert all the dma-buf/gem changes as there as lifetime issues
     with them

  nouveau:
   - revert an ioctl change as it causes issues
   - fix NULL ptr on fermi

  bridge:
   - remove extra semicolon

  sched:
   - remove hang causing optimisation

  amdgpu:
   - fix garbage in cleared vram after resume"

* tag 'drm-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/kernel:
  drm/bridge: ti-sn65dsi86: Remove extra semicolon in ti_sn_bridge_probe()
  Revert "drm/nouveau: check ioctl command codes better"
  drm/nouveau/nvif: fix null ptr deref on pre-fermi boards
  Revert "drm/gem-dma: Use dma_buf from GEM object instance"
  Revert "drm/gem-shmem: Use dma_buf from GEM object instance"
  Revert "drm/gem-framebuffer: Use dma_buf from GEM object instance"
  Revert "drm/prime: Use dma_buf from GEM object instance"
  Revert "drm/etnaviv: Use dma_buf from GEM object instance"
  Revert "drm/vmwgfx: Use dma_buf from GEM object instance"
  Revert "drm/virtio: Use dma_buf from GEM object instance"
  drm/sched: Remove optimization that causes hang when killing dependent jobs
  drm/amdgpu: Reset the clear flag in buddy during resume
2025-07-23 18:56:24 -07:00
Nimrod Oren
8694138250 selftests: drv-net: wait for iperf client to stop sending
A few packets may still be sent out during the termination of iperf
processes. These late packets cause failures in rss_ctx.py when they
arrive on queues expected to be empty.

Example failure observed:

  Check failed 2 != 0 traffic on inactive queues (context 1):
    [0, 0, 1, 1, 386385, 397196, 0, 0, 0, 0, ...]

  Check failed 4 != 0 traffic on inactive queues (context 2):
    [0, 0, 0, 0, 2, 2, 247152, 253013, 0, 0, ...]

  Check failed 2 != 0 traffic on inactive queues (context 3):
    [0, 0, 0, 0, 0, 0, 1, 1, 282434, 283070, ...]

To avoid such failures, wait until all client sockets for the requested
port are either closed or in the TIME_WAIT state.

Fixes: 847aa551fa ("selftests: drv-net: rss_ctx: factor out send traffic and check")
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722122655.3194442-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 18:52:12 -07:00
Jakub Kicinski
8aad37d16c Merge branch 'dualpi2-patch'
Chia-Yu Chang says:

====================
DUALPI2 patch

This patch serise adds DualPI Improved with a Square (DualPI2) with
following features:
 * Supports congestion controls that comply with the Prague requirements
   in RFC9331 (e.g. TCP-Prague)
 * Coupled dual-queue that separates the L4S traffic in a low latency
   queue (L-queue), without harming remaining traffic that is scheduled
   in classic queue (C-queue) due to congestion-coupling using PI2
   as defined in RFC9332
 * Configurable overload strategies
 * Use of sojourn time to reliably estimate queue delay
 * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into
   respective queues

For more details of DualPI2, please refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
====================

Link: https://patch.msgid.link/20250722095915.24485-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:11 -07:00
Chia-Yu Chang
68db0ff2f7 Documentation: netlink: specs: tc: Add DualPI2 specification
Introduce the specification of tc qdisc DualPI2 stats and attributes,
which is the reference implementation of IETF RFC9332 DualQ Coupled AQM
(https://datatracker.ietf.org/doc/html/rfc9332) providing two different
queues: low latency queue (L-queue) and classic queue (C-queue).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-7-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:09 -07:00
Chia-Yu Chang
032f0e9e15 selftests/tc-testing: Add selftests for qdisc DualPI2
Update configuration of tc-tests and preload DualPI2 module for self-tests,
and add following self-test cases for DualPI2:

  Test a4c7: Create DualPI2 with default setting
  Test 1ea4: Create DualPI2 with memlimit
  Test 2130: Create DualPI2 with typical_rtt and max_rtt
  Test 90c1: Create DualPI2 with max_rtt
  Test 7b3c: Create DualPI2 with any_ect option
  Test 49a3: Create DualPI2 with overflow option
  Test d0a1: Create DualPI2 with drop_enqueue option
  Test f051: Create DualPI2 with no_split_gso option
  Test 456b: Create DualPI2 with packet step_thresh
  Test 610c: Create DualPI2 with packet min_qlen_step
  Test b4fa: Create DualPI2 with packet coupling_factor
  Test 37f1: Create DualPI2 with packet classic_protection

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250722095915.24485-6-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:08 -07:00
Chia-Yu Chang
51217c659e selftests/tc-testing: Fix warning and style check on tdc.sh
Replace exit code check with '! cmd' and add both quote and $(...)
around 'nproc' to prevent warning and issue reported by shellcheck.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-5-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:08 -07:00
Koen De Schepper
8f9516daed sched: Add enqueue/dequeue of dualpi2 qdisc
DualPI2 provides L4S-type low latency & loss to traffic that uses a
scalable congestion controller (e.g. TCP-Prague, DCTCP) without
degrading the performance of 'classic' traffic (e.g. Reno,
Cubic etc.). It is to be the reference implementation of IETF RFC9332
DualQ Coupled AQM (https://datatracker.ietf.org/doc/html/rfc9332).

Note that creating two independent queues cannot meet the goal of
DualPI2 mentioned in RFC9332: "...to preserve fairness between
ECN-capable and non-ECN-capable traffic." Further, it could even
lead to starvation of Classic traffic, which is also inconsistent
with the requirements in RFC9332: "...although priority MUST be
bounded in order not to starve Classic traffic." DualPI2 is
designed to maintain approximate per-flow fairness on L-queue and
C-queue by forming a single qdisc using the coupling factor and
scheduler between two queues.

The qdisc provides two queues called low latency and classic. It
classifies packets based on the ECN field in the IP headers. By
default it directs non-ECN and ECT(0) into the classic queue and
ECT(1) and CE into the low latency queue, as per the IETF spec.

Each queue runs its own AQM:
* The classic AQM is called PI2, which is similar to the PIE AQM but
  more responsive and simpler. Classic traffic requires a decent
  target queue (default 15ms for Internet deployment) to fully
  utilize the link and to avoid high drop rates.
* The low latency AQM is, by default, a very shallow ECN marking
  threshold (1ms) similar to that used for DCTCP.

The DualQ isolates the low queuing delay of the Low Latency queue
from the larger delay of the 'Classic' queue. However, from a
bandwidth perspective, flows in either queue will share out the link
capacity as if there was just a single queue. This bandwidth pooling
effect is achieved by coupling together the drop and ECN-marking
probabilities of the two AQMs.

The PI2 AQM has two main parameters in addition to its target delay.
The integral gain factor alpha is used to slowly correct any persistent
standing queue error from the target delay, while the proportional gain
factor beta is used to quickly compensate for queue changes (growth or
shrinkage). Either alpha and beta are given as a parameter, or they can
be calculated by tc from alternative typical and maximum RTT parameters.

Internally, the output of a linear Proportional Integral (PI)
controller is used for both queues. This output is squared to
calculate the drop or ECN-marking probability of the classic queue.
This counterbalances the square-root rate equation of Reno/Cubic,
which is the trick that balances flow rates across the queues. For
the ECN-marking probability of the low latency queue, the output of
the base AQM is multiplied by a coupling factor. This determines the
balance between the flow rates in each queue. The default setting
makes the flow rates roughly equal, which should be generally
applicable.

If DUALPI2 AQM has detected overload (due to excessive non-responsive
traffic in either queue), it will switch to signaling congestion
solely using drop, irrespective of the ECN field. Alternatively, it
can be configured to limit the drop probability and let the queue
grow and eventually overflow (like tail-drop).

GSO splitting in DUALPI2 is configurable from userspace while the
default behavior is to split gso. When running DUALPI2 at unshaped
10gigE with 4 download streams test, splitting gso apart results in
halving the latency with no loss in throughput:

Summary of tcp_4down run 'no_split_gso':
                         avg         median      # data pts
 Ping (ms) ICMP   :       0.53      0.30 ms         350
 TCP download avg :    2326.86       N/A Mbits/s    350
 TCP download sum :    9307.42       N/A Mbits/s    350
 TCP download::1  :    2672.99   2568.73 Mbits/s    350
 TCP download::2  :    2586.96   2570.51 Mbits/s    350
 TCP download::3  :    1786.26   1798.82 Mbits/s    350
 TCP download::4  :    2261.21   2309.49 Mbits/s    350

Summart of tcp_4down run 'split_gso':
                         avg          median      # data pts
 Ping (ms) ICMP   :       0.22      0.23 ms         350
 TCP download avg :    2335.02       N/A Mbits/s    350
 TCP download sum :    9340.09       N/A Mbits/s    350
 TCP download::1  :    2335.30   2334.22 Mbits/s    350
 TCP download::2  :    2334.72   2334.20 Mbits/s    350
 TCP download::3  :    2335.28   2334.58 Mbits/s    350
 TCP download::4  :    2334.79   2334.39 Mbits/s    350

A similar result is observed when running DUALPI2 at unshaped 1gigE
with 1 download stream test:

Summary of tcp_1down run 'no_split_gso':
                         avg         median      # data pts
 Ping (ms) ICMP :         1.13      1.25 ms         350
 TCP download   :       941.41    941.46 Mbits/s    350

Summart of tcp_1down run 'split_gso':
                         avg         median      # data pts
 Ping (ms) ICMP :         0.51      0.55 ms         350
 TCP download   :       941.41    941.45 Mbits/s    350

Additional details can be found in the draft:
  https://datatracker.ietf.org/doc/html/rfc9332

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250722095915.24485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:07 -07:00
Chia-Yu Chang
d4de8bffbe sched: Dump configuration and statistics of dualpi2 qdisc
The configuration and statistics dump of the DualPI2 Qdisc provides
information related to both queues, such as packet numbers and queuing
delays in the L-queue and C-queue, as well as general information such as
probability value, WRR credits, memory usage, packet marking counters, max
queue size, etc.

The following patch includes enqueue/dequeue for DualPI2.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:07 -07:00
Chia-Yu Chang
320d031ad6 sched: Struct definition and parsing of dualpi2 qdisc
DualPI2 is the reference implementation of IETF RFC9332 DualQ Coupled
AQM (https://datatracker.ietf.org/doc/html/rfc9332) providing two
queues called low latency (L-queue) and classic (C-queue). By default,
it enqueues non-ECN and ECT(0) packets into the C-queue and ECT(1) and
CE packets into the low latency queue (L-queue), as per IETF RFC9332 spec.

This patch defines the dualpi2 Qdisc structure and parsing, and the
following two patches include dumping and enqueue/dequeue for the DualPI2.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:52:07 -07:00
Jakub Kicinski
1cdf3f2d8f Merge branch 'split-netmem-from-struct-page'
Byungchul Park says:

====================
Split netmem from struct page

The MM subsystem is trying to reduce struct page to a single pointer.
See the following link for your information:

   https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab.  This patchset does
that for page pool.

Matthew Wilcox tried and stopped the same work, you can see in:

   https://lore.kernel.org/20230111042214.907030-1-willy@infradead.org

I focused on removing the page pool members in struct page this time,
not moving the allocation code of page pool from net to mm.  It can be
done later if needed.
====================

Link: https://patch.msgid.link/20250721021835.63939-1-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:47:01 -07:00
Byungchul Park
9dfd871a3e libeth: xdp: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make xdp access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-13-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:56 -07:00
Byungchul Park
c0bcfabd77 net: ti: icssg-prueth: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make icssg-prueth access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-12-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
5445a5f712 mlx5: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make mlx5 access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-11-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
fc16f6a587 idpf: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make idpf access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-10-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
c8d6830e32 iavf: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make iavf access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-9-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
58831a1785 octeontx2-pf: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make octeontx2-pf access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-8-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
65589e860a net: fec: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make fec access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-7-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:55 -07:00
Byungchul Park
87dda483e6 mt76: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make mt76 access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-6-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:54 -07:00
Byungchul Park
6fd824342a netdevsim: access ->pp through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make netdevsim access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-5-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:54 -07:00
Byungchul Park
89ade7c730 netmem, mlx4: access ->pp_ref_count through netmem_desc instead of page
To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make mlx4 access ->pp_ref_count through netmem_desc instead of page.

While at it, add a helper, pp_page_to_nmdesc() and __pp_page_to_nmdesc(),
that can be used to get netmem_desc from page only if it's a pp page.
For now that netmem_desc overlays on page, it can be achieved by just
casting, and use macro and _Generic to cover const casting as well.

Plus, change page_pool_page_is_pp() to check for 'const struct page *'
instead of 'struct page *' since it doesn't modify data and additionally
covers const type.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-4-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 17:46:54 -07:00