Commit Graph

1382144 Commits

Author SHA1 Message Date
Linus Torvalds
3cfcd57def Merge tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
 "Fix for netfs smb3 oops"

* tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix oops due to uninitialised variable
2025-08-22 09:02:32 -04:00
Linus Torvalds
e86ba12cf8 Merge tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fix from Trond Myklebust:

 - NFS: Fix a data corrupting race when updating an existing write

* tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race when updating an existing write
2025-08-22 08:58:58 -04:00
Linus Torvalds
6eba757ce9 Merge tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "20 hotfixes. 10 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 17 of these
  fixes are for MM.

  As usual, singletons all over the place, apart from a three-patch
  series of KHO followup work from Pasha which is actually also a bunch
  of singletons"

* tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mremap: fix WARN with uffd that has remap events disabled
  mm/damon/sysfs-schemes: put damos dests dir after removing its files
  mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m
  mm/damon/core: fix damos_commit_filter not changing allow
  mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  MAINTAINERS: mark MGLRU as maintained
  mm: rust: add page.rs to MEMORY MANAGEMENT - RUST
  iov_iter: iterate_folioq: fix handling of offset >= folio size
  selftests/damon: fix selftests by installing drgn related script
  .mailmap: add entry for Easwar Hariharan
  selftests/mm: add test for invalid multi VMA operations
  mm/mremap: catch invalid multi VMA moves earlier
  mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area
  mm/damon/core: fix commit_ops_filters by using correct nth function
  tools/testing: add linux/args.h header and fix radix, VMA tests
  mm/debug_vm_pgtable: clear page table entries at destroy_args()
  squashfs: fix memory leak in squashfs_fill_super
  kho: warn if KHO is disabled due to an error
  kho: mm: don't allow deferred struct page with KHO
  kho: init new_physxa->phys_bits to fix lockdep
2025-08-22 08:54:34 -04:00
Linus Torvalds
3957a57201 Merge tag 'cgroup-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Fix NULL de-ref in css_rstat_exit() which could happen after
   allocation failure

 - Fix a cpuset partition handling bug and a couple other misc issues

 - Doc spelling fix

* tag 'cgroup-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup: fixed spelling mistakes in documentation
  cgroup: avoid null de-ref in css_rstat_exit()
  cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write()
  cgroup/cpuset: Fix a partition error with CPU hotplug
  cgroup/cpuset: Use static_branch_enable_cpuslocked() on cpusets_insane_config_key
2025-08-21 16:31:27 -04:00
Linus Torvalds
9a36b58a88 Merge tag 'spi-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A small collection of fixes that came in during the past week, a few
  driver specifics plus one fix for the spi-mem core where we weren't
  taking account of the frequency capabilities of the system when
  determining if it can support an operation"

* tag 'spi-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: st: fix PM macros to use CONFIG_PM instead of CONFIG_PM_SLEEP
  spi: spi-qpic-snand: fix calculating of ECC OOB regions' properties
  spi: spi-fsl-lpspi: Clamp too high speed_hz
  spi: spi-mem: add spi_mem_adjust_op_freq() in spi_mem_supports_op()
  spi: spi-mem: Add missing kdoc argument
  spi: spi-qpic-snand: use correct CW_PER_PAGE value for OOB write
2025-08-21 16:28:00 -04:00
Linus Torvalds
f43e6ba0b4 Merge tag 'regulator-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "A couple of fairly minor device specific fixes that came in over the
  past week or so, plus the addition of an actual maintainer for the
  IR38060"

* tag 'regulator-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps65219: regulator: tps65219: Fix error codes in probe()
  regulator: pca9450: Use devm_register_sys_off_handler
  regulator: dt-bindings: infineon,ir38060: Add Guenter as maintainer from IBM
2025-08-21 16:26:18 -04:00
Linus Torvalds
f70e1e7980 Merge tag 'acpi-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix three new issues in the ACPI APEI error injection code and
  an ACPI platform firmware runtime update interface issue:

   - Make ACPI APEI error injection check the version of the request
     when mapping the EINJ parameter structure in the BIOS reserved
     memory to prevent injecting errors based on an uninitialized
     field (Tony Luck)

   - Fix potential NULL dereference in __einj_error_inject() that may
     occur when memory allocation fails (Charles Han)

   - Remove the __exit annotation from einj_remove(), so it can be
     called on errors during faux device probe (Uwe Kleine-König)

   - Use a security-version-number check instead of a runtime version
     check during ACPI platform firmware runtime driver updates to
     prevent those updates from failing due to false-positive driver
     version check failures (Chen Yu)"

* tag 'acpi-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: pfr_update: Fix the driver update version check
  ACPI: APEI: EINJ: Fix resource leak by remove callback in .exit.text
  ACPI: APEI: EINJ: fix potential NULL dereference in __einj_error_inject()
  ACPI: APEI: EINJ: Check if user asked for EINJV2 injection
2025-08-21 16:07:15 -04:00
Linus Torvalds
26d6ed49cd Merge tag 'pm-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix a cpuidle menu governor issue and two issues in the cpupower
  utility:

   - Prevent the menu cpuidle governor from selecting idle states with
     exit latency exceeding the current PM QoS limit after stopping the
     scheduler tick (Rafael Wysocki)

   - Make the set subcommand's -t option in the cpupower utility work as
     documented and allow it to control the CPU boost feature of cpufreq
     beyond x86 (Shinji Nomoto)"

* tag 'pm-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: governors: menu: Avoid selecting states with too much latency
  cpupower: Allow control of boost feature on non-x86 based systems with boost support.
  cpupower: Fix a bug where the -t option of the set subcommand was not working.
2025-08-21 16:04:58 -04:00
Linus Torvalds
d72052ac09 Merge tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Fix a subtle bug during SCX enabling where a dead task skips init
   but doesn't skip sched class switch leading to invalid task state
   transition warning

 - Cosmetic fix in selftests

* tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Remove duplicate sched.h header
  sched/ext: Fix invalid task state transitions on class switch
2025-08-21 16:02:35 -04:00
Rafael J. Wysocki
670b51121e Merge branches 'acpi-apei' and 'acpi-pfrut'
Merge ACPI APEI fixes and an ACPI platform firmware runtime update fix
for 6.17-rc3.

* acpi-apei:
  ACPI: APEI: EINJ: Fix resource leak by remove callback in .exit.text
  ACPI: APEI: EINJ: fix potential NULL dereference in __einj_error_inject()
  ACPI: APEI: EINJ: Check if user asked for EINJV2 injection

* acpi-pfrut:
  ACPI: pfr_update: Fix the driver update version check
2025-08-21 20:47:23 +02:00
Rafael J. Wysocki
094a7c318b Merge branch 'pm-cpuidle'
Merge a menu governor fix for 6.17-rc3

* pm-cpuidle:
  cpuidle: governors: menu: Avoid selecting states with too much latency
2025-08-21 20:32:25 +02:00
Linus Torvalds
6439a0e64c Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from Bluetooth.

  Current release - fix to a fix:

   - usb: asix_devices: fix PHY address mask in MDIO bus initialization

  Current release - regressions:

   - Bluetooth: fixes for the split between BIS_LINK and PA_LINK

   - Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
     flag", breaks compatibility with some existing device tree blobs

   - dsa: b53: fix reserved register access in b53_fdb_dump()

  Current release - new code bugs:

   - sched: dualpi2: run probability update timer in BH to avoid
     deadlock

   - eth: libwx: fix the size in RSS hash key population

   - pse-pd: pd692x0: improve power budget error paths and handling

  Previous releases - regressions:

   - tls: fix handling of zero-length records on the rx_list

   - hsr: reject HSR frame if skb can't hold tag

   - bonding: fix negotiation flapping in 802.3ad passive mode

  Previous releases - always broken:

   - gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM

   - sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
     avoid packet drops with low buffer_limit, remove unnecessary WARN()

   - sched: fix backlog accounting after modifying config of a qdisc in
     the middle of the hierarchy

   - mptcp: improve handling of skb extension allocation failures

   - eth: mlx5:
       - fixes for the "HW Steering" flow management method
       - fixes for QoS and device buffer management"

* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  netfilter: nf_reject: don't leak dst refcount for loopback packets
  net/mlx5e: Preserve shared buffer capacity during headroom updates
  net/mlx5e: Query FW for buffer ownership
  net/mlx5: Restore missing scheduling node cleanup on vport enable failure
  net/mlx5: Fix QoS reference leak in vport enable error path
  net/mlx5: Destroy vport QoS element when no configuration remains
  net/mlx5e: Preserve tc-bw during parent changes
  net/mlx5: Remove default QoS group and attach vports directly to root TSAR
  net/mlx5: Base ECVF devlink port attrs from 0
  net: pse-pd: pd692x0: Skip power budget configuration when undefined
  net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
  Octeontx2-af: Skip overlap check for SPI field
  selftests: tls: add tests for zero-length records
  tls: fix handling of zero-length records on the rx_list
  net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
  selftests: bonding: add test for passive LACP mode
  bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
  bonding: update LACP activity flag after setting lacp_active
  Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
  ipv6: sr: Fix MAC comparison to be constant-time
  ...
2025-08-21 13:51:15 -04:00
Florian Westphal
91a79b7922 netfilter: nf_reject: don't leak dst refcount for loopback packets
recent patches to add a WARN() when replacing skb dst entry found an
old bug:

WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline]
WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline]
WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234
[..]
Call Trace:
 nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325
 nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27
 expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
 ..

This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.

Instead of checking hook just check if the skb already has a route
attached to it.

Fixes: f53b9b0bdc ("netfilter: introduce support for reject at prerouting stage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 10:02:00 -07:00
Jakub Kicinski
1b78236a05 Merge branch 'mlx5-misx-fixes-2025-08-20'
Mark Bloch says:

====================
mlx5 misx fixes 2025-08-20

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.

v1: https://lore.kernel.org/1755095476-414026-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20250820133209.389065-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:36 -07:00
Armen Ratner
8b0587a885 net/mlx5e: Preserve shared buffer capacity during headroom updates
When port buffer headroom changes, port_update_shared_buffer()
recalculates the shared buffer size and splits it in a 3:1 ratio
(lossy:lossless) - Currently, the calculation is:
lossless = shared / 4;
lossy = (shared / 4) * 3;

Meaning, the calculation dropped the remainder of shared % 4 due to
integer division, unintentionally reducing the total shared buffer
by up to three cells on each update. Over time, this could shrink
the buffer below usable size.

Fix it by changing the calculation to:
lossless = shared / 4;
lossy = shared - lossless;

This retains all buffer cells while still approximating the
intended 3:1 split, preventing capacity loss over time.

While at it, perform headroom calculations in units of cells rather than
in bytes for more accurate calculations avoiding extra divisions.

Fixes: a440030d89 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Armen Ratner <armeng@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:33 -07:00
Alexei Lazar
451d2849ea net/mlx5e: Query FW for buffer ownership
The SW currently saves local buffer ownership when setting
the buffer.
This means that the SW assumes it has ownership of the buffer
after the command is set.

If setting the buffer fails and we remain in FW ownership,
the local buffer ownership state incorrectly remains as SW-owned.
This leads to incorrect behavior in subsequent PFC commands,
causing failures.

Instead of saving local buffer ownership in SW,
query the FW for buffer ownership when setting the buffer.
This ensures that the buffer ownership state is accurately
reflected, avoiding the issues caused by incorrect ownership
states.

Fixes: ecdf2dadee ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:32 -07:00
Carolina Jubran
51b17c98e3 net/mlx5: Restore missing scheduling node cleanup on vport enable failure
Restore the __esw_qos_free_node() call removed by the offending commit.

Fixes: 97733d1e00 ("net/mlx5: Add traffic class scheduling support for vport QoS")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:32 -07:00
Carolina Jubran
3c114fb2af net/mlx5: Fix QoS reference leak in vport enable error path
Add missing esw_qos_put() call when __esw_qos_alloc_node() fails in
mlx5_esw_qos_vport_enable().

Fixes: be034baba8 ("net/mlx5: Make vport QoS enablement more flexible for future extensions")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:31 -07:00
Carolina Jubran
b697ef4d1d net/mlx5: Destroy vport QoS element when no configuration remains
If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.

Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.

Fixes: c9497c9890 ("net/mlx5: Add support for setting VF min rate")
Fixes: cf7e73770d ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:31 -07:00
Carolina Jubran
e8f973576c net/mlx5e: Preserve tc-bw during parent changes
When changing parent of a node/leaf with tc-bw configured, the code
saves and restores tc-bw values. However, it was reading the converted
hardware bw_share values (where 0 becomes 1) instead of the original
user values, causing incorrect tc-bw calculations after parent change.

Store original tc-bw values in the node structure and use them directly
for save/restore operations.

Fixes: cf7e73770d ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:31 -07:00
Carolina Jubran
330f0f6713 net/mlx5: Remove default QoS group and attach vports directly to root TSAR
Currently, the driver creates a default group (`node0`) and attaches
all vports to it unless the user explicitly sets a parent group. As a
result, when a user configures tx_share on a group and tx_share on
a VF, the expectation is for the group and the VF to share bandwidth
relatively. However, since the VF is not connected to the same parent
(but to the default node), the proportional share logic is not applied
correctly.

To fix this, remove the default group (`node0`) and instead connect
vports directly to the root TSAR when no parent is specified. This
ensures that vports and groups share the same root scheduler and their
tx_share values are compared directly under the same hierarchy.

Fixes: 0fe132eac3 ("net/mlx5: E-switch, Allow to add vports to rate groups")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:30 -07:00
Daniel Jurgens
bc17455bc8 net/mlx5: Base ECVF devlink port attrs from 0
Adjust the vport number by the base ECVF vport number so the port
attributes start at 0. Previously the port attributes would start 1
after the maximum number of host VFs.

Fixes: dc13180824 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:58:30 -07:00
Kory Maincent
7ef353879f net: pse-pd: pd692x0: Skip power budget configuration when undefined
If the power supply's power budget is not defined in the device tree,
the current code still requests power and configures the PSE manager
with a 0W power limit, which is undesirable behavior.

Skip power budget configuration entirely when the budget is zero,
avoiding unnecessary power requests and preventing invalid 0W limits
from being set on the PSE manager.

Fixes: 359754013e ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250820133321.841054-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:56:08 -07:00
Kory Maincent
1c67f9c54c net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
Fix a resource leak where manager power budgets were freed on both
success and error paths during manager setup. Power budgets should
only be freed on error paths after regulator registration or during
driver removal.

Refactor cleanup logic by extracting OF node cleanup and power budget
freeing into separate helper functions for better maintainability.

Fixes: 359754013e ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250820132708.837255-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:55:40 -07:00
Hariprasad Kelam
8c5d95988c Octeontx2-af: Skip overlap check for SPI field
Octeontx2/CN10K silicon supports generating a 256-bit key per packet.
The specific fields to be extracted from a packet for key generation
are configurable via a Key Extraction (MKEX) Profile.

The AF driver scans the configured extraction profile to ensure that
fields from upper layers do not overwrite fields from lower layers in
the key.

Example Packet Field Layout:
LA: DMAC + SMAC
LB: VLAN
LC: IPv4/IPv6
LD: TCP/UDP

Valid MKEX Profile Configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[30-31]

Invalid MKEX profile configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[2-3]  // Overlaps with DMAC field

In another scenario, if the MKEX profile is configured to extract
the SPI field from both AH and ESP headers at the same key offset,
the driver rejecting this configuration. In a regular traffic,
ipsec packet will be having either AH(LD) or ESP (LE). This patch
relaxes the check for the same.

Fixes: 12aa0a3b93 ("octeontx2-af: Harden rule validation.")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250820063919.1463518-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:52:54 -07:00
Jakub Kicinski
a61a3e961b selftests: tls: add tests for zero-length records
Test various combinations of zero-length records.
Unfortunately, kernel cannot be coerced into producing those,
so hardcode the ciphertext messages in the test.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:52:31 -07:00
Jakub Kicinski
62708b9452 tls: fix handling of zero-length records on the rx_list
Each recvmsg() call must process either
 - only contiguous DATA records (any number of them)
 - one non-DATA record

If the next record has different type than what has already been
processed we break out of the main processing loop. If the record
has already been decrypted (which may be the case for TLS 1.3 where
we don't know type until decryption) we queue the pending record
to the rx_list. Next recvmsg() will pick it up from there.

Queuing the skb to rx_list after zero-copy decrypt is not possible,
since in that case we decrypted directly to the user space buffer,
and we don't have an skb to queue (darg.skb points to the ciphertext
skb for access to metadata like length).

Only data records are allowed zero-copy, and we break the processing
loop after each non-data record. So we should never zero-copy and
then find out that the record type has changed. The corner case
we missed is when the initial record comes from rx_list, and it's
zero length.

Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21 07:52:30 -07:00
Linus Torvalds
1c656b1efd Merge tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Fix a lot of build warnings for LTO-enabled objtool check, increase
  COMMAND_LINE_SIZE up to 4096, rename a missing GCC_PLUGIN_STACKLEAK to
  KSTACK_ERASE, and fix some bugs about arch timer, module loading, LBT
  and KVM"

* tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Add address alignment check in pch_pic register access
  LoongArch: KVM: Use kvm_get_vcpu_by_id() instead of kvm_get_vcpu()
  LoongArch: KVM: Fix stack protector issue in send_ipi_data()
  LoongArch: KVM: Make function kvm_own_lbt() robust
  LoongArch: Rename GCC_PLUGIN_STACKLEAK to KSTACK_ERASE
  LoongArch: Save LBT before FPU in setup_sigcontext()
  LoongArch: Optimize module load time by optimizing PLT/GOT counting
  LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads
  LoongArch: Increase COMMAND_LINE_SIZE up to 4096
  LoongArch: Pass annotate-tablejump option if LTO is enabled
  objtool/LoongArch: Get table size correctly if LTO is enabled
2025-08-21 10:37:33 -04:00
Raphael Gallais-Pou
7c7cda8115 spi: st: fix PM macros to use CONFIG_PM instead of CONFIG_PM_SLEEP
pm_sleep_ptr() depends on CONFIG_PM_SLEEP while pm_ptr() depends on
CONFIG_PM.  Since ST SSC4 implements runtime PM it makes sense using
pm_ptr() here.

For the same reason replace PM macros that use CONFIG_PM.  Doing so
prevents from using __maybe_unused attribute of runtime PM functions.

Link: https://lore.kernel.org/lkml/CAMuHMdX9nkROkAJJ5odv4qOWe0bFTmaFs=Rfxsfuc9+DT-bsEQ@mail.gmail.com
Fixes: 6f8584a482 ("spi: st: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()")

Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250820180310.9605-1-rgallaispou@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-08-21 13:14:10 +01:00
Linus Torvalds
32b7144f80 Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
 "Fix a regression where 'make clean' stopped removing some of the
  generated assembly files on arm and arm64"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: ensure generated *.S files are removed on make clean
  lib/crypto: sha: Update Kconfig help for SHA1 and SHA256
2025-08-21 04:54:01 -07:00
Linus Torvalds
eb4a0992dd Merge tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - fix refcount issue that can cause memory leak

 - rate limit repeated connections from IPv6, not just IPv4 addresses

 - fix potential null pointer access of smb direct work queue

* tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix refcount leak causing resource not released
  ksmbd: extend the connection limiting mechanism to support IPv6
  smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()
2025-08-21 04:48:41 -07:00
Lorenzo Bianconi
9f6b606b6b net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
SW hash computed by airoha_ppe_foe_get_entry_hash routine (used for
foe_flow hlist) can theoretically produce collisions between two
different HW PPE entries.
In airoha_ppe_foe_insert_entry() if the collision occurs we will mark
the second PPE entry in the list as stale (setting the hw hash to 0xffff).
Stale entries are no more updated in airoha_ppe_foe_flow_entry_update
routine and so they are removed by Netfilter.
Fix the problem not marking the second entry as stale in
airoha_ppe_foe_insert_entry routine if we have already inserted the
brand new entry in the PPE table and let Netfilter remove real stale
entries according to their timestamp.
Please note this is just a theoretical issue spotted reviewing the code
and not faced running the system.

Fixes: cd53f62261 ("net: airoha: Add L2 hw acceleration support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250818-airoha-en7581-hash-collision-fix-v1-1-d190c4b53d1c@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 11:25:11 +02:00
Paolo Abeni
184fa9d704 Merge branch 'bonding-fix-negotiation-flapping-in-802-3ad-passive-mode'
Hangbin Liu says:

====================
bonding: fix negotiation flapping in 802.3ad passive mode

This patch fixes unstable LACP negotiation when bonding is configured in
passive mode (`lacp_active=off`).

Previously, the actor would stop sending LACPDUs after initial negotiation
succeeded, leading to the partner timing out and restarting the negotiation
cycle. This resulted in continuous LACP state flapping.

The fix ensures the passive actor starts sending periodic LACPDUs after
receiving the first LACPDU from the partner, in accordance with IEEE
802.1AX-2020 section 6.4.1.
====================

Link: https://patch.msgid.link/20250815062000.22220-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 09:35:24 +02:00
Hangbin Liu
87951b5664 selftests: bonding: add test for passive LACP mode
Add a selftest to verify bonding behavior when `lacp_active` is set to `off`.

The test checks the following:
- The passive LACP bond should not send LACPDUs before receiving a partner's
  LACPDU.
- The transmitted LACPDUs must not include the active flag.
- After transitioning to EXPIRED and DEFAULTED states, the passive side should
  still not initiate LACPDUs.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 09:35:21 +02:00
Hangbin Liu
0599640a21 bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
When `lacp_active` is set to `off`, the bond operates in passive mode, meaning
it only "speaks when spoken to." However, the current kernel implementation
only sends an LACPDU in response when the partner's state changes.

As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs
until the partner times out and sends an "expired" LACPDU. This causes
continuous LACP state flapping.

According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The
values of Partner_Oper_Port_State.LACP_Activity and
Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions
take place. If either or both parameters are set to Active LACP, then periodic
transmissions occur; if both are set to Passive LACP, then periodic
transmissions do not occur.

To comply with this, we remove the `!bond->params.lacp_active` check in
`ad_periodic_machine()`. Instead, we initialize the actor's port's
`LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting.

Additionally, we avoid setting the partner's state to
`LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume
the partner is active by default.

This ensures that in passive mode, the bond starts sending periodic LACPDUs
after receiving one from the partner, and avoids flapping due to inactivity.

Fixes: 3a755cd8b7 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 09:35:20 +02:00
Hangbin Liu
b64d035f77 bonding: update LACP activity flag after setting lacp_active
The port's actor_oper_port_state activity flag should be updated immediately
after changing the lacp_active option to reflect the current mode correctly.

Fixes: 3a755cd8b7 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-21 09:35:20 +02:00
Ryan Wanner
c42be53454 Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
This reverts commit db400061b5.

This commit can cause a Devicetree ABI break for older DTS files that rely this
flag for RMII configuration. Adding this back in ensures that the older
DTBs will not break.

Fixes: db400061b5 ("net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag")
Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Link: https://patch.msgid.link/20250819163236.100680-1-Ryan.Wanner@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:33:42 -07:00
Eric Biggers
a458b29021 ipv6: sr: Fix MAC comparison to be constant-time
To prevent timing attacks, MACs need to be compared in constant time.
Use the appropriate helper function for this.

Fixes: bf355b8d2c ("ipv6: sr: add core files for SR HMAC support")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250818202724.15713-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:32:30 -07:00
Jakub Acs
7af76e9d18 net, hsr: reject HSR frame if skb can't hold tag
Receiving HSR frame with insufficient space to hold HSR tag in the skb
can result in a crash (kernel BUG):

[   45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1
[   45.392559] ------------[ cut here ]------------
[   45.392912] kernel BUG at net/core/skbuff.c:211!
[   45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef)
[   45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   45.395273] RIP: 0010:skb_panic+0x15b/0x1d0

<snip registers, remove unreliable trace>

[   45.402911] Call Trace:
[   45.403105]  <IRQ>
[   45.404470]  skb_push+0xcd/0xf0
[   45.404726]  br_dev_queue_push_xmit+0x7c/0x6c0
[   45.406513]  br_forward_finish+0x128/0x260
[   45.408483]  __br_forward+0x42d/0x590
[   45.409464]  maybe_deliver+0x2eb/0x420
[   45.409763]  br_flood+0x174/0x4a0
[   45.410030]  br_handle_frame_finish+0xc7c/0x1bc0
[   45.411618]  br_handle_frame+0xac3/0x1230
[   45.413674]  __netif_receive_skb_core.constprop.0+0x808/0x3df0
[   45.422966]  __netif_receive_skb_one_core+0xb4/0x1f0
[   45.424478]  __netif_receive_skb+0x22/0x170
[   45.424806]  process_backlog+0x242/0x6d0
[   45.425116]  __napi_poll+0xbb/0x630
[   45.425394]  net_rx_action+0x4d1/0xcc0
[   45.427613]  handle_softirqs+0x1a4/0x580
[   45.427926]  do_softirq+0x74/0x90
[   45.428196]  </IRQ>

This issue was found by syzkaller.

The panic happens in br_dev_queue_push_xmit() once it receives a
corrupted skb with ETH header already pushed in linear data. When it
attempts the skb_push() call, there's not enough headroom and
skb_push() panics.

The corrupted skb is put on the queue by HSR layer, which makes a
sequence of unintended transformations when it receives a specific
corrupted HSR frame (with incomplete TAG).

Fix it by dropping and consuming frames that are not long enough to
contain both ethernet and hsr headers.

Alternative fix would be to check for enough headroom before skb_push()
in br_dev_queue_push_xmit().

In the reproducer, this is injected via AF_PACKET, but I don't easily
see why it couldn't be sent over the wire from adjacent network.

Further Details:

In the reproducer, the following network interface chain is set up:

┌────────────────┐   ┌────────────────┐
│ veth0_to_hsr   ├───┤  hsr_slave0    ┼───┐
└────────────────┘   └────────────────┘   │
                                          │ ┌──────┐
                                          ├─┤ hsr0 ├───┐
                                          │ └──────┘   │
┌────────────────┐   ┌────────────────┐   │            │┌────────┐
│ veth1_to_hsr   ┼───┤  hsr_slave1    ├───┘            └┤        │
└────────────────┘   └────────────────┘                ┌┼ bridge │
                                                       ││        │
                                                       │└────────┘
                                                       │
                                        ┌───────┐      │
                                        │  ...  ├──────┘
                                        └───────┘

To trigger the events leading up to crash, reproducer sends a corrupted
HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'.

The first HSR-layer function to process this frame is
hsr_handle_frame(). It and then checks if the
protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls
skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that
the skb is long enough. For the crashing frame it is not, and hence the
skb->network_header and skb->mac_len fields are set incorrectly,
pointing after the end of the linear buffer.

I will call this a BUG#1 and it is what is addressed by this patch. In
the crashing scenario before the fix, the skb continues to go down the
hsr path as follows.

hsr_handle_frame() then calls this sequence
hsr_forward_skb()
  fill_frame_info()
    hsr->proto_ops->fill_frame_info()
      hsr_fill_frame_info()

hsr_fill_frame_info() contains a check that intends to check whether the
skb actually contains the HSR header. But the check relies on the
skb->mac_len field which was erroneously setup due to BUG#1, so the
check passes and the execution continues  back in the hsr_forward_skb():

hsr_forward_skb()
  hsr_forward_do()
    hsr->proto_ops->get_untagged_frame()
      hsr_get_untagged_frame()
        create_stripped_skb_hsr()

In create_stripped_skb_hsr(), a copy of the skb is created and is
further corrupted by operation that attempts to strip the HSR tag in a
call to __pskb_copy().

The skb enters create_stripped_skb_hsr() with ethernet header pushed in
linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of
ethernet header into the headroom, creating skb_in with a headroom of
size 8. The subsequent __pskb_copy() then creates an skb with headroom
of just 2 and skb->len of just 12, this is how it looks after the copy:

gdb) p skb->len
$10 = 12
(gdb) p skb->data
$11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373",
(gdb) p skb->head
$12 = (unsigned char *) 0xffff888041e45380 ""

It seems create_stripped_skb_hsr() assumes that ETH header is pulled
in the headroom when it's entered, because it just pulls HSR header on
top. But that is not the case in our code-path and we end up with the
corrupted skb instead. I will call this BUG#2

*I got confused here because it seems that under no conditions can
create_stripped_skb_hsr() work well, the assumption it makes is not true
during the processing of hsr frames - since the skb_push() in
hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I
missed something here.*

Next, the execution arrives in hsr_deliver_master(). It calls
skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have
enough space for the pull (as it only has 12 bytes in total at this
point).

*The skb_pull() here further suggests that ethernet header is meant
to be pushed through the whole hsr processing and
create_stripped_skb_hsr() should pull it before doing the HSR header
pull.*

hsr_deliver_master() then puts the corrupted skb on the queue, it is
then picked up from there by bridge frame handling layer and finally
lands in br_dev_queue_push_xmit where it panics.

Cc: stable@kernel.org
Fixes: 48b491a5cc ("net: hsr: fix mac_len checks")
Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:31:25 -07:00
William Liu
2c2192e5f9 net/sched: Remove unnecessary WARNING condition for empty child qdisc in htb_activate
The WARN_ON trigger based on !cl->leaf.q->q.qlen is unnecessary in
htb_activate. htb_dequeue_tree already accounts for that scenario.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250819033632.579854-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:27:08 -07:00
William Liu
15de71d06a net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit
The following setup can trigger a WARNING in htb_activate due to
the condition: !cl->leaf.q->q.qlen

tc qdisc del dev lo root
tc qdisc add dev lo root handle 1: htb default 1
tc class add dev lo parent 1: classid 1:1 \
       htb rate 64bit
tc qdisc add dev lo parent 1:1 handle f: \
       cake memlimit 1b
ping -I lo -f -c1 -s64 -W0.001 127.0.0.1

This is because the low memlimit leads to a low buffer_limit, which
causes packet dropping. However, cake_enqueue still returns
NET_XMIT_SUCCESS, causing htb_enqueue to call htb_activate with an
empty child qdisc. We should return NET_XMIT_CN when packets are
dropped from the same tin and flow.

I do not believe return value of NET_XMIT_CN is necessary for packet
drops in the case of ack filtering, as that is meant to optimize
performance, not to signal congestion.

Fixes: 046f6fd5da ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250819033601.579821-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:27:08 -07:00
Tristram Ha
e318cd6714 net: dsa: microchip: Fix KSZ9477 HSR port setup issue
ksz9477_hsr_join() is called once to setup the HSR port membership, but
the port can be enabled later, or disabled and enabled back and the port
membership is not set correctly inside ksz_update_port_member().  The
added code always use the correct HSR port membership for HSR port that
is enabled.

Fixes: 2d61298fdd ("net: dsa: microchip: Enable HSR offloading for KSZ9477")
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Reviewed-by: Łukasz Majewski <lukma@nabladev.com>
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Link: https://patch.msgid.link/20250819010457.563286-1-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:25:38 -07:00
Jakub Kicinski
f7b0b97c2d Merge branch 'intel-wired-lan-driver-updates-2025-08-15-ice-ixgbe-igc'
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-08-15 (ice, ixgbe, igc)

For ixgbe:
Jason Xing corrects a condition in which improper decrement can cause
improper budget value.

Maciej extends down states in which XDP cannot transmit and excludes XDP
rings from Tx hang checks.

For igc:
VladikSS moves setting of hardware device information to allow for proper
check of device ID.

v1: https://lore.kernel.org/20250815204205.1407768-1-anthony.l.nguyen@intel.com
====================

Link: https://patch.msgid.link/20250819222000.3504873-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 19:20:50 -07:00
ValdikSS
1468c1f97c igc: fix disabling L1.2 PCI-E link substate on I226 on init
Device ID comparison in igc_is_device_id_i226 is performed before
the ID is set, resulting in always failing check on init.

Before the patch:
* L1.2 is not disabled on init
* L1.2 is properly disabled after suspend-resume cycle

With the patch:
* L1.2 is properly disabled both on init and after suspend-resume

How to test:
Connect to the 1G link with 300+ mbit/s Internet speed, and run
the download speed test, such as:

    curl -o /dev/null http://speedtest.selectel.ru/1GB

Without L1.2 disabled, the speed would be no more than ~200 mbit/s.
With L1.2 disabled, the speed would reach 1 gbit/s.
Note: it's required that the latency between your host and the remote
be around 3-5 ms, the test inside LAN (<1 ms latency) won't trigger the
issue.

Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com
Fixes: 0325143b59 ("igc: disable L1.2 PCI-E link substate to avoid performance issue")
Signed-off-by: ValdikSS <iam@valdikss.org.ru>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250819222000.3504873-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 18:46:30 -07:00
Maciej Fijalkowski
f3d9f7fa7f ixgbe: fix ndo_xdp_xmit() workloads
Currently ixgbe driver checks periodically in its watchdog subtask if
there is anything to be transmitted (considering both Tx and XDP rings)
under state of carrier not being 'ok'. Such event is interpreted as Tx
hang and therefore results in interface reset.

This is currently problematic for ndo_xdp_xmit() as it is allowed to
produce descriptors when interface is going through reset or its carrier
is turned off.

Furthermore, XDP rings should not really be objects of Tx hang
detection. This mechanism is rather a matter of ndo_tx_timeout() being
called from dev_watchdog against Tx rings exposed to networking stack.

Taking into account issues described above, let us have a two fold fix -
do not respect XDP rings in local ixgbe watchdog and do not produce Tx
descriptors in ndo_xdp_xmit callback when there is some problem with
carrier currently. For now, keep the Tx hang checks in clean Tx irq
routine, but adjust it to not execute for XDP rings.

Cc: Tobias Böhm <tobias.boehm@hetzner-cloud.de>
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/eca1880f-253a-4955-afe6-732d7c6926ee@hetzner-cloud.de/
Fixes: 6453073987 ("ixgbe: add initial support for xdp redirect")
Fixes: 33fdc82f08 ("ixgbe: add support for XDP_TX action")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250819222000.3504873-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 18:46:30 -07:00
Jason Xing
4d4d9ef9df ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zc
Resolve the budget negative overflow which leads to returning true in
ixgbe_xmit_zc even when the budget of descs are thoroughly consumed.

Before this patch, when the budget is decreased to zero and finishes
sending the last allowed desc in ixgbe_xmit_zc, it will always turn back
and enter into the while() statement to see if it should keep processing
packets, but in the meantime it unexpectedly decreases the value again to
'unsigned int (0--)', namely, UINT_MAX. Finally, the ixgbe_xmit_zc returns
true, showing 'we complete cleaning the budget'. That also means
'clean_complete = true' in ixgbe_poll.

The true theory behind this is if that budget number of descs are consumed,
it implies that we might have more descs to be done. So we should return
false in ixgbe_xmit_zc to tell napi poll to find another chance to start
polling to handle the rest of descs. On the contrary, returning true here
means job done and we know we finish all the possible descs this time and
we don't intend to start a new napi poll.

It is apparently against our expectations. Please also see how
ixgbe_clean_tx_irq() handles the problem: it uses do..while() statement
to make sure the budget can be decreased to zero at most and the negative
overflow never happens.

The patch adds 'likely' because we rarely would not hit the loop condition
since the standard budget is 256.

Fixes: 8221c5eba8 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Priya Singh <priyax.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250819222000.3504873-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 18:46:30 -07:00
Linus Torvalds
068a56e56f Merge tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
 "Sanitize wildcard for fprobe event name

  Fprobe event accepts wildcards for the target functions, but unless
  the user specifies its event name, it makes an event with the
  wildcards. Replace the wildcard '*' with the underscore '_'"

* tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fprobe-event: Sanitize wildcard for fprobe event name
2025-08-20 16:29:30 -07:00
Linus Torvalds
43f981b7a7 Merge tag 'bootconfig-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fix from Masami Hiramatsu:
 "Fix negative seeks on 32-bit with LFS enabled

  On 32bit architecture, -BOOTCONFIG_FOOTER_SIZE (size_t, 32bit) becomes
  a positive value when it is passed to lseek() because it is cast to
  off_t (64bit). Thus, add type casts"

* tag 'bootconfig-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  bootconfig: Fix negative seeks on 32-bit with LFS enabled
2025-08-20 16:27:38 -07:00
Ben Hutchings
729dc340a4 bootconfig: Fix negative seeks on 32-bit with LFS enabled
Commit 26dda57695 "tools/bootconfig: Cleanup bootconfig footer size
calculations" replaced some expressions of type int with the
BOOTCONFIG_FOOTER_SIZE macro, which expands to an expression of type
size_t, which is unsigned.

On 32-bit architectures with LFS enabled (i.e. off_t is 64-bit), the
seek offset of -BOOTCONFIG_FOOTER_SIZE now turns into a positive
value.

Fix this by casting the size to off_t before negating it.

Just in case someone changes BOOTCONFIG_MAGIC_LEN to have type size_t
later, do the same thing to the seek offset of -BOOTCONFIG_MAGIC_LEN.

Link: https://lore.kernel.org/all/aKHlevxeg6Y7UQrz@decadent.org.uk/

Fixes: 26dda57695 ("tools/bootconfig: Cleanup bootconfig footer size calculations")
Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-08-21 08:16:31 +09:00
Linus Torvalds
41cd3fd152 Merge tag 'pci-v6.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:

 - Remove vmd restriction on children using MSI-X because VMD does in
   fact support both MSI and MSI-X for children (Nam Cao)

 - Fix a NULL pointer dereference in the xilinx interrupt handler (Nam
   Cao)

* tag 'pci-v6.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: vmd: Remove MSI-X check on child devices
  PCI: xilinx: Fix NULL pointer dereference in xilinx_pcie_intr_handler()
2025-08-20 13:26:33 -07:00