Commit Graph

1336736 Commits

Author SHA1 Message Date
Willem de Bruijn
6ad861519a net: initialize mark in sockcm_init
Avoid open coding initialization of sockcm fields.
Avoid reading the sk_priority field twice.

This ensures all callers, existing and future, will correctly try a
cmsg passed mark before sk_mark.

This patch extends support for cmsg mark to:
packet_spkt and packet_tpacket and net/can/raw.c.

This patch extends support for cmsg priority to:
packet_spkt and packet_tpacket.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:27:19 -08:00
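The precedence described in 6ad861519a (a mark passed via cmsg is tried before sk_mark) can be modeled in a few lines. This is a minimal Python sketch of that ordering, not kernel code: the `Sock` and `SockcmCookie` classes and the `apply_cmsg` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sock:
    """Hypothetical stand-in for the per-socket defaults."""
    sk_mark: int = 0
    sk_priority: int = 0
    sk_tsflags: int = 0


@dataclass
class SockcmCookie:
    """Hypothetical stand-in for the per-message cookie."""
    mark: int
    priority: int
    tsflags: int


def sockcm_init(sk: Sock) -> SockcmCookie:
    # Seed every field from the socket in one place, so callers
    # do not open-code the initialization.
    return SockcmCookie(mark=sk.sk_mark,
                        priority=sk.sk_priority,
                        tsflags=sk.sk_tsflags)


def apply_cmsg(cookie: SockcmCookie,
               mark: Optional[int] = None,
               priority: Optional[int] = None) -> SockcmCookie:
    # A value supplied via cmsg overrides the socket default;
    # fields without a cmsg keep the value seeded by sockcm_init().
    if mark is not None:
        cookie.mark = mark
    if priority is not None:
        cookie.priority = priority
    return cookie
```

Because every caller starts from `sockcm_init()`, a caller that never parses a cmsg still sends with the socket's own mark, and one that does parse a cmsg gets the override without extra code.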
Willem de Bruijn
aaf6532d11 tcp: only initialize sockcm tsflags field
TCP only reads the tsflags field. Don't bother initializing others.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:27:19 -08:00
Yu-Chun Lin
3a03f9ec5d net: stmmac: Use str_enabled_disabled() helper
As the kernel test robot reported, the following warning occurs:

cocci warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c:582:6-8: opportunity for str_enabled_disabled(on)

Replace ternary (condition ? "enabled" : "disabled") with
str_enabled_disabled() from string_choices.h to improve readability,
maintain uniform string usage, and reduce binary size through linker
deduplication.

Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Link: https://patch.msgid.link/20250217155833.3105775-1-eleanor15x@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:12:12 -08:00
Breno Leitao
8f02c48f8f net: Remove redundant variable declaration in __dev_change_flags()
The old_flags variable is declared twice in __dev_change_flags(),
causing a shadow variable warning. This patch fixes the issue by
removing the redundant declaration, reusing the existing old_flags
variable instead.

	net/core/dev.c:9225:16: warning: declaration shadows a local variable [-Wshadow]
	9225 |                 unsigned int old_flags = dev->flags;
	|                              ^
	net/core/dev.c:9185:15: note: previous declaration is here
	9185 |         unsigned int old_flags = dev->flags;
	|                      ^
	1 warning generated.

Remove the redundant inner declaration and reuse the existing old_flags
variable since its value is not needed outside the if block, and it is
safe to reuse the variable. This eliminates the warning while
maintaining the same functionality.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250217-old_flags-v2-1-4cda3b43a35f@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:10:56 -08:00
Chandra Mohan Sundar
f29e41454b selftests: net: Fix few spelling mistakes
Fix a few spelling mistakes in net selftests.

Signed-off-by: Chandra Mohan Sundar <chandru.dav@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250217141520.81033-1-chandru.dav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:10:31 -08:00
Qingfang Deng
952d732536 net: ethernet: mediatek: add EEE support
Add EEE support to MediaTek SoC Ethernet. The register fields are
similar to the ones in MT7531, except that the LPI threshold is in
milliseconds.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250217094022.1065436-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:09:24 -08:00
Pei Xiao
9faaaef27c net: freescale: ucc_geth: make ugeth_mac_ops be static const
sparse warning:
    sparse: symbol 'ugeth_mac_ops' was not declared. Should it be static?

Add static to fix sparse warnings and add const. phylink_create() will
accept a const struct.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/202502141128.9HfxcdIE-lkp@intel.com
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:08:18 -08:00
Jakub Kicinski
59ed446bc4 Merge branch 'net-phy-improve-and-simplify-eee-handling-in-phylib'
Heiner Kallweit says:

====================
net: phy: improve and simplify EEE handling in phylib

This series improves and simplifies phylib's EEE handling.
====================

Link: https://patch.msgid.link/3caa3151-13ac-44a8-9bb6-20f82563f698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:10 -08:00
Heiner Kallweit
809265fe96 net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active
After the last user has gone, we can remove the local advertisement
parameter from genphy_c45_eee_is_active.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:09 -08:00
Heiner Kallweit
199d0ce385 net: phy: c45: use cached EEE advertisement in genphy_c45_ethtool_get_eee
Now that disabled EEE modes are considered when populating
advertising_eee, we can use this bitmap here instead of reading
the PHY register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e57ed3d4-d0bc-4f91-83f6-8f48dfb6d7d7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:09 -08:00
Heiner Kallweit
aa951feb54 net: phy: c45: Don't silently remove disabled EEE modes any longer when writing advertisement register
advertising_eee is adjusted now whenever an EEE mode gets disabled.
Therefore we can remove the silent removal of disabled EEE modes here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e95b9dad-24a7-4e3e-9af9-6f0770cf1520@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:09 -08:00
Heiner Kallweit
7f33fea6bb net: phy: remove disabled EEE modes from advertising_eee in phy_probe
A PHY driver may populate eee_disabled_modes in its probe or get_features
callback, therefore filter the EEE advertisement read from the PHY.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/493f3e2e-9cfc-445d-adbe-58d9c117a489@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:09 -08:00
Heiner Kallweit
a9b6a860d7 net: phy: improve phy_disable_eee_mode
If a mode is to be disabled, remove it from advertising_eee.
Disabling EEE modes shall be done before calling phy_start(),
warn if that's not the case.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:08 -08:00
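The behavior described in a9b6a860d7 can be sketched as follows. This is a simplified Python model, not phylib code: the `Phy` class is hypothetical, link modes are represented as strings in a set rather than as a bitmap, and the sketch assumes the mode is still disabled after the warning is issued, which the commit message does not state.

```python
import warnings


class Phy:
    """Hypothetical stand-in for the PHY device state."""

    def __init__(self, advertising_eee):
        self.advertising_eee = set(advertising_eee)
        self.eee_disabled_modes = set()
        self.started = False


def phy_disable_eee_mode(phy: Phy, mode: str) -> None:
    # Disabling is expected to happen before the PHY is started.
    if phy.started:
        warnings.warn("EEE mode disabled after the PHY was started")
    # Record the mode as disabled and stop advertising it, so the
    # advertisement never has to be filtered again later.
    phy.eee_disabled_modes.add(mode)
    phy.advertising_eee.discard(mode)
```

Keeping `advertising_eee` consistent at the point of disabling is what lets the later patches in the series use the cached advertisement directly.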
Heiner Kallweit
8a6a77bb5a net: phy: move definition of phy_is_started before phy_disable_eee_mode
In preparation for a follow-up patch, move phy_is_started() to before
phy_disable_eee_mode().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:07:08 -08:00
Jakub Kicinski
2f56be7f52 MAINTAINERS: trim the GVE entry
We requested in the past that GVE patches coming out of Google should
be submitted only by GVE maintainers. There were too many patches
posted which didn't follow the subsystem guidance.

Recently Joshua was added to maintainers, but even though he was asked
to follow the netdev "FAQ" in the past [1] he does not follow
the local customs. It is not reasonable for a person who hasn't read
the maintainer entry for the subsystem to be a driver maintainer.

We can re-add once Joshua does some on-list reviews to prove
the fluency with the upstream process.

Link: https://lore.kernel.org/20240610172720.073d5912@kernel.org # [1]
Link: https://patch.msgid.link/20250215162646.2446559-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:06:18 -08:00
Heiner Kallweit
fabcfd6d10 net: phy: realtek: add defines for shadowed c45 standard registers
Realtek shadows standard c45 registers in VEND2 device register space.
Add defines for these VEND2 registers, based on the names of the
standard c45 registers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/c90bdf76-f8b8-4d06-9656-7a52d5658ee6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:04:05 -08:00
Siddh Raman Pant
438989137a netlink: Unset cb_running when terminating dump on release
When we terminate the dump, the callback is no longer running, so
cb_running should be set to false to be logically consistent.

cb_running signifies whether a dump is ongoing. It is set to true in
cb->start(), and is checked in netlink_dump() to be true initially.
After the dump, it is set to false in the same function.

This is just a cleanup, no path should access this field on a closed
socket.

Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Link: https://patch.msgid.link/aff028e3eb2b768b9895fa6349fa1981ae22f098.camel@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:03:12 -08:00
Joshua Washington
415cadd505 gve: set xdp redirect target only when it is available
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.

This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.

Fixes: 39a7f4aa3e ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable@vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:01:24 -08:00
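The rule in 415cadd505 (advertise the redirect-target flag only while XDP queues exist) can be illustrated with a small model. This is a Python sketch under stated assumptions, not driver code: the feature set is represented as a set of strings, and `update_redirect_target` is a hypothetical helper standing in for the set/clear pair named in the commit message.

```python
NDO_XMIT = "NETDEV_XDP_ACT_NDO_XMIT"


def update_redirect_target(features, num_xdp_queues):
    """Return the feature set with the redirect-target flag set only
    when there are XDP queues available to forward traffic."""
    updated = set(features)
    if num_xdp_queues > 0:
        updated.add(NDO_XMIT)
    else:
        updated.discard(NDO_XMIT)
    return updated
```

Calling the helper whenever the number of XDP queues changes keeps the advertised capability in step with what the device can actually do at that moment.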
Jakub Kicinski
d5b595d3ae Merge branch 'net-cadence-macb-modernize-statistics-reporting'
Sean Anderson says:

====================
net: cadence: macb: Modernize statistics reporting

Implement the modern interfaces for statistics reporting.
====================

Link: https://patch.msgid.link/20250214212703.2618652-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:00:10 -08:00
Sean Anderson
f6af690a29 net: cadence: macb: Report standard stats
Report standard statistics using the dedicated callbacks instead of
get_ethtool_stats.

OCTTX is split over two registers. Accumulating these registers
separately in gem_stats just means we need to combine them again later.
Instead, combine these stats before saving them, like is done for
ethtool_stats.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:00:08 -08:00
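The point in f6af690a29 about combining the two OCTTX registers before saving can be shown with a short sketch. This is illustrative Python, not driver code: it assumes the second register holds the bits above bit 31 and that each read yields a delta to accumulate, and the `Stats` class is a hypothetical stand-in for the driver's statistics structure.

```python
MASK32 = 0xFFFFFFFF


def combine_octets(low: int, high: int) -> int:
    # Join the two register halves into one wide value.
    return ((high & MASK32) << 32) | (low & MASK32)


class Stats:
    """Hypothetical accumulator keeping one combined counter."""

    def __init__(self):
        self.tx_octets = 0

    def update(self, low: int, high: int) -> None:
        # Combine first, then accumulate, so no later step has to
        # stitch two separately accumulated halves back together.
        self.tx_octets += combine_octets(low, high)
```

Accumulating the halves separately would also need carry handling between them; combining before the addition avoids that.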
Sean Anderson
75696dd0fd net: cadence: macb: Convert to get_stats64
Convert the existing get_stats implementation to get_stats64. Since we
now report 64-bit values, increase the counters to 64-bits as well.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:00:07 -08:00
Sean Anderson
c900e49d58 net: xilinx: axienet: Implement BQL
Implement byte queue limits to allow queueing disciplines to account for
packets enqueued in the ring buffers but not yet transmitted.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214211252.2615573-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 17:54:35 -08:00
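The accounting idea behind byte queue limits in c900e49d58 can be sketched as below. This is a deliberately simplified Python model: the `ByteQueue` class is hypothetical, and it uses a fixed byte limit, whereas the real BQL mechanism adjusts its limit dynamically.

```python
class ByteQueue:
    """Tracks bytes handed to the ring but not yet transmitted."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.inflight = 0
        self.stopped = False

    def sent(self, nbytes: int) -> None:
        # Called when a packet is placed in the ring buffer.
        self.inflight += nbytes
        if self.inflight >= self.limit:
            # Push back on the queueing discipline.
            self.stopped = True

    def completed(self, nbytes: int) -> None:
        # Called when the hardware reports the packet as transmitted.
        self.inflight -= nbytes
        if self.inflight < self.limit:
            self.stopped = False
```

With the ring's backlog visible as a byte count, packets wait in the queueing discipline, where they can still be scheduled or dropped, rather than in the ring.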
Jakub Kicinski
f5da7c4518 tcp: adjust rcvq_space after updating scaling ratio
Since the commit under Fixes, we set the window clamp in accordance
with the newly measured rcvbuf scaling_ratio. If the scaling_ratio
decreases significantly, we may put ourselves in a situation
where windows become smaller than rcvq_space, preventing
tcp_rcv_space_adjust() from increasing rcvbuf.

The significant decrease of scaling_ratio is far more likely
since commit 697a6c8cec ("tcp: increase the default TCP scaling ratio"),
which increased the "default" scaling ratio from ~30% to 50%.

Hitting the bad condition depends a lot on TCP tuning, and
drivers at play. One of Meta's workloads hits it reliably
under the following conditions:
 - default rcvbuf of 125k
 - sender MTU 1500, receiver MTU 5000
 - driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
(10 * 5k = 50k). Once we find out the true scaling ratio and
MSS we clamp the windows to 38k. Triggering the condition also
depends on the message sequence of this workload. I can't repro
the problem with simple iperf or TCP_RR-style tests.

Fixes: a2cbb16039 ("tcp: Update window clamping condition")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250217232905.3162187-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 16:02:18 -08:00
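The numbers quoted in f5da7c4518 can be reproduced with a short calculation. This Python sketch assumes the scaling ratio is expressed as a fraction of 256 (a shift by 8), which the commit message does not state explicitly, and takes "125k" and "5k" as 125,000 and 5,000 bytes; `win_from_space` is a hypothetical helper name.

```python
TCP_RMEM_TO_WIN_SCALE = 8  # assumption: ratio is out of 2**8 = 256
TCP_INIT_CWND = 10


def win_from_space(space: int, scaling_ratio: int) -> int:
    # Portion of the receive buffer usable as window.
    return (space * scaling_ratio) >> TCP_RMEM_TO_WIN_SCALE


rcvbuf = 125_000           # default rcvbuf from the commit message
advmss = 5_000             # receiver MTU of 5000, rounded as in the message
scaling_ratio = 78         # ratio the driver settles on

window_clamp = win_from_space(rcvbuf, scaling_ratio)
initial_rcvq_space = TCP_INIT_CWND * advmss

print(window_clamp)        # 38085
print(initial_rcvq_space)  # 50000
```

Under these assumptions the clamp (about 38k) ends up below the initial rcvq_space (50k), which is the condition the commit message describes as preventing rcvbuf from growing.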
Heiner Kallweit
8af2136e77 net: phy: realtek: add helper RTL822X_VND2_C22_REG
C22 register space is mapped to 0xa400 in MMD VEND2 register space.
Add a helper to access mapped C22 registers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6344277b-c5c7-449b-ac89-d5425306ca76@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:37:11 -08:00
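The mapping described in 8af2136e77 amounts to a base-plus-offset calculation. The Python sketch below takes the 0xa400 base from the commit message; the spacing of two addresses per C22 register is an assumption made for illustration and is not stated in the message, and the function name is a hypothetical lower-case rendering of the helper.

```python
RTL822X_VND2_C22_BASE = 0xA400  # base address from the commit message
C22_REG_STRIDE = 2              # assumption: two addresses per register


def rtl822x_vnd2_c22_reg(reg: int) -> int:
    # Translate a C22 register number into its VEND2 address.
    return RTL822X_VND2_C22_BASE + C22_REG_STRIDE * reg
```

A helper like this replaces hand-computed addresses with the familiar C22 register number at each call site.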
Jakub Kicinski
2e864f18e5 Merge branch 'eth-mlx4-use-the-page-pool-for-rx-buffers'
Jakub Kicinski says:

====================
eth: mlx4: use the page pool for Rx buffers

Convert mlx4 to page pool. I've been sitting on these patches for
over a year, and Jonathan Lemon had a similar series years before.
We never deployed it or sent upstream because it didn't really show
much perf win under normal load (admittedly I think the real testing
was done before Ilias's work on recycling).

During the v6.9 kernel rollout Meta's CDN team noticed that machines
with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
spent mapping buffers in the IOMMU). The problem does not occur with
modern NICs, so I dusted off this series and reportedly it still works.
And it makes the problem go away, no overloads, perf back in line with
older kernels. Something must have changed in IOMMU code, I guess.

This series is very simple, and can very likely be optimized further.
Thing is, I don't have access to any CX3 Pro NICs. They only exist
in CDN locations which haven't had a HW refresh for a while. So I can
say this series survives a week under traffic w/ XDP enabled, but
my ability to iterate and improve is a bit limited.

v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org
v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250213010635.1354034-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:32:22 -08:00
Jakub Kicinski
82b023c97f eth: mlx4: use the page pool for Rx buffers
Simple conversion to page pool. Preserve the current fragmentation
logic / page splitting. Each page starts with a single frag reference,
and then we bump that when attaching to skbs. This can likely be
optimized further.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:32:20 -08:00
Jakub Kicinski
d17fb2c055 eth: mlx4: remove the local XDP fast-recycling ring
It will be replaced with page pool's built-in recycling.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:32:20 -08:00
Jakub Kicinski
8fdeafd66e eth: mlx4: don't try to complete XDP frames in netpoll
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call it from
what is potentially HW IRQ context.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:32:20 -08:00
Jakub Kicinski
8533b14b3d eth: mlx4: create a page pool for Rx
Create a pool per rx queue. Subsequent patches will make use of it.

Move fcs_del to a hole to make space for the pointer.

Per common "wisdom" base the page pool size on the ring size.
Note that the page pool cache size is in full pages, so just
round up the effective buffer size to pages.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 15:32:20 -08:00
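One plausible reading of the sizing rule in 8533b14b3d is sketched below in Python. The 4096-byte page size, the example buffer sizes, and the `pool_size` helper are assumptions for illustration; the commit message only says that the pool size is based on the ring size and that the buffer size is rounded up to whole pages.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pool_size(ring_size: int, buf_size: int, page_size: int = PAGE_SIZE) -> int:
    # Round the effective buffer size up to whole pages, because the
    # pool is sized in full pages rather than bytes.
    pages_per_buf = -(-buf_size // page_size)  # ceiling division
    # One set of pages per ring entry.
    return ring_size * pages_per_buf
```

For a 1024-entry ring, a 1536-byte buffer gives 1024 pages under these assumptions, while a 9000-byte buffer needs three pages per entry and gives 3072.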
Linus Torvalds
6537cfb395 Merge tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A slightly large collection of fixes, spread over various drivers.

  Almost all are small and device-specific fixes and quirks in ASoC SOF
  Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small fix
  for MIDI 2.0"

* tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (41 commits)
  ALSA: seq: Drop UMP events when no UMP-conversion is set
  ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
  ALSA: hda/cirrus: Reduce codec resume time
  ALSA: hda/cirrus: Correct the full scale volume set logic
  virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLS
  ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
  ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver
  ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device
  ALSA: hda/realtek: Fixup ALC225 depop procedure
  ALSA: hda/tas2781: Update tas2781 hda SPI driver
  ASoC: cs35l41: Fix acpi_device_hid() not found
  ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler
  ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE
  ASoC: SOF: amd: Drop unused includes from Vangogh driver
  ASoC: SOF: amd: Add post_fw_run_delay ACP quirk
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support
  ALSA: Switch to use hrtimer_setup()
  ALSA: hda: hda-intel: add Panther Lake-H support
  ASoC: SOF: Intel: pci-ptl: Add support for PTL-H
  ...
2025-02-18 09:00:31 -08:00
Niklas Söderlund
4991b88c25 net: phy: marvell-88q2xxx: Init PHY private structure for mv88q211x
When LED support was added for mv88q222x devices, the PHY private data
structure was also used in the mv88q211x code path. The data structure
is, however, only allocated during mv88q222x probe. This results in a
NULL pointer dereference for mv88q2110 devices.

	Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001
	Mem abort info:
	  ESR = 0x0000000096000004
	  EC = 0x25: DABT (current EL), IL = 32 bits
	  SET = 0, FnV = 0
	  EA = 0, S1PTW = 0
	  FSC = 0x04: level 0 translation fault
	Data abort info:
	  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
	  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
	  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
	[0000000000000001] user address but active_mm is swapper
	Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
	CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-arm64-renesas-00342-ga3783dbf2574 #7
	Hardware name: Renesas White Hawk Single board based on r8a779g2 (DT)
	pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
	pc : mv88q2xxx_config_init+0x28/0x84
	lr : mv88q2110_config_init+0x98/0xb0
	sp : ffff8000823eb9d0
	x29: ffff8000823eb9d0 x28: ffff000440942000 x27: ffff80008144e400
	x26: 0000000000001002 x25: 0000000000000000 x24: 0000000000000000
	x23: 0000000000000009 x22: ffff8000810534f0 x21: ffff800081053550
	x20: 0000000000000000 x19: ffff0004437d6800 x18: 0000000000000018
	x17: 00000000000961c8 x16: ffff0006bef75ec0 x15: 0000000000000001
	x14: 0000000000000001 x13: ffff000440218080 x12: 071c71c71c71c71c
	x11: ffff000440218080 x10: 0000000000001420 x9 : ffff8000823eb770
	x8 : ffff8000823eb650 x7 : ffff8000823eb750 x6 : ffff8000823eb710
	x5 : 0000000000000000 x4 : 0000000000000800 x3 : 0000000000000001
	x2 : 0000000000000000 x1 : 00000000ffffffff x0 : ffff0004437d6800
	Call trace:
	 mv88q2xxx_config_init+0x28/0x84 (P)
	 mv88q2110_config_init+0x98/0xb0
	 phy_init_hw+0x64/0x9c
	 phy_attach_direct+0x118/0x320
	 phy_connect_direct+0x24/0x80
	 of_phy_connect+0x5c/0xa0
	 rtsn_open+0x5bc/0x78c
	 __dev_open+0xf8/0x1fc
	 __dev_change_flags+0x198/0x220
	 dev_change_flags+0x20/0x64
	 ip_auto_config+0x270/0xefc
	 do_one_initcall+0xe4/0x22c
	 kernel_init_freeable+0x2a8/0x308
	 kernel_init+0x20/0x130
	 ret_from_fork+0x10/0x20
	Code: b907e404 f9432814 3100083f 540000e3 (39400680)
	---[ end trace 0000000000000000 ]---
	Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
	SMP: stopping secondary CPUs
	Kernel Offset: disabled
	CPU features: 0x000,00000070,00801250,8200700b
	Memory Limit: none
	---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Fix this by using a generic probe function for both mv88q211x and
mv88q222x devices that allocates the PHY private data structure, while
only the mv88q222x probes for LED support.

Fixes: a3783dbf25 ("net: phy: marvell-88q2xxx: Add support for PHY LEDs on 88q2xxx")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250214174650.2056949-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 15:33:41 +01:00
Breno Leitao
8e677a4661 trace: tcp: Add tracepoint for tcp_cwnd_reduction()
Add a lightweight tracepoint to monitor TCP congestion window
adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking
of:
- TCP window size fluctuations
- Active socket behavior
- Congestion window reduction events

Meta has been using BPF programs to monitor this function for years.
Adding a proper tracepoint provides a stable API for all users who need
to monitor TCP congestion window behavior.

Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event
infrastructure and exporting to tracefs, keeping the implementation
minimal. (Thanks Steven Rostedt)

Given that this patch creates a rawtracepoint, you could hook into it
using regular tooling, like bpftrace, using regular rawtracepoint
infrastructure, such as:

	rawtracepoint:tcp_cwnd_reduction_tp {
		....
	}

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250214-cwnd_tracepoint-v2-1-ef8d15162d95@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 15:29:53 +01:00
Paolo Abeni
8f17a6a861 Merge branch 'net-phy-marvell-88q2xxx-cleanup'
Dimitri Fedrau says:

====================
net: phy: marvell-88q2xxx: cleanup

- align defines
- order includes alphabetically
- enable temperature sensor in mv88q2xxx_config_init

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
====================

Link: https://patch.msgid.link/20250214-marvell-88q2xxx-cleanup-v1-0-71d67c20f308@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:39:43 +01:00
Dimitri Fedrau
6c806720ba net: phy: marvell-88q2xxx: enable temperature sensor in mv88q2xxx_config_init
Temperature sensor gets enabled for 88Q222X devices in
mv88q222x_config_init. Move enabling to mv88q2xxx_config_init because
all 88Q2XXX devices support the temperature sensor.

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:39:40 +01:00
Dimitri Fedrau
cbe0449e8f net: phy: marvell-88q2xxx: order includes alphabetically
Order includes alphabetically.

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:39:40 +01:00
Dimitri Fedrau
8dcaed624f net: phy: marvell-88q2xxx: align defines
Align some defines.

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:39:40 +01:00
Paolo Abeni
01072deab3 Merge branch 'vxlan-join-leave-mc-group-when-reconfigured'
Petr Machata says:

====================
vxlan: Join / leave MC group when reconfigured

When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.

Therefore, when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. The same applies
when the interface used for endpoint communication is changed while a
multicast remote is configured. This is currently not done.

Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such a fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the
new group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.

One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):

 # ip link add name br up type bridge vlan_filtering 1
 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
 # bridge vni add dev vx1 vni 200 group 224.0.0.1
 # tcpdump -i lo &
 # bridge vni add dev vx1 vni 200 group 224.0.0.2
 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 # bridge vni
 dev               vni                group/remote
 vx1               200                224.0.0.2

Having two different modes of operation for conceptually the same interface
is silly, so in this patchset, just do what the vnifilter code does and
deal with the errors by crossing fingers real hard.
====================

Link: https://patch.msgid.link/cover.1739548836.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:50 +01:00
Petr Machata
eae1e92a1d selftests: test_vxlan_fdb_changelink: Add a test for MC remote change
Changes to MC remote need to be reflected in actual group memberships.
Add a test to verify that it is the case.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:44 +01:00
Petr Machata
24adf47ea9 selftests: test_vxlan_fdb_changelink: Convert to lib.sh
Instead of inlining equivalents, use lib.sh-provided primitives.
Use defer to manage vx lifetime.

This will make it easier to extend the test in the next patch.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:44 +01:00
Petr Machata
f802f172d7 selftests: forwarding: lib: Move require_command to net, generalize
This helper could be useful to more than just forwarding tests.
Move it upstairs and port over to log_test_skip().

Split the function into two parts: the bit that actually checks and
reports skip, which is in a new function check_command(). And a bit
that exits the test script if the check fails. This allows users
consistent checking behavior while giving an option to bail out from
a single test without bailing out of the whole script.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:43 +01:00
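The split described in f802f172d7 (one function that checks and reports, another that exits on failure) can be modeled as follows. The actual helpers are shell functions; this is a Python analogue for illustration only, and the exit code of 4 for a skipped test is an assumption not taken from the commit message.

```python
import shutil
import sys

KSFT_SKIP = 4  # assumption: exit code used to signal a skipped test


def check_command(cmd: str) -> bool:
    # Check only: report the skip and let the caller decide what to do.
    if shutil.which(cmd) is None:
        print(f"SKIP: {cmd} not installed", file=sys.stderr)
        return False
    return True


def require_command(cmd: str) -> None:
    # Check and bail out of the whole script on failure.
    if not check_command(cmd):
        sys.exit(KSFT_SKIP)
```

A single test case can call `check_command()` and return early, while a script that cannot run at all without the tool calls `require_command()` once at the top.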
Petr Machata
d42d543368 vxlan: Join / leave MC group after remote changes
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.

Therefore, when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. The same applies
when the interface used for endpoint communication is changed while a
multicast remote is configured. This is currently not done.

Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such a fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.

One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):

 # ip link add name br up type bridge vlan_filtering 1
 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
 # bridge vni add dev vx1 vni 200 group 224.0.0.1
 # tcpdump -i lo &
 # bridge vni add dev vx1 vni 200 group 224.0.0.2
 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 # bridge vni
 dev               vni                group/remote
 vx1               200                224.0.0.2

Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.

The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.
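
The ordering described above can be sketched in plain C. This is only an
illustration under stated assumptions, not the vxlan driver code: struct
mc_dev, mc_change_group(), op_ok() and op_fail() are hypothetical names, and
the callbacks merely stand in for vxlan_igmp_leave() and vxlan_igmp_join().

```c
/* Hypothetical model of a device with a multicast remote (not struct vxlan_dev). */
struct mc_dev {
    unsigned int configured_group; /* group the configuration points at */
    unsigned int member_of;        /* group actually joined, 0 if none */
};

/* Stand-in for a join/leave operation: returns 0 on success, negative on failure. */
typedef int (*mc_op_fn)(unsigned int group);

/* Hypothetical stubs used to exercise the sketch. */
int op_ok(unsigned int group)   { (void)group; return 0; }
int op_fail(unsigned int group) { (void)group; return -1; }

int mc_change_group(struct mc_dev *dev, unsigned int new_group,
                    mc_op_fn leave, mc_op_fn join)
{
    int err = 0;

    /* 1. Leave the old group before joining the new one. */
    if (dev->member_of) {
        err = leave(dev->member_of);
        if (!err)
            dev->member_of = 0;
    }

    /* 2. Apply the configuration change unconditionally; no rollback. */
    dev->configured_group = new_group;

    /* 3. Attempt the join only if the leave did not fail. */
    if (!err) {
        err = join(new_group);
        if (!err)
            dev->member_of = new_group;
    }

    return err;
}
```

In this model a failed leave keeps the old membership while the configuration
already points at the new group, which is the inconsistency the message accepts.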

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:43 +01:00
Petr Machata
5afb1596b9 vxlan: Drop 'changelink' parameter from vxlan_dev_configure()
vxlan_dev_configure() only has a single caller that passes false
for the changelink parameter. Drop the parameter and inline the
sole value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 13:06:43 +01:00
Jason Xing
43130d02ba page_pool: avoid infinite loop to schedule delayed worker
We noticed that the kworker in page_pool_release_retry() was woken
up repeatedly and indefinitely in production because a buggy
driver caused the inflight count to drop below 0, triggering the
warning in page_pool_inflight()[1].

Once the inflight value has gone negative, we should not
expect the page_pool to return to normal operation.

This patch mitigates the adverse effect by not rescheduling
the kworker when a negative inflight count is detected in
page_pool_release_retry().

[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025]  page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025]  process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025]  worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025]  kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025]  ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025]  ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025]  ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]---
Note: before this patch, the above call trace would flood
dmesg due to repeated rescheduling of the release_dw kworker.
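
The decision described above can be reduced to a small predicate. This is a
minimal sketch, not the page_pool code: should_reschedule_release() is a
hypothetical helper, and the handling of a zero count (nothing left to wait
for) is an assumption added for completeness.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the delayed release worker should be
 * queued again, given the number of pages still in flight.
 */
bool should_reschedule_release(int inflight)
{
    /* Assumption: zero means every page came back, so there is nothing to retry. */
    if (inflight == 0)
        return false;

    /* Negative means the accounting is broken; retrying only repeats the warning. */
    if (inflight < 0)
        return false;

    /* Pages are still outstanding: check again later. */
    return true;
}
```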

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250214064250.85987-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:48:29 +01:00
Paolo Abeni
f7b5279b67 Merge branch 'sockmap-vsock-for-connectible-sockets-allow-only-connected'
Michal Luczaj says:

====================
sockmap, vsock: For connectible sockets allow only connected

Series deals with one more case of vsock surprising BPF/sockmap by being
inconsistent about (having an) assigned transport.

KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_read_skb+0x4b/0x90
Call Trace:
 sk_psock_verdict_data_ready+0xa4/0x2e0
 virtio_transport_recv_pkt+0x1ca8/0x2acc
 vsock_loopback_work+0x27d/0x3f0
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x35a/0x700
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30

This bug, similarly to commit f6abafcd32 ("vsock/bpf: return early if
transport is not assigned"), could be fixed with a single NULL check. But
instead, let's explore another approach: take a hint from
vsock_bpf_update_proto() and teach sockmap to accept only vsocks that are
already connected (no risk of transport being dropped or reassigned). At
the same time, reject listeners outright (vsock listening sockets do not
carry any transport anyway). This way BPF does not have to worry about
vsk->transport becoming NULL.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================

Link: https://patch.msgid.link/20250213-vsock-listen-sockmap-nullptr-v1-0-994b7cd2f16b@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:00:17 +01:00
Michal Luczaj
85928e9c43 selftest/bpf: Add vsock test for sockmap rejecting unconnected
Verify that for a connectible AF_VSOCK socket, merely having a transport
assigned is insufficient; socket must be connected for the sockmap to
accept.

This does not test datagram vsocks, though that hardly matters: VMCI is
the only transport that features VSOCK_TRANSPORT_F_DGRAM, but it has an
unimplemented vsock_transport::readskb() callback, making it unsupported by
BPF/sockmap.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:00:01 +01:00
Michal Luczaj
8350695bfb selftest/bpf: Adapt vsock_delete_on_close to sockmap rejecting unconnected
Commit 515745445e ("selftest/bpf: Add test for vsock removal from sockmap
on close()") added a test that checked whether the proto::close() callback
was invoked on AF_VSOCK socket release, i.e. it verified that a close()d
vsock does indeed get removed from the sockmap.

It was done simply by creating a socket pair and attempting to replace a
close()d one with its peer. Since, due to a recent change, sockmap does not
allow updating an index with a non-established connectible vsock, redo it
with a freshly established one.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:00:01 +01:00
Michal Luczaj
857ae05549 vsock/bpf: Warn on socket without transport
In the spirit of commit 91751e2482 ("vsock: prevent null-ptr-deref in
vsock_*[has_data|has_space]"), armorize the "impossible" cases with a
warning.

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:00:01 +01:00
Michal Luczaj
8fb5bb169d sockmap, vsock: For connectible sockets allow only connected
sockmap expects all vsocks to have a transport assigned, which is expressed
in vsock_proto::psock_update_sk_prot(). However, there is an edge case
where an unconnected (connectible) socket may lose its previously assigned
transport. This is handled with a NULL check in the vsock/BPF recv path.

Another design detail is that listening vsocks are not supposed to have any
transport assigned at all, which implies they are not supported by the
sockmap. But this is complicated by the fact that a socket, before
switching to TCP_LISTEN, may have had some transport assigned during a
failed connect() attempt. Hence, we may end up with a listening vsock in a
sockmap, which blows up quickly:

KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_read_skb+0x4b/0x90
Call Trace:
 sk_psock_verdict_data_ready+0xa4/0x2e0
 virtio_transport_recv_pkt+0x1ca8/0x2acc
 vsock_loopback_work+0x27d/0x3f0
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x35a/0x700
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30

For connectible sockets, instead of relying solely on the state of
vsk->transport, tell sockmap to only allow those representing established
connections. This aligns with the behaviour for AF_INET and AF_UNIX.
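
The acceptance rule described above can be sketched as a predicate. This is a
simplified model, not the kernel implementation: struct fake_vsock, its enums
and sockmap_may_accept() are hypothetical names, and the treatment of
non-connectible (datagram) sockets is an assumption based on the existing
transport requirement.

```c
#include <stdbool.h>

/* Hypothetical, simplified socket model (not struct vsock_sock). */
enum fake_type  { FAKE_STREAM, FAKE_SEQPACKET, FAKE_DGRAM };
enum fake_state { FAKE_CLOSE, FAKE_LISTEN, FAKE_ESTABLISHED };

struct fake_vsock {
    enum fake_type type;
    enum fake_state state;
    bool has_transport; /* models vsk->transport != NULL */
};

bool sockmap_may_accept(const struct fake_vsock *vsk)
{
    bool connectible = vsk->type == FAKE_STREAM ||
                       vsk->type == FAKE_SEQPACKET;

    /*
     * Connectible sockets: a transport left over from a failed connect()
     * is not enough; only established connections are accepted.
     */
    if (connectible)
        return vsk->state == FAKE_ESTABLISHED;

    /* Assumption: other sockets still only need an assigned transport. */
    return vsk->has_transport;
}
```

Under this model, a listener that kept a transport from an earlier failed
connect() is rejected, which is the case that triggered the splat above.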

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 12:00:00 +01:00
Paolo Abeni
b4cb730862 Merge branch 'add-af_xdp-support-for-cn10k'
Suman Ghosh says:

====================
Add af_xdp support for cn10k

This patchset includes changes to support AF_XDP for cn10k chipsets. Both
non-zero copy and zero copy will be supported after these changes. Also,
the RSS will be reconfigured once a particular receive queue is
added/removed to/from AF_XDP support.

Patch #1: octeontx2-pf: use xdp_return_frame() to free xdp buffers

Patch #2: octeontx2-pf: Add AF_XDP non-zero copy support

Patch #3: octeontx2-pf: AF_XDP zero copy receive support

Patch #4: octeontx2-pf: Reconfigure RSS table after enabling AF_XDP
zerocopy on rx queue

Patch #5: octeontx2-pf: Prepare for AF_XDP transmit

Patch #6: octeontx2-pf: AF_XDP zero copy transmit support
====================

Link: https://patch.msgid.link/20250213053141.2833254-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 11:36:30 +01:00
Suman Ghosh
53616af09b octeontx2-pf: AF_XDP zero copy transmit support
This patch implements the following changes:

1. To avoid concurrency with normal traffic, use
   XDP queues.
2. Since XDP and AF_XDP may end up sharing the same
   queue, use separate flags to handle DMA buffers.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18 11:36:27 +01:00