412 Commits

Author SHA1 Message Date
Linus Torvalds
2831fa8b8b Merge tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "Neil Brown and Jeff Layton contributed a dynamic thread pool sizing
  mechanism for NFSD. The sunrpc layer now tracks minimum and maximum
  thread counts per pool, and NFSD adjusts running thread counts based
  on workload: idle threads exit after a timeout when the pool exceeds
  its minimum, and new threads spawn automatically when all threads are
  busy. Administrators control this behavior via the nfsdctl netlink
  interface.

  Rick Macklem, FreeBSD NFS maintainer, generously contributed server-
  side support for the POSIX ACL extension to NFSv4, as specified in
  draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to
  get and set POSIX access and default ACLs using native NFSv4
  operations, eliminating the need for sideband protocols. The feature
  is gated by a Kconfig option since the IETF draft has not yet been
  ratified.

  Chuck Lever delivered numerous improvements to the xdrgen tool. Error
  reporting now covers parsing, AST transformation, and invalid
  declarations. Generated enum decoders validate incoming values against
  valid enumerator lists. New features include pass-through line support
  for embedding C directives in XDR specifications, 16-bit integer
  types, and program number definitions. Several code generation issues
  were also addressed.

  When an administrator revokes NFSv4 state for a filesystem via the
  unlock_fs interface, ongoing async COPY operations referencing that
  filesystem are now cancelled, with CB_OFFLOAD callbacks notifying
  affected clients.

  The remaining patches in this pull request are clean-ups and minor
  optimizations. Sincere thanks to all contributors, reviewers, testers,
  and bug reporters who participated in the v7.0 NFSD development cycle"

* tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks
  NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation
  NFSD: Add support for POSIX draft ACLs for file creation
  NFSD: Add support for XDR decoding POSIX draft ACLs
  NFSD: Refactor nfsd_setattr()'s ACL error reporting
  NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes
  NFSD: Add nfsd4_encode_fattr4_posix_access_acl
  NFSD: Add nfsd4_encode_fattr4_posix_default_acl
  NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope
  NFSD: Add nfsd4_encode_fattr4_acl_trueform
  Add RPC language definition of NFSv4 POSIX ACL extension
  NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs
  xdrgen: Implement pass-through lines in specifications
  nfsd: cancel async COPY operations when admin revokes filesystem state
  nfsd: add controls to set the minimum number of threads per pool
  nfsd: adjust number of running nfsd threads based on activity
  sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
  sunrpc: split new thread creation into a separate function
  sunrpc: introduce the concept of a minimum number of threads per pool
  sunrpc: track the max number of requested threads in a pool
  ...
2026-02-12 08:23:53 -08:00
Paolo Abeni
83310d6133 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes in preparation for the net-next PR.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-11 15:14:35 +01:00
Matthieu Baerts (NGI0)
136f1e168f mptcp: fix kdoc warnings
The following warnings were visible:

  $ ./scripts/kernel-doc -Wall -none \
        net/mptcp/ include/net/mptcp.h include/uapi/linux/mptcp*.h \
        include/trace/events/mptcp.h
  Warning: net/mptcp/token.c:108 No description found for return value of 'mptcp_token_new_request'
  Warning: net/mptcp/token.c:151 No description found for return value of 'mptcp_token_new_connect'
  Warning: net/mptcp/token.c:246 No description found for return value of 'mptcp_token_get_sock'
  Warning: net/mptcp/token.c:298 No description found for return value of 'mptcp_token_iter_next'
  Warning: net/mptcp/protocol.c:4431 No description found for return value of 'mptcp_splice_read'
  Warning: include/uapi/linux/mptcp_pm.h:13 missing initial short description on line:
   * enum mptcp_event_type

Address all of them: either by using the 'Return:' keyword, or by adding
a missing initial short description.

The MPTCP CI will soon report issues with kdoc to avoid introducing new
issues and being flagged by the Netdev CI.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-3-c2720ce75c34@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06 20:35:06 -08:00
Ivan Vecera
bc443c253f dpll: expose fractional frequency offset in ppt
Currently, the dpll subsystem exports the fractional frequency offset
(FFO) in parts per million (ppm). This granularity is insufficient for
high-precision synchronization scenarios which often require parts per
trillion (ppt) resolution.

Add a new netlink attribute DPLL_A_PIN_FRACTIONAL_FREQUENCY_OFFSET_PPT
to expose the FFO in ppt.

Update the dpll netlink core to expect the driver-provided FFO value
to be in ppt. To maintain backward compatibility with existing userspace
tools, populate the legacy DPLL_A_PIN_FRACTIONAL_FREQUENCY_OFFSET
attribute by dividing the new ppt value by 1,000,000.

Update the zl3073x and mlx5 drivers to provide the FFO value in ppt:
- zl3073x: adjust the fixed-point calculation to produce ppt (10^12)
  instead of ppm (10^6).
- mlx5: scale the existing ppm value by 1,000,000.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260126162253.27890-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29 18:21:16 -08:00
Jeff Layton
d8316b837c nfsd: add controls to set the minimum number of threads per pool
Add a new "min_threads" variable to the nfsd_net, along with the
corresponding netlink interface, to set that value from userland.
Pass that value to svc_set_pool_threads() and svc_set_num_threads().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Paolo Abeni
ba1b8c97b9 geneve: add netlink support for GRO hint
Allow configuring and dumping the new device option, and cache its value
into the geneve socket itself.
The new option is not tie to it any code yet.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/2295d4e4d1e919a3189425141bbc71c7850a2de0.1769011015.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23 11:31:14 -08:00
Jakub Kicinski
9abf22075d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.19-rc7).

Conflicts:

drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
  b35a6fd37a ("hinic3: Add adaptive IRQ coalescing with DIM")
  fb2bb2a1eb ("hinic3: Fix netif_queue_set_napi queue_index input parameter error")
https://lore.kernel.org/fc0a7fdf08789a52653e8ad05281a0a849e79206.1768915707.git.zhuyikai1@h-partners.com

drivers/net/wireless/ath/ath12k/mac.c
drivers/net/wireless/ath/ath12k/wifi7/hw.c
  3170757210 ("wifi: ath12k: Fix wrong P2P device link id issue")
  c26f294fef ("wifi: ath12k: Move ieee80211_ops callback to the arch specific module")
https://lore.kernel.org/20260114123751.6a208818@canb.auug.org.au

Adjacent changes:

drivers/net/wireless/ath/ath12k/mac.c
  8b8d6ee53d ("wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel")
  914c890d3b ("wifi: ath12k: Add framework for hardware specific ieee80211_ops registration")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-22 20:14:36 -08:00
Linus Torvalds
0a80e38d0f Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN and wireless.

  Pretty big, but hard to make up any cohesive story that would explain
  it, a random collection of fixes. The two reverts of bad patches from
  this release here feel like stuff that'd normally show up by rc5 or
  rc6. Perhaps obvious thing to say, given the holiday timing.

  That said, no active investigations / regressions. Let's see what the
  next week brings.

  Current release - fix to a fix:

   - can: alloc_candev_mqs(): add missing default CAN capabilities

  Current release - regressions:

   - usbnet: fix crash due to missing BQL accounting after resume

   - Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...

  Previous releases - regressions:

   - Revert "nfc/nci: Add the inconsistency check between the input ...

  Previous releases - always broken:

   - number of driver fixes for incorrect use of seqlocks on stats

   - rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
     when MSG_PEEK was set

   - ipvlan: make the addrs_lock be per port avoid races in the port
     hash table

   - sched: enforce that teql can only be used as root qdisc

   - virtio: coalesce only linear skb

   - wifi: ath12k: fix dead lock while flushing management frames

   - eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"

* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
  Octeontx2-af: Add proper checks for fwdata
  dpll: Prevent duplicate registrations
  net/sched: act_ife: avoid possible NULL deref
  hinic3: Fix netif_queue_set_napi queue_index input parameter error
  vsock/test: add stream TX credit bounds test
  vsock/virtio: cap TX credit to local buffer size
  vsock/test: fix seqpacket message bounds test
  vsock/virtio: fix potential underflow in virtio_transport_get_credit()
  net: fec: account for VLAN header in frame length calculations
  net: openvswitch: fix data race in ovs_vport_get_upcall_stats
  octeontx2-af: Fix error handling
  net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
  rxrpc: Fix data-race warning and potential load/store tearing
  net: dsa: fix off-by-one in maximum bridge ID determination
  net: bcmasp: Fix network filter wake for asp-3.0
  bonding: provide a net pointer to __skb_flow_dissect()
  selftests: net: amt: wait longer for connection before sending packets
  be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
  Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
  netrom: fix double-free in nr_route_frame()
  ...
2026-01-22 09:32:11 -08:00
Jakub Kicinski
8766d61a1d Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"
This reverts commit 77b9c4a438, reversing
changes made to 4515ec4ad5:

 931420a2fc ("selftests/net: Add netkit container tests")
 ab771c938d ("selftests/net: Make NetDrvContEnv support queue leasing")
 6be87fbb27 ("selftests/net: Add env for container based tests")
 61d99ce3df ("selftests/net: Add bpf skb forwarding program")
 920da36341 ("netkit: Add xsk support for af_xdp applications")
 eef51113f8 ("netkit: Add netkit notifier to check for unregistering devices")
 b5ef109d22 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
 b5c3fa4a0b ("netkit: Add single device mode for netkit")
 0073d2fd67 ("xsk: Proxy pool management for leased queues")
 1ecea95dd3 ("xsk: Extend xsk_rcv_check validation")
 804bf334d0 ("net: Proxy netdev_queue_get_dma_dev for leased queues")
 0caa9a8dde ("net: Proxy net_mp_{open,close}_rxq for leased queues")
 ff8889ff91 ("net, ethtool: Disallow leased real rxqs to be resized")
 9e2103f361 ("net: Add lease info to queue-get response")
 31127dedde ("net: Implement netdev_nl_queue_create_doit")
 a5546e18f7 ("net: Add queue-create operation")

The series will conflict with io_uring work, and the code needs more
polish.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:06:01 -08:00
Daniel Borkmann
a5546e18f7 net: Add queue-create operation
Add a ynl netdev family operation called queue-create that creates a
new queue on a netdevice:

      name: queue-create
      attribute-set: queue
      flags: [admin-perm]
      do:
        request:
          attributes:
            - ifindex
            - type
            - lease
        reply: &queue-create-op
          attributes:
            - id

This is a generic operation such that it can be extended for various
use cases in future. Right now it is mandatory to specify ifindex,
the queue type which is enforced to rx and a lease. The newly created
queue id is returned to the caller.

A queue from a virtual device can have a lease which refers to another
queue from a physical device. This is useful for memory providers
and AF_XDP operations which take an ifindex and queue id to allow
applications to bind against virtual devices in containers. The lease
couples both queues together and allows to proxy the operations from
a virtual device in a container to the physical device.

In future, the nested lease attribute can be lifted and made optional
for other use-cases such as dynamic queue creation for physical
netdevs. The lack of lease and the specification of the physical
device as an ifindex will imply that we need a real queue to be
allocated. Similarly, the queue type enforcement to rx can then be
lifted as well to support tx.

An early implementation had only driver-specific integration [0], but
in order for other virtual devices to reuse, it makes sense to have
this as a generic API in core net.

For leasing queues, the virtual netdev must have real_num_rx_queue
less than num_rx_queues at the time of calling queue-create. The
queue-type must be rx as only rx queues are supported for leasing
for now. We also enforce that the queue-create ifindex must point
to a virtual device, and that the nested lease attribute's ifindex
must point to a physical device. The nested lease attribute set
contains a netns-id attribute which is currently only intended for
dumping as part of the queue-get operation. Also, it is modeled as
an s32 type similarly as done elsewhere in the stack.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0]
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 11:58:49 +01:00
Ivan Vecera
e3f6c65192 dpll: add dpll_device op to set working mode
Currently, userspace can retrieve the DPLL working mode but cannot
configure it. This prevents changing the device operation, such as
switching from manual to automatic mode and vice versa.

Add a new callback .mode_set() to struct dpll_device_ops. Extend
the netlink policy and device-set command handling to process
the DPLL_A_MODE attribute.  Update the netlink YAML specification
to include the mode attribute in the device-set operation.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19 12:04:53 -08:00
Kuniyuki Iwashima
7a9bc9e3f4 fou: Don't allow 0 for FOU_ATTR_IPPROTO.
fou_udp_recv() has the same problem mentioned in the previous
patch.

If FOU_ATTR_IPPROTO is set to 0, skb is not freed by
fou_udp_recv() nor "resubmit"-ted in ip_protocol_deliver_rcu().

Let's forbid 0 for FOU_ATTR_IPPROTO.

Fixes: 23461551c0 ("fou: Support for foo-over-udp RX path")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260115172533.693652-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17 16:00:24 -08:00
Rafael J. Wysocki
d51e68b700 Merge branch 'pm-em'
Merge fixes related to the energy model management for 6.19-rc6:

 - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)

 - Fix stale description of the cost field in struct em_perf_state to
   reflect the current code (Yaxiong Tian)

 - Fix and revamp the energy model YNL specification added recently
   along with the energy model netlink interface (Changwoo Min)

* pm-em:
  PM: EM: Add dump to get-perf-domains in the EM YNL spec
  PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
  PM: EM: Rename em.yaml to dev-energymodel.yaml
  PM: EM: Fix yamllint warnings in the EM YNL spec
  PM: EM: Fix memory leak in em_create_pd() error path
  PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16 16:16:24 +01:00
Jonas Köppeler
1bddd758ba net/sched: sch_cake: share shaper state across sub-instances of cake_mq
This commit adds shared shaper state across the cake instances beneath a
cake_mq qdisc. It works by periodically tracking the number of active
instances, and scaling the configured rate by the number of active
queues.

The scan is lockless and simply reads the qlen and the last_active state
variable of each of the instances configured beneath the parent cake_mq
instance. Locking is not required since the values are only updated by
the owning instance, and eventual consistency is sufficient for the
purpose of estimating the number of active queues.

The interval for scanning the number of active queues is set to 200 us.
We found this to be a good tradeoff between overhead and response time.
For a detailed analysis of this aspect see the Netdevconf talk:

https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13 11:54:29 +01:00
Changwoo Min
380ff27af2 PM: EM: Add dump to get-perf-domains in the EM YNL spec
Add dump to get-perf-domains, so that a user can fetch either information
about a specific performance domain with do or information about all
performance domains with dump. Share the reply format of do and dump using
perf-domain-attrs, so remove perf-domains. The YNL spec, autogenerated
files, and the do implementation are updated, and the dump implementation
is added.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-5-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Changwoo Min
d29b900cf4 PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
Previously, the cpus attribute was a string format which was a "%*pb"
stringification of a bitmap. That is not very consumable for a UAPI,
so let’s change it to an u64 array of CPU ids.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-4-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Changwoo Min
caa07a815d PM: EM: Rename em.yaml to dev-energymodel.yaml
The EM YNL specification used many acronyms, including ‘em’, ‘pd’,
‘ps’, etc. While the acronyms are short and convenient, they could be
confusing. So, let’s spell them out to be more specific. The following
changes were made in the spec. Note that the protocol name cannot exceed
GENL_NAMSIZ (16).

  em           -> dev-energymodel
  pds          -> perf-domains
  pd           -> perf-domain
  pd-id        -> perf-domain-id
  pd-table     -> perf-table
  ps           -> perf-state
  get-pds      -> get-perf-domains
  get-pd-table -> get-perf-table
  pd-created   -> perf-domain-created
  pd-updated   -> perf-domain-updated
  pd-deleted   -> perf-domain-deleted

In addition. doc strings were added to the spec. based on the comments in
energy_model.h. Two flag attributes (perf-state-flags and
perf-domain-flags) were added for easily interpreting the bit flags.

Finally, the autogenerated files and em_netlink.c were updated accordingly
to reflect the name changes.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-3-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Changwoo Min
ebabc32ddb PM: EM: Fix yamllint warnings in the EM YNL spec
The energy model YNL spec has the following two warnings
when checking with yamlint:

 3:1    warning missing document start "---"  (document-start)
 107:13 error   wrong indentation: expected 10 but found 12  (indentation)

So let’s fix whose lint warnings.

Fixes: bd26631ccd ("PM: EM: Add em.yaml and autogen files")
Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-2-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Jakub Kicinski
86c22d475c netlink: specs: netdev: clarify the page pool API a little
The phrasing of the page-pool-get doc is very confusing.
It's supposed to highlight that support depends on the driver
doing its part but it sounds like orphaned page pools won't
be visible.

The description of the ifindex is completely wrong.
We move the page pool to loopback and skip the attribute if
ifindex is loopback.

Link: https://lore.kernel.org/20260104084347.5de3a537@kernel.org
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20260104165232.710460-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-05 16:49:54 -08:00
Linus Torvalds
8f7aa3d3c7 Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Replace busylock at the Tx queuing layer with a lockless list.

     Resulting in a 300% (4x) improvement on heavy TX workloads, sending
     twice the number of packets per second, for half the cpu cycles.

   - Allow constantly busy flows to migrate to a more suitable CPU/NIC
     queue.

     Normally we perform queue re-selection when flow comes out of idle,
     but under extreme circumstances the flows may be constantly busy.

     Add sysctl to allow periodic rehashing even if it'd risk packet
     reordering.

   - Optimize the NAPI skb cache, make it larger, use it in more paths.

   - Attempt returning Tx skbs to the originating CPU (like we already
     did for Rx skbs).

   - Various data structure layout and prefetch optimizations from Eric.

   - Remove ktime_get() from the recvmsg() fast path, ktime_get() is
     sadly quite expensive on recent AMD machines.

   - Extend threaded NAPI polling to allow the kthread busy poll for
     packets.

   - Make MPTCP use Rx backlog processing. This lowers the lock
     pressure, improving the Rx performance.

   - Support memcg accounting of MPTCP socket memory.

   - Allow admin to opt sockets out of global protocol memory accounting
     (using a sysctl or BPF-based policy). The global limits are a poor
     fit for modern container workloads, where limits are imposed using
     cgroups.

   - Improve heuristics for when to kick off AF_UNIX garbage collection.

   - Allow users to control TCP SACK compression, and default to 33% of
     RTT.

   - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
     unnecessarily aggressive rcvbuf growth and overshot when the
     connection RTT is low.

   - Preserve skb metadata space across skb_push / skb_pull operations.

   - Support for IPIP encapsulation in the nftables flowtable offload.

   - Support appending IP interface information to ICMP messages (RFC
     5837).

   - Support setting max record size in TLS (RFC 8449).

   - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.

   - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.

   - Let users configure the number of write buffers in SMC.

   - Add new struct sockaddr_unsized for sockaddr of unknown length,
     from Kees.

   - Some conversions away from the crypto_ahash API, from Eric Biggers.

   - Some preparations for slimming down struct page.

   - YAML Netlink protocol spec for WireGuard.

   - Add a tool on top of YAML Netlink specs/lib for reporting commonly
     computed derived statistics and summarized system state.

  Driver API:

   - Add CAN XL support to the CAN Netlink interface.

   - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
     defined by the OPEN Alliance's "Advanced diagnostic features for
     100BASE-T1 automotive Ethernet PHYs" specification.

   - Add DPLL phase-adjust-gran pin attribute (and implement it in
     zl3073x).

   - Refactor xfrm_input lock to reduce contention when NIC offloads
     IPsec and performs RSS.

   - Add info to devlink params whether the current setting is the
     default or a user override. Allow resetting back to default.

   - Add standard device stats for PSP crypto offload.

   - Leverage DSA frame broadcast to implement simple HSR frame
     duplication for a lot of switches without dedicated HSR offload.

   - Add uAPI defines for 1.6Tbps link modes.

  Device drivers:

   - Add Motorcomm YT921x gigabit Ethernet switch support.

   - Add MUCSE driver for N500/N210 1GbE NIC series.

   - Convert drivers to support dedicated ops for timestamping control,
     and away from the direct IOCTL handling. While at it support GET
     operations for PHY timestamping.

   - Add (and convert most drivers to) a dedicated ethtool callback for
     reading the Rx ring count.

   - Significant refactoring efforts in the STMMAC driver, which
     supports Synopsys turn-key MAC IP integrated into a ton of SoCs.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support PPS in/out on all pins
      - Intel (100G, ice, idpf):
         - ice: implement standard ethtool and timestamping stats
         - i40e: support setting the max number of MAC addresses per VF
         - iavf: support RSS of GTP tunnels for 5G and LTE deployments
      - nVidia/Mellanox (mlx5):
         - reduce downtime on interface reconfiguration
         - disable being an XDP redirect target by default (same as
           other drivers) to avoid wasting resources if feature is
           unused
      - Meta (fbnic):
         - add support for Linux-managed PCS on 25G, 50G, and 100G links
      - Wangxun:
         - support Rx descriptor merge, and Tx head writeback
         - support Rx coalescing offload
         - support 25G SPF and 40G QSFP modules

   - Ethernet virtual:
      - Google (gve):
         - allow ethtool to configure rx_buf_len
         - implement XDP HW RX Timestamping support for DQ descriptor
           format
      - Microsoft vNIC (mana):
         - support HW link state events
         - handle hardware recovery events when probing the device

   - Ethernet NICs consumer, and embedded:
      - usbnet: add support for Byte Queue Limits (BQL)
      - AMD (amd-xgbe):
         - add device selftests
      - NXP (enetc):
         - add i.MX94 support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - bcmasp: add support for PHY-based Wake-on-LAN
      - Broadcom switches (b53):
         - support port isolation
         - support BCM5389/97/98 and BCM63XX ARL formats
      - Lantiq/MaxLinear switches:
         - support bridge FDB entries on the CPU port
         - use regmap for register access
         - allow user to enable/disable learning
         - support Energy Efficient Ethernet
         - support configuring RMII clock delays
         - add tagging driver for MaxLinear GSW1xx switches
      - Synopsys (stmmac):
         - support using the HW clock in free running mode
         - add Eswin EIC7700 support
         - add Rockchip RK3506 support
         - add Altera Agilex5 support
      - Cadence (macb):
         - cleanup and consolidate descriptor and DMA address handling
         - add EyeQ5 support
      - TI:
         - icssg-prueth: support AF_XDP
      - Airoha access points:
         - add missing Ethernet stats and link state callback
         - add AN7583 support
         - support out-of-order Tx completion processing
      - Power over Ethernet:
         - pd692x0: preserve PSE configuration across reboots
         - add support for TPS23881B devices

   - Ethernet PHYs:
      - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
      - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
      - micrel:
         - support for non PTP SKUs of lan8814
         - enable in-band auto-negotiation on lan8814
      - realtek:
         - cable testing support on RTL8224
         - interrupt support on RTL8221B
      - motorcomm: support for PHY LEDs on YT853
      - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
      - mscc: support for PHY LED control

   - CAN drivers:
      - m_can: add support for optional reset and system wake up
      - remove can_change_mtu() obsoleted by core handling
      - mcp251xfd: support GPIO controller functionality

   - Bluetooth:
      - add initial support for PASTa

   - WiFi:
      - split ieee80211.h file, it's way too big
      - improvements in VHT radiotap reporting, S1G, Channel Switch
        Announcement handling, rate tracking in mesh networks
      - improve multi-radio monitor mode support, and add a cfg80211
        debugfs interface for it
      - HT action frame handling on 6 GHz
      - initial chanctx work towards NAN
      - MU-MIMO sniffer improvements

   - WiFi drivers:
      - RealTek (rtw89):
         - support USB devices RTL8852AU and RTL8852CU
         - initial work for RTL8922DE
         - improved injection support
      - Intel:
         - iwlwifi: new sniffer API support
      - MediaTek (mt76):
         - WED support for >32-bit DMA
         - airoha NPU support
         - regdomain improvements
         - continued WiFi7/MLO work
      - Qualcomm/Atheros:
         - ath10k: factory test support
         - ath11k: TX power insertion support
         - ath12k: BSS color change support
         - ath12k: statistics improvements
      - brcmfmac: Acer A1 840 tablet quirk
      - rtl8xxxu: 40 MHz connection fixes/support"

* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
  net: page_pool: sanitise allocation order
  net: page pool: xa init with destroy on pp init
  net/mlx5e: Support XDP target xmit with dummy program
  net/mlx5e: Update XDP features in switch channels
  selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
  net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
  wireguard: netlink: generate netlink code
  wireguard: uapi: generate header with ynl-gen
  wireguard: uapi: move flag enums
  wireguard: uapi: move enum wg_cmd
  wireguard: netlink: add YNL specification
  selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
  selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
  selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
  selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
  selftests: drv-net: introduce Iperf3Runner for measurement use cases
  selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
  net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
  Documentation: net: dsa: mention simple HSR offload helpers
  Documentation: net: dsa: mention availability of RedBox
  ...
2025-12-03 17:24:33 -08:00
Asbjørn Sloth Tønnesen
6b0f4ca079 wireguard: netlink: add YNL specification
This patch adds a near[1] complete YNL specification for WireGuard,
documenting the protocol in a machine-readable format, rather than
comments in wireguard.h, and eases usage from C and non-C programming
languages alike.

The generated C library will be featured in a later patch, so in
this patch I will use the in-kernel python client for examples.

This makes the documentation in the UAPI header redundant, it is
therefore removed. The in-line documentation in the spec is based
on the existing comment in wireguard.h, and once released it will
be available in the kernel documentation at:
  https://docs.kernel.org/netlink/specs/wireguard.html
  (until then run: make htmldocs)

Generate wireguard.rst from this spec:
$ make -C tools/net/ynl/generated/ wireguard.rst

Query wireguard interface through pyynl:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
                                    --dump get-device \
                                    --json '{"ifindex":3}'
[{'fwmark': 0,
  'ifindex': 3,
  'ifname': 'wg-test',
  'listen-port': 54318,
  'peers': [{0: {'allowedips': [{0: {'cidr-mask': 0,
                                     'family': 2,
                                     'ipaddr': '0.0.0.0'}},
                                {0: {'cidr-mask': 0,
                                     'family': 10,
                                     'ipaddr': '::'}}],
                 'endpoint': b'[...]',
                 'last-handshake-time': {'nsec': 42, 'sec': 42},
                 'persistent-keepalive-interval': 42,
                 'preshared-key': '[...]',
                 'protocol-version': 1,
                 'public-key': '[...]',
                 'rx-bytes': 42,
                 'tx-bytes': 42}}],
  'private-key': '[...]',
  'public-key': '[...]'}]

Add another allowed IP prefix:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
  --do set-device --json '{"ifindex":3,"peers":[
    {"public-key":"6a df b1 83 a4 ..","allowedips":[
      {"cidr-mask":0,"family":10,"ipaddr":"::"}]}]}'

[1] As can be seen above, the "endpoint" is only dumped as binary data,
    as it can't be fully described in YNL. It's either a struct
    sockaddr_in or struct sockaddr_in6 depending on the attribute length.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2025-12-02 04:12:19 +01:00
Donald Hunter
1adc241f39 ynl: fix schema check errors
Fix two schema check errors that have lurked since the attribute name
validation was made more strict:

not ok 2 conntrack.yaml schema validation
'labels mask' does not match '^[0-9a-z-]+$'

not ok 13 nftables.yaml schema validation
'set id' does not match '^[0-9a-z-]+$'

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:53:20 -08:00
Donald Hunter
acce9d7200 ynl: fix a yamllint warning in ethtool spec
Fix warning reported by yamllint:

../../../Documentation/netlink/specs/ethtool.yaml
  1272:21   warning  truthy value should be one of [false, true]  (truthy)

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:53:19 -08:00
Rafael J. Wysocki
638757c9c9 Merge branches 'pm-em' and 'pm-opp'
Merge energy model management updates and operating performance points
(OPP) library changes for 6.19-rc1:

 - Add support for sending netlink notifications to user space on energy
   model updates (Changwoo Mini, Peng Fan)

 - Minor improvements to the Rust OPP interface (Tamir Duberstein)

 - Fixes to scope-based pointers in the OPP library (Viresh Kumar)

* pm-em:
  PM: EM: Add to em_pd_list only when no failure
  PM: EM: Notify an event when the performance domain changes
  PM: EM: Implement em_notify_pd_created/updated()
  PM: EM: Implement em_notify_pd_deleted()
  PM: EM: Implement em_nl_get_pd_table_doit()
  PM: EM: Implement em_nl_get_pds_doit()
  PM: EM: Add an iterator and accessor for the performance domain
  PM: EM: Add a skeleton code for netlink notification
  PM: EM: Add em.yaml and autogen files
  PM: EM: Expose the ID of a performance domain via debugfs
  PM: EM: Assign a unique ID when creating a performance domain

* pm-opp:
  rust: opp: simplify callers of `to_c_str_array`
  OPP: Initialize scope-based pointers inline
  rust: opp: fix broken rustdoc link
2025-11-28 16:44:00 +01:00
Hangbin Liu
651765e8d5 netlink: specs: add big-endian byte-order for u32 IPv4 addresses
The fix commit converted several IPv4 address attributes from binary
to u32, but forgot to specify byte-order: big-endian. Without this,
YNL tools display IPv4 addresses incorrectly due to host-endian
interpretation.

Add the missing byte-order: big-endian to all affected u32 IPv4
address fields to ensure correct parsing and display.

Fixes: 1064d521d1 ("netlink: specs: support ipv4-or-v6 for dual-stack fields")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20251125112048.37631-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 17:14:17 -08:00
Daniel Zahka
2a367002ed devlink: support default values for param-get and param-set
Support querying and resetting to default param values.

Introduce two new devlink netlink attrs:
DEVLINK_ATTR_PARAM_VALUE_DEFAULT and
DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an
optional parameter value inside of the param_value nested
attribute. The latter is used in param-set requests from userspace to
indicate that the driver should reset the param to its default value.

To implement this, two new functions are added to the devlink driver
api: devlink_param::get_default() and
devlink_param::reset_default(). These callbacks allow drivers to
implement default param actions for runtime and permanent cmodes. For
driverinit params, the core latches the last value set by a driver via
devl_param_driverinit_value_set(), and uses that as the default value
for a param.

Because default parameter values are optional, it would be impossible
to discern whether or not a param of type bool has default value of
false or not provided if the default value is encoded using a netlink
flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an
associated default value, the default value is encoded using a u8
type.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 19:01:22 -08:00
Hangbin Liu
1064d521d1 netlink: specs: support ipv4-or-v6 for dual-stack fields
Since commit 1b255e1bea ("tools: ynl: add ipv4-or-v6 display hint"), we
can display either IPv4 or IPv6 addresses for a single field based on the
address family. However, most dual-stack fields still use the ipv4 display
hint. This update changes them to use the new ipv4-or-v6 display hint and
converts IPv4-only fields to use the u32 type.

Field changes:
  - v4-or-v6
    - IFA_ADDRESS, IFA_LOCAL
    - IFLA_GRE_LOCAL, IFLA_GRE_REMOTE
    - IFLA_VTI_LOCAL, IFLA_VTI_REMOTE
    - IFLA_IPTUN_LOCAL, IFLA_IPTUN_REMOTE
    - NDA_DST
    - RTA_DST, RTA_SRC, RTA_GATEWAY, RTA_PREFSRC
    - FRA_SRC, FRA_DST
  - ipv4
    - IFA_BROADCAST
    - IFLA_GENEVE_REMOTE
    - IFLA_IPTUN_6RD_RELAY_PREFIX

Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251117024457.3034-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18 18:42:10 -08:00
Felix Maurer
c294432be1 netlink: specs: rt-link: Add attributes for hsr
YNL wasn't able to decode the linkinfo from hsr interfaces. Add the
linkinfo attribute definitions for hsr interfaces. Example output now
looks like this:

$ ynl --spec Documentation/netlink/specs/rt-link.yaml --do getlink \
    --json '{"ifname": "hsr0"}' --output-json | jq .linkinfo
{
  "kind": "hsr",
  "data": {
    "slave1": 15,
    "slave2": 13,
    "supervision-addr": "01:15:4e:00:01:00",
    "seq-nr": 64511,
    "version": 1,
    "protocol": 0
  }
}

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Link: https://patch.msgid.link/926077a70de614f1539c905d06515e258905255e.1762968225.git.fmaurer@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13 17:46:31 -08:00
Saeed Mahameed
0e535824d0 devlink: Introduce switchdev_inactive eswitch mode
Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and
documentation.

Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and
the user is ready to send traffic to it. For example: when two eswitches
are present for vports in a multi-PF setup, one eswitch may take over the
traffic from the other when the user chooses.
Before this take over, a user may want to first program the inactive
eswitch and then once ready redirect traffic to this new eswitch.

switchdev modes transition semantics:

legacy->switchdev_inactive: Create switchdev mode normally, traffic not
  allowed to flow yet.

switchdev_inactive->switchdev: Enable traffic to flow.

switchdev->switchdev_inactive: Block traffic on the FDB, FDB and
  representros state and content is preserved.

When eswitch is configured to this mode, traffic is ignored/dropped on
this eswitch FDB, while current configuration is kept, e.g FDB rules and
netdev representros are kept available, FDB programming is allowed.

Example:
 # start inactive switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive
 # setup TC rules, representors etc ..
 # activate
devlink dev eswitch set pci/0000:08:00.1 mode switchdev

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-2-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11 13:17:53 +01:00
Jakub Kicinski
f05d26198c psp: add stats from psp spec to driver facing api
Provide a driver api for reporting device statistics required by the
"Implementation Requirements" section of the PSP Architecture
Specification. Use a warning to ensure drivers report stats required
by the spec.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:53:57 -08:00
Jakub Kicinski
dae4a92399 psp: report basic stats from the core
Track and report stats common to all psp devices from the core. A
'stale-event' is when the core marks the rx state of an active
psp_assoc as incapable of authenticating psp encapsulated data.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-2-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:53:56 -08:00
Jakub Kicinski
c6934c4e04 netlink: specs: netdev add missing stats to qstat-get
Add missing entries in C attribute list.

Link: https://patch.msgid.link/20251104232348.1954349-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 08:00:23 -08:00
Oleksij Rempel
e6e93fb013 ethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access
Introduce the userspace entry point for PHY MSE diagnostics via
ethtool netlink. This exposes the core API added previously and
returns both capability information and one or more snapshots.

Userspace sends ETHTOOL_MSG_MSE_GET. The reply carries:
- ETHTOOL_A_MSE_CAPABILITIES: scale limits and timing information
- ETHTOOL_A_MSE_CHANNEL_* nests: one or more snapshots (per-channel
  if available, otherwise WORST, otherwise LINK)

Link down returns -ENETDOWN.

Changes:
  - YAML: add attribute sets (mse, mse-capabilities, mse-snapshot)
    and the mse-get operation
  - UAPI (generated): add ETHTOOL_A_MSE_* enums and message IDs,
    ETHTOOL_MSG_MSE_GET/REPLY
  - ethtool core: add net/ethtool/mse.c implementing the request,
    register genl op, and hook into ethnl dispatch
  - docs: document MSE_GET in ethtool-netlink.rst

The include/uapi/linux/ethtool_netlink_generated.h is generated
from Documentation/netlink/specs/ethtool.yaml.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251027122801.982364-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03 18:32:27 -08:00
Samiullah Khawaja
c18d4b190a net: Extend NAPI threaded polling to allow kthread based busy polling
Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to
enable and disable threaded busy polling.

When threaded busy polling is enabled for a NAPI, enable
NAPI_STATE_THREADED also.

When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to
signal napi_complete_done not to rearm interrupts.

Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the
NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the
NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread
go to sleep.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03 18:11:40 -08:00
Ivan Vecera
30176bf7c8 dpll: add phase-adjust-gran pin attribute
Phase-adjust values are currently limited by a min-max range. Some
hardware requires, for certain pin types, that values be multiples of
a specific granularity, as in the zl3073x driver.

Add a `phase-adjust-gran` pin attribute and an appropriate field in
dpll_pin_properties. If set by the driver, use its value to validate
user-provided phase-adjust values.

Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20251029153207.178448-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:59:17 -07:00
Petr Oros
520ad9e969 dpll: spec: add missing module-name and clock-id to pin-get reply
The dpll.yaml spec incorrectly omitted module-name and clock-id from the
pin-get operation reply specification, even though the kernel DPLL
implementation has always included these attributes in pin-get responses
since the initial implementation.

This spec inconsistency caused issues with the C YNL code generator.
The generated dpll_pin_get_rsp structure was missing these fields.

Fix the spec by adding module-name and clock-id to the pin-attrs reply
specification to match the actual kernel behavior.

Fixes: 3badff3a25 ("dpll: spec: Add Netlink spec in YAML")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251024185512.363376-1-poros@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:20:36 -07:00
Changwoo Min
bd26631ccd PM: EM: Add em.yaml and autogen files
Add a generic netlink spec in YAML format and autogenerate boilerplate
code using ynl-regen.sh to introduce a generic netlink for the energy
model. It allows a userspace program to read the performance domain and
its energy model. It notifies the userspace program when a performance
domain is created or deleted or its energy model is updated through a
multicast interface.

Specifically, it supports two commands:
  - EM_CMD_GET_PDS: Get the list of information for all performance
    domains.
  - EM_CMD_GET_PD_TABLE: Get the energy model table of a performance
    domain.

Also, it supports three notification events:
  - EM_CMD_PD_CREATED: When a performance domain is created.
  - EM_CMD_PD_DELETED: When a performance domain is deleted.
  - EM_CMD_PD_UPDATED: When the energy model table of a performance domain
    is updated.

Finally, update MAINTAINERS to include new files.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-4-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Linus Torvalds
6093a688a0 Merge tag 'char-misc-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO/Binder updates from Greg KH:
 "Here is the big set of char/misc/iio and other driver subsystem
  changes for 6.18-rc1.

  Loads of different stuff in here, it was a busy development cycle in
  lots of different subsystems, with over 27k new lines added to the
  tree.

  Included in here are:

   - IIO updates including new drivers, reworking of existing apis, and
     other goodness in the sensor subsystems

   - MEI driver updates and additions

   - NVMEM driver updates

   - slimbus removal for an unused driver and some other minor updates

   - coresight driver updates and additions

   - MHI driver updates

   - comedi driver updates and fixes

   - extcon driver updates

   - interconnect driver additions

   - eeprom driver updates and fixes

   - minor UIO driver updates

   - tiny W1 driver updates

  But the majority of new code is in the rust bindings and additions,
  which includes:

   - misc driver rust binding updates for read/write support, we can now
     write "normal" misc drivers in rust fully, and the sample driver
     shows how this can be done.

   - Initial framework for USB driver rust bindings, which are disabled
     for now in the build, due to limited support, but coming in through
     this tree due to dependencies on other rust binding changes that
     were in here. I'll be enabling these back on in the build in the
     usb.git tree after -rc1 is out so that developers can continue to
     work on these in linux-next over the next development cycle.

   - Android Binder driver implemented in Rust.

     This is the big one, and was driving a huge majority of the rust
     binding work over the past years. Right now there are two binder
     drivers in the kernel, selected only at build time as to which one
     to use as binder wants to be included in the system at boot time.

     The binder C maintainers all agreed on this, as eventually, they
     want the C code to be removed from the tree, but it will take a few
     releases to get there while both are maintained to ensure that the
     rust implementation is fully stable and compliant with the existing
     userspace apis.

  All of these have been in linux-next for a while"

* tag 'char-misc-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (320 commits)
  rust: usb: keep usb::Device private for now
  rust: usb: don't retain device context for the interface parent
  USB: disable rust bindings from the build for now
  samples: rust: add a USB driver sample
  rust: usb: add basic USB abstractions
  coresight: Add label sysfs node support
  dt-bindings: arm: Add label in the coresight components
  coresight: tnoc: add new AMBA ID to support Trace Noc V2
  coresight: Fix incorrect handling for return value of devm_kzalloc
  coresight: tpda: fix the logic to setup the element size
  coresight: trbe: Return NULL pointer for allocation failures
  coresight: Refactor runtime PM
  coresight: Make clock sequence consistent
  coresight: Refactor driver data allocation
  coresight: Consolidate clock enabling
  coresight: Avoid enable programming clock duplicately
  coresight: Appropriately disable trace bus clocks
  coresight: Appropriately disable programming clocks
  coresight: etm4x: Support atclk
  coresight: catu: Support atclk
  ...
2025-10-04 16:26:32 -07:00
Paolo Abeni
1a98f5699b Revert "Documentation: net: add flow control guide and document ethtool API"
This reverts commit 7bd80ed89d.

I should not have merged it to begin with due to pending review and
changes to be addressed.

Link: https://patch.msgid.link/c6f3af12df9b7998920a02027fc8893ce82afc4c.1759239721.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-01 09:48:21 +02:00
Oleksij Rempel
7bd80ed89d Documentation: net: add flow control guide and document ethtool API
Introduce a new document, flow_control.rst, to provide a comprehensive
guide on Ethernet Flow Control in Linux. The guide explains how flow
control works, how autonegotiation resolves pause capabilities, and how
to configure it using ethtool and Netlink.

In parallel, document the pause and pause-stat attributes in the
ethtool.yaml netlink spec. This enables the ynl tool to generate
kernel-doc comments for the corresponding enums in the UAPI header,
making the C interface self-documenting.

Finally, replace the legacy flow control section in phy.rst with a
reference to the new document and add pointers in the relevant C source
files.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 09:48:31 +02:00
Ivan Vecera
a680581f6a dpll: add phase-offset-avg-factor device attribute to netlink spec
Add dpll device level attribute DPLL_A_PHASE_OFFSET_AVG_FACTOR to allow
control over a calculation of reported phase offset value. Attribute is
present, if the driver provides such capability, otherwise attribute
shall not be present.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:57:41 -07:00
Vadim Fedorenko
cc2f081299 ethtool: add FEC bins histogram report
IEEE 802.3ck-2022 defines counters for FEC bins and 802.3df-2024
clarifies it a bit further. Implement reporting interface through as
addition to FEC stats available in ethtool. Drivers can leave bin
counter uninitialized if per-lane values are provided. In this case the
core will recalculate summ for the bin.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20250924124037.1508846-2-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 16:49:18 -07:00
Matthieu Baerts (NGI0)
c8bc168f5f mptcp: pm: netlink: deprecate server-side attribute
Now that such info is in the 'flags' attribute, it is time to deprecate
the dedicated 'server-side' attribute.

It will be removed in a few versions.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-3-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:24 -07:00
Matthieu Baerts (NGI0)
c9809f03c1 mptcp: pm: netlink: only add server-side attr when true
This attribute is a boolean. No need to add it to set it to 'false'.

Indeed, the default value when this attribute is not set is naturally
'false'. A few bytes can then be saved by not adding this attribute if
the connection is not on the server side.

This prepares the future deprecation of its attribute, in favour of a
new flag.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:24 -07:00
Alasdair McWilliam
b73b8146d7 rtnetlink: add needed_{head,tail}room attributes
Various network interface types make use of needed_{head,tail}room values
to efficiently reserve buffer space for additional encapsulation headers,
such as VXLAN, Geneve, IPSec, etc. However, it is not currently possible
to query these values in a generic way.

Introduce ability to query the needed_{head,tail}room values of a network
device via rtnetlink, such that applications that may wish to use these
values can do so.

For example, Cilium agent iterates over present devices based on user config
(direct routing, vxlan, geneve, wireguard etc.) and in future will configure
netkit in order to expose the needed_{head,tail}room into K8s pods. See
b9ed315d3c ("netkit: Allow for configuring needed_{head,tail}room").

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alasdair McWilliam <alasdair@mcwilliam.dev>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250917095543.14039-1-alasdair@mcwilliam.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:21:55 -07:00
Jakub Kicinski
f2cdc4c22b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc7).

No conflicts.

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
  9536fbe10c ("net/mlx5e: Add PSP steering in local NIC RX")
  7601a0a462 ("net/mlx5e: Add a miss level for ipsec crypto offload")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 11:26:06 -07:00
Jakub Kicinski
6b46ca260e net: psp: add socket security association code
Add the ability to install PSP Rx and Tx crypto keys on TCP
connections. Netlink ops are provided for both operations.
Rx side combines allocating a new Rx key and installing it
on the socket. Theoretically these are separate actions,
but in practice they will always be used one after the
other. We can add distinct "alloc" and "install" ops later.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250917000954.859376-9-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:32:06 +02:00
Jakub Kicinski
117f02a49b psp: add op for rotation of device key
Rotating the device key is a key part of the PSP protocol design.
Some external daemon needs to do it once a day, or so.
Add a netlink op to perform this operation.
Add a notification group for informing users that key has been
rotated and they should rekey (next rotation will cut them off).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250917000954.859376-6-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:32:06 +02:00
Jakub Kicinski
00c94ca2b9 psp: base PSP device support
Add a netlink family for PSP and allow drivers to register support.

The "PSP device" is its own object. This allows us to perform more
flexible reference counting / lifetime control than if PSP information
was part of net_device. In the future we should also be able
to "delegate" PSP access to software devices, such as *vlan, veth
or netkit more easily.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250917000954.859376-3-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:32:06 +02:00
Remy D. Farley
109f8b5154 doc/netlink: Fix typos in operation attributes
I'm trying to generate Rust bindings for netlink using the yaml spec.

It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresonding C define is
CTA_COUNTERS_ORIG.

Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct
with the same name, which is where family value is actually taken from.

> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
>                struct sk_buff *skb,
>                const struct nlmsghdr *nlh,
>                const struct nlattr * const cda[],
>                struct netlink_ext_ack *extack)
> {
>   int err;
>   struct nfgenmsg *nfmsg = nlmsg_data(nlh);
>   u_int8_t u3 = nfmsg->nfgen_family;
                         ^^^^^^^^^^^^

Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a5 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 16:30:06 -07:00