mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-05-27 09:02:25 -04:00
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, so regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switch between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 route input path, allowing
better introspection in case of packet drops.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predictable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API. This is preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed out by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduce a 4-tuple hash for connected UDP sockets, significantly
speeding up connected socket lookup.
- Add a fastpath for some TCP timers that usually expire after
close, reducing the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
the risk of losing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W counter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice):
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvell: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PTP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexistence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
290 lines
6.6 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * S390 version
 * Copyright IBM Corp. 1999
 *
 * Derived from "include/asm-i386/timex.h"
 * Copyright (C) 1992, Linus Torvalds
 */

#ifndef _ASM_S390_TIMEX_H
#define _ASM_S390_TIMEX_H

#include <linux/preempt.h>
#include <linux/time64.h>
#include <asm/lowcore.h>
#include <asm/asm.h>

/* The value of the TOD clock for 1.1.1970. */
#define TOD_UNIX_EPOCH 0x7d91048bca000000ULL

extern u64 clock_comparator_max;

union tod_clock {
	__uint128_t val;
	struct {
		__uint128_t ei : 8; /* epoch index */
		__uint128_t tod : 64; /* bits 0-63 of tod clock */
		__uint128_t : 40;
		__uint128_t pf : 16; /* programmable field */
	};
	struct {
		__uint128_t eitod : 72; /* epoch index + bits 0-63 tod clock */
		__uint128_t : 56;
	};
	struct {
		__uint128_t us : 60; /* micro-seconds */
		__uint128_t sus : 12; /* sub-microseconds */
		__uint128_t : 56;
	};
} __packed;

/* Inline functions for clock register access. */
static inline int set_tod_clock(__u64 time)
{
	int cc;

	asm volatile(
		" sck %[time]\n"
		CC_IPM(cc)
		: CC_OUT(cc, cc)
		: [time] "Q" (time)
		: CC_CLOBBER);
	return CC_TRANSFORM(cc);
}

static inline int store_tod_clock_ext_cc(union tod_clock *clk)
{
	int cc;

	asm volatile(
		" stcke %[clk]\n"
		CC_IPM(cc)
		: CC_OUT(cc, cc), [clk] "=Q" (*clk)
		:
		: CC_CLOBBER);
	return CC_TRANSFORM(cc);
}

static __always_inline void store_tod_clock_ext(union tod_clock *tod)
{
	asm volatile("stcke %0" : "=Q" (*tod) : : "cc");
}

static inline void set_clock_comparator(__u64 time)
{
	asm volatile("sckc %0" : : "Q" (time));
}

static inline void set_tod_programmable_field(u16 val)
{
	asm volatile(
		" lgr 0,%[val]\n"
		" sckpf\n"
		:
		: [val] "d" ((unsigned long)val)
		: "0");
}

void clock_comparator_work(void);

void __init time_early_init(void);

extern unsigned char ptff_function_mask[16];

/* Function codes for the ptff instruction. */
#define PTFF_QAF 0x00 /* query available functions */
#define PTFF_QTO 0x01 /* query tod offset */
#define PTFF_QSI 0x02 /* query steering information */
#define PTFF_QPT 0x03 /* query physical clock */
#define PTFF_QUI 0x04 /* query UTC information */
#define PTFF_ATO 0x40 /* adjust tod offset */
#define PTFF_STO 0x41 /* set tod offset */
#define PTFF_SFS 0x42 /* set fine steering rate */
#define PTFF_SGS 0x43 /* set gross steering rate */

/* Query TOD offset result */
struct ptff_qto {
	unsigned long physical_clock;
	unsigned long tod_offset;
	unsigned long logical_tod_offset;
	unsigned long tod_epoch_difference;
} __packed;

static inline int ptff_query(unsigned int nr)
{
	unsigned char *ptr;

	ptr = ptff_function_mask + (nr >> 3);
	return (*ptr & (0x80 >> (nr & 7))) != 0;
}

/* Query UTC information result */
struct ptff_qui {
	unsigned int tm : 2;
	unsigned int ts : 2;
	unsigned int : 28;
	unsigned int pad_0x04;
	unsigned long leap_event;
	short old_leap;
	short new_leap;
	unsigned int pad_0x14;
	unsigned long prt[5];
	unsigned long cst[3];
	unsigned int skew;
	unsigned int pad_0x5c[41];
} __packed;

/*
 * ptff - Perform timing facility function
 * @ptff_block: Pointer to ptff parameter block
 * @len: Length of parameter block
 * @func: Function code
 * Returns: Condition code (0 on success)
 */
#define ptff(ptff_block, len, func) \
({ \
	struct addrtype { char _[len]; }; \
	unsigned int reg0 = func; \
	unsigned long reg1 = (unsigned long)(ptff_block); \
	int rc; \
 \
	asm volatile( \
		" lgr 0,%[reg0]\n" \
		" lgr 1,%[reg1]\n" \
		" ptff\n" \
		CC_IPM(rc) \
		: CC_OUT(rc, rc), "+m" (*(struct addrtype *)reg1) \
		: [reg0] "d" (reg0), [reg1] "d" (reg1) \
		: CC_CLOBBER_LIST("0", "1")); \
	CC_TRANSFORM(rc); \
})

static inline unsigned long local_tick_disable(void)
{
	unsigned long old;

	old = get_lowcore()->clock_comparator;
	get_lowcore()->clock_comparator = clock_comparator_max;
	set_clock_comparator(get_lowcore()->clock_comparator);
	return old;
}

static inline void local_tick_enable(unsigned long comp)
{
	get_lowcore()->clock_comparator = comp;
	set_clock_comparator(get_lowcore()->clock_comparator);
}

#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */

typedef unsigned long cycles_t;

static __always_inline unsigned long get_tod_clock(void)
{
	union tod_clock clk;

	store_tod_clock_ext(&clk);
	return clk.tod;
}

static inline unsigned long get_tod_clock_fast(void)
{
	unsigned long clk;

	asm volatile("stckf %0" : "=Q" (clk) : : "cc");
	return clk;
}

static inline cycles_t get_cycles(void)
{
	return (cycles_t) get_tod_clock() >> 2;
}
#define get_cycles get_cycles

int get_phys_clock(unsigned long *clock);
void init_cpu_timer(void);

extern union tod_clock tod_clock_base;

static __always_inline unsigned long __get_tod_clock_monotonic(void)
{
	return get_tod_clock() - tod_clock_base.tod;
}

/**
 * get_tod_clock_monotonic - returns current time in clock rate units
 *
 * The clock and tod_clock_base get changed via stop_machine.
 * Therefore preemption must be disabled, otherwise the returned
 * value is not guaranteed to be monotonic.
 */
static inline unsigned long get_tod_clock_monotonic(void)
{
	unsigned long tod;

	preempt_disable_notrace();
	tod = __get_tod_clock_monotonic();
	preempt_enable_notrace();
	return tod;
}

/**
 * tod_to_ns - convert a TOD format value to nanoseconds
 * @todval: to be converted TOD format value
 * Returns: number of nanoseconds that correspond to the TOD format value
 *
 * Converting a 64 Bit TOD format value to nanoseconds means that the value
 * must be divided by 4.096. In order to achieve that we multiply with 125
 * and divide by 512:
 *
 * ns = (todval * 125) >> 9;
 *
 * In order to avoid an overflow with the multiplication we can rewrite this.
 * With a split todval == 2^9 * th + tl (th upper 55 bits, tl lower 9 bits)
 * we end up with
 *
 * ns = ((2^9 * th + tl) * 125) >> 9;
 * -> ns = (th * 125) + ((tl * 125) >> 9);
 *
 */
static __always_inline unsigned long tod_to_ns(unsigned long todval)
{
	return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
}

static __always_inline u128 eitod_to_ns(u128 todval)
{
	return (todval * 125) >> 9;
}

/**
 * tod_after - compare two 64 bit TOD values
 * @a: first 64 bit TOD timestamp
 * @b: second 64 bit TOD timestamp
 *
 * Returns: true if a is later than b
 */
static inline int tod_after(unsigned long a, unsigned long b)
{
	if (MACHINE_HAS_SCC)
		return (long) a > (long) b;
	return a > b;
}

/**
 * tod_after_eq - compare two 64 bit TOD values
 * @a: first 64 bit TOD timestamp
 * @b: second 64 bit TOD timestamp
 *
 * Returns: true if a is later than or equal to b
 */
static inline int tod_after_eq(unsigned long a, unsigned long b)
{
	if (MACHINE_HAS_SCC)
		return (long) a >= (long) b;
	return a >= b;
}

#endif
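
The overflow argument in the tod_to_ns() comment can be checked outside the kernel. The following is a minimal user-space sketch, not part of the header: the names tod_to_ns_split(), tod_to_ns_naive(), tod_to_ns_wide() and tod_to_ns_selfcheck() are hypothetical, and the 128-bit reference assumes a compiler that provides unsigned __int128 (GCC or Clang). It compares the split form against a full-width product and shows that the naive 64-bit product already wraps for a value as large as TOD_UNIX_EPOCH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split form, using the same arithmetic as tod_to_ns() in the header. */
static uint64_t tod_to_ns_split(uint64_t todval)
{
	return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
}

/* Naive form from the comment: the 64-bit product todval * 125 can wrap. */
static uint64_t tod_to_ns_naive(uint64_t todval)
{
	return (todval * 125) >> 9;
}

/* Reference: 128-bit product (compiler extension, assumed available). */
static uint64_t tod_to_ns_wide(uint64_t todval)
{
	return (uint64_t)(((unsigned __int128)todval * 125) >> 9);
}

/* Hypothetical self-check: split and wide forms agree on sample values. */
static void tod_to_ns_selfcheck(void)
{
	const uint64_t samples[] = {
		0, 1, 511, 512, 4096,
		0x7d91048bca000000ULL,	/* TOD_UNIX_EPOCH from the header */
		UINT64_MAX,
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(tod_to_ns_split(samples[i]) == tod_to_ns_wide(samples[i]));
}
```

The split works because 2^9 * th * 125 is an exact multiple of 2^9, so shifting right by 9 leaves th * 125 with no rounding, and only the low 9 bits contribute a truncated term. Since th fits in 55 bits and 125 is below 2^7, neither partial product can exceed 64 bits.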
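
The bit test in ptff_query() numbers bits from the most significant bit of each byte, so function code 0 maps to mask 0x80 of byte 0. Here is a small user-space sketch of that indexing scheme. The helpers mask_test_bit_msb0() and mask_set_bit_msb0() are hypothetical names introduced for illustration, and the mask is passed as a parameter instead of using the kernel's global ptff_function_mask.

```c
/* Function codes copied from the header above. */
#define PTFF_QAF 0x00 /* query available functions */
#define PTFF_STO 0x41 /* set tod offset */

/*
 * Test bit nr in a byte array, counting bits MSB-first within each byte.
 * Same expression as ptff_query(), but on a caller-supplied mask.
 */
static int mask_test_bit_msb0(const unsigned char *mask, unsigned int nr)
{
	const unsigned char *ptr = mask + (nr >> 3);

	return (*ptr & (0x80 >> (nr & 7))) != 0;
}

/* Hypothetical counterpart that sets bit nr, for building test masks. */
static void mask_set_bit_msb0(unsigned char *mask, unsigned int nr)
{
	mask[nr >> 3] |= (unsigned char)(0x80 >> (nr & 7));
}
```

With this numbering, PTFF_STO (0x41, decimal 65) lands in byte 8 (65 >> 3) under mask 0x40 (0x80 >> 1), which is why a 16-byte mask is enough to cover function codes 0 through 127.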