Commit Graph

128223 Commits

Author SHA1 Message Date
Jakub Kicinski
1e5d1f69d9 ethtool: support FEC settings over netlink
Add FEC API to netlink.

This is not a 1-to-1 conversion.

FEC settings already depend on link modes to tell user which
modes are supported. Take this further an use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.

Enforce only one active FEC bit (by using a bit position
rather than another mask).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 14:15:23 -07:00
Paolo Abeni
d18931a92a vxlan: allow L4 GRO passthrough
When passing up an UDP GSO packet with L4 aggregation, there is
no need to segment it at the vxlan level. We can propagate the
packet untouched and let it be segmented later, if needed.

Introduce an helper to allow let the UDP socket to accept any
L4 aggregation and use it in the vxlan driver.

v1 -> v2:
 - updated to use the newly introduced UDP socket 'accept*' fields

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:06:49 -07:00
Paolo Abeni
78352f73dc udp: never accept GSO_FRAGLIST packets
Currently the UDP protocol delivers GSO_FRAGLIST packets to
the sockets without the expected segmentation.

This change addresses the issue introducing and maintaining
a couple of new fields to explicitly accept SKB_GSO_UDP_L4
or GSO_FRAGLIST packets. Additionally updates  udp_unexpected_gso()
accordingly.

UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist
zeroed.

v1 -> v2:
 - use 2 bits instead of a whole GSO bitmask (Willem)

Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:06:49 -07:00
Paolo Abeni
000ac44da7 udp: fixup csum for GSO receive slow path
When UDP packets generated locally by a socket with UDP_SEGMENT
traverse the following path:

UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
	UDP tunnel (rx) -> UDP socket (no UDP_GRO)

ip_summed will be set to CHECKSUM_PARTIAL at creation time and
such checksum mode will be preserved in the above path up to the
UDP tunnel receive code where we have:

 __iptunnel_pull_header() -> skb_pull_rcsum() ->
skb_postpull_rcsum() -> __skb_postpull_rcsum()

The latter will convert the skb to CHECKSUM_NONE.

The UDP GSO packet will be later segmented as part of the rx socket
receive operation, and will present a CHECKSUM_NONE after segmentation.

Additionally the segmented packets UDP CB still refers to the original
GSO packet len. Overall that causes unexpected/wrong csum validation
errors later in the UDP receive path.

We could possibly address the issue with some additional checks and
csum mangling in the UDP tunnel code. Since the issue affects only
this UDP receive slow path, let's set a suitable csum status there.

Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP
encapsulation present a valid checksum when landing to udp_queue_rcv_skb(),
as the UDP checksum has been validated by the GRO engine.

v2 -> v3:
 - even more verbose commit message and comments

v1 -> v2:
 - restrict the csum update to the packets strictly needing them
 - hopefully clarify the commit message and code comments

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:06:49 -07:00
Andreas Roeseler
504a40113c ipv6: add ipv6_dev_find to stubs
Add ipv6_dev_find to ipv6_stub to allow lookup of net_devices by IPV6
address in net/ipv4/icmp.c.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 13:29:39 -07:00
Andreas Roeseler
f1b8fa9fa5 net: add sysctl for enabling RFC 8335 PROBE messages
Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 13:29:39 -07:00
Andreas Roeseler
750f4fc2a1 ICMPV6: add support for RFC 8335 PROBE
Add definitions for the ICMPV6 type of Extended Echo Request and
Extended Echo Reply, as defined by sections 2 and 3 of RFC 8335.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 13:29:39 -07:00
Andreas Roeseler
2b246b2569 icmp: add support for RFC 8335 PROBE
Add definitions for PROBE ICMP types and codes.

Add AFI definitions for IP and IPV6 as specified by IANA

Add a struct to represent the additional header when probing by IP
address (ctype == 3) for use in parsing incoming PROBE messages

Add a struct to represent the entire Interface Identification Object
(IIO) section of an incoming PROBE packet

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 13:29:39 -07:00
Vincent Mailhol
1d7750760b can: bittiming: add CAN_KBPS, CAN_MBPS and CAN_MHZ macros
Add three macro to simplify the readability of big bit timing numbers:
  - CAN_KBPS: kilobits per second (one thousand)
  - CAN_MBPS: megabits per second (one million)
  - CAN_MHZ: megahertz per second (one million)

Example:
	u32 bitrate_max = 8 * CAN_MBPS;
	struct can_clock clock = {.freq = 80 * CAN_MHZ};
instead of:
	u32 bitrate_max = 8000000;
	struct can_clock clock = {.freq = 80000000};

Apply the new macro to driver/net/can/dev/bittiming.c.

Link: https://lore.kernel.org/r/20210306054040.76483-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-30 11:14:45 +02:00
Vincent Mailhol
c25cc79932 can: bittiming: add calculation for CAN FD Transmitter Delay Compensation (TDC)
The logic for the tdco calculation is to just reuse the normal sample
point: tdco = sp. Because the sample point is expressed in tenth of
percent and the tdco is expressed in time quanta, a conversion is
needed.

At the end,
     ssp = tdcv + tdco
         = tdcv + sp.

Another popular method is to set tdco to the middle of the bit:
     tdc->tdco = can_bit_time(dbt) / 2
During benchmark tests, we could not find a clear advantages for one
of the two methods.

The tdco calculation is triggered each time the data_bittiming is
changed so that users relying on automated calculation can use the
netlink interface the exact same way without need of new parameters.
For example, a command such as:
	ip link set canX type can bitrate 500000 dbitrate 4000000 fd on
would trigger the calculation.

The user using CONFIG_CAN_CALC_BITTIMING who does not want automated
calculation needs to manually set tdco to zero.
For example with:
	ip link set canX type can tdco 0 bitrate 500000 dbitrate 4000000 fd on
(if the tdco parameter is provided in a previous command, it will be
overwritten).

If tdcv is set to zero (default), it is automatically calculated by
the transiver for each frame. As such, there is no code in the kernel
to calculate it.

tdcf has no automated calculation functions because we could not
figure out a formula for this parameter.

Link: https://lore.kernel.org/r/20210224002008.4158-6-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-30 11:14:45 +02:00
Vincent Mailhol
4c9258dd26 can: dev: reorder struct can_priv members for better packing
Save eight bytes of holes on x86-64 architectures by reordering struct
can_priv members.

Before:

$ pahole -C can_priv drivers/net/can/dev/dev.o
struct can_priv {
	struct net_device *        dev;                  /*     0     8 */
	struct can_device_stats    can_stats;            /*     8    24 */
	struct can_bittiming       bittiming;            /*    32    32 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct can_bittiming       data_bittiming;       /*    64    32 */
	const struct can_bittiming_const  * bittiming_const; /*    96     8 */
	const struct can_bittiming_const  * data_bittiming_const; /*   104     8 */
	struct can_tdc             tdc;                  /*   112    12 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	const struct can_tdc_const  * tdc_const;         /*   128     8 */
	const u16  *               termination_const;    /*   136     8 */
	unsigned int               termination_const_cnt; /*   144     4 */
	u16                        termination;          /*   148     2 */

	/* XXX 2 bytes hole, try to pack */

	const u32  *               bitrate_const;        /*   152     8 */
	unsigned int               bitrate_const_cnt;    /*   160     4 */

	/* XXX 4 bytes hole, try to pack */

	const u32  *               data_bitrate_const;   /*   168     8 */
	unsigned int               data_bitrate_const_cnt; /*   176     4 */
	u32                        bitrate_max;          /*   180     4 */
	struct can_clock           clock;                /*   184     4 */
	enum can_state             state;                /*   188     4 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	u32                        ctrlmode;             /*   192     4 */
	u32                        ctrlmode_supported;   /*   196     4 */
	u32                        ctrlmode_static;      /*   200     4 */
	int                        restart_ms;           /*   204     4 */
	struct delayed_work        restart_work;         /*   208   168 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 5 boundary (320 bytes) was 56 bytes ago --- */
	int                        (*do_set_bittiming)(struct net_device *); /*   376     8 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	int                        (*do_set_data_bittiming)(struct net_device *); /*   384     8 */
	int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   392     8 */
	int                        (*do_set_termination)(struct net_device *, u16); /*   400     8 */
	int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   408     8 */
	int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   416     8 */
	unsigned int               echo_skb_max;         /*   424     4 */

	/* XXX 4 bytes hole, try to pack */

	struct sk_buff * *         echo_skb;             /*   432     8 */

	/* size: 440, cachelines: 7, members: 31 */
	/* sum members: 426, holes: 4, sum holes: 14 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 56 bytes */
};

After:

$ pahole -C can_priv drivers/net/can/dev/dev.o
struct can_priv {
	struct net_device *        dev;                  /*     0     8 */
	struct can_device_stats    can_stats;            /*     8    24 */
	const struct can_bittiming_const  * bittiming_const; /*    32     8 */
	const struct can_bittiming_const  * data_bittiming_const; /*    40     8 */
	struct can_bittiming       bittiming;            /*    48    32 */
	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
	struct can_bittiming       data_bittiming;       /*    80    32 */
	const struct can_tdc_const  * tdc_const;         /*   112     8 */
	struct can_tdc             tdc;                  /*   120    12 */
	/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
	unsigned int               bitrate_const_cnt;    /*   132     4 */
	const u32  *               bitrate_const;        /*   136     8 */
	const u32  *               data_bitrate_const;   /*   144     8 */
	unsigned int               data_bitrate_const_cnt; /*   152     4 */
	u32                        bitrate_max;          /*   156     4 */
	struct can_clock           clock;                /*   160     4 */
	unsigned int               termination_const_cnt; /*   164     4 */
	const u16  *               termination_const;    /*   168     8 */
	u16                        termination;          /*   176     2 */

	/* XXX 2 bytes hole, try to pack */

	enum can_state             state;                /*   180     4 */
	u32                        ctrlmode;             /*   184     4 */
	u32                        ctrlmode_supported;   /*   188     4 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	u32                        ctrlmode_static;      /*   192     4 */
	int                        restart_ms;           /*   196     4 */
	struct delayed_work        restart_work;         /*   200   168 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
	int                        (*do_set_bittiming)(struct net_device *); /*   368     8 */
	int                        (*do_set_data_bittiming)(struct net_device *); /*   376     8 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   384     8 */
	int                        (*do_set_termination)(struct net_device *, u16); /*   392     8 */
	int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   400     8 */
	int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   408     8 */
	unsigned int               echo_skb_max;         /*   416     4 */

	/* XXX 4 bytes hole, try to pack */

	struct sk_buff * *         echo_skb;             /*   424     8 */

	/* size: 432, cachelines: 7, members: 31 */
	/* sum members: 426, holes: 2, sum holes: 6 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 48 bytes */
};

Link: https://lore.kernel.org/r/20210224002008.4158-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-30 11:14:44 +02:00
Vincent Mailhol
289ea9e4ae can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.

This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).

This patch adds two new structures: can_tdc and can_tdc_const in order
to implement this TDC.

The structures are then added to can_priv.

A controller supports TDC if an only if can_priv::tdc_const is not
NULL.

TDC is active if and only if:
  - fd flag is on
  - can_priv::tdc.tdco is not zero.
It is the driver responsibility to check those two conditions are met.

No new controller modes are introduced (i.e. no CAN_CTRL_MODE_TDC) in
order not to be redundant with above logic.

The names of the parameters are chosen to match existing CAN
controllers specification. References:
  - Bosch C_CAN FD8:
https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/c_can_fd8/users_manual_c_can_fd8_r210_1.pdf
  - Microchip CAN FD Controller Module:
http://ww1.microchip.com/downloads/en/DeviceDoc/MCP251XXFD-CAN-FD-Controller-Module-Family-Reference-Manual-20005678B.pdf
  - SAM E701/S70/V70/V71 Family:
https://www.mouser.com/datasheet/2/268/60001527A-1284321.pdf

Link: https://lore.kernel.org/r/20210224002008.4158-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-30 11:14:44 +02:00
Marc Kleine-Budde
f318482a1c can: dev: can_free_echo_skb(): extend to return can frame length
In order to implement byte queue limits (bql) in CAN drivers, the
length of the CAN frame needs to be passed into the networking stack
even if the transmission failed for some reason.

To avoid to calculate this length twice, extend can_free_echo_skb() to
return that value. Convert all users of this function, too.

This patch is the natural extension of commit:

| 9420e1d495 ("can: dev: can_get_echo_skb(): extend to return can
|                frame length")

Link: https://lore.kernel.org/r/20210319142700.305648-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-30 11:14:28 +02:00
Petr Machata
de1d1ee3e3 nexthop: Rename artifacts related to legacy multipath nexthop groups
After resilient next-hop groups have been added recently, there are two
types of multipath next-hop groups: the legacy "mpath", and the new
"resilient". Calling the legacy next-hop group type "mpath" is unfortunate,
because that describes the fact that a packet could be forwarded in one of
several paths, which is also true for the resilient next-hop groups.

Therefore, to make the naming clearer, rename various artifacts to reflect
the assumptions made. Therefore as of this patch:

- The flag for multipath groups is nh_grp_entry::is_multipath. This
  includes the legacy and resilient groups, as well as any future group
  types that behave as multipath groups.
  Functions that assume this have "mpath" in the name.

- The flag for legacy multipath groups is nh_grp_entry::hash_threshold.
  Functions that assume this have "hthr" in the name.

- The flag for resilient groups is nh_grp_entry::resilient.
  Functions that assume this have "res" in the name.

Besides the above, struct nh_grp_entry::mpath was renamed to ::hthr as
well.

UAPI artifacts were obviously left intact.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-28 17:53:39 -07:00
Taehee Yoo
63ed8de4be mld: add mc_lock for protecting per-interface mld data
The purpose of this lock is to avoid a bottleneck in the query/report
event handler logic.

By previous patches, almost all mld data is protected by RTNL.
So, the query and report event handler, which is data path logic
acquires RTNL too. Therefore if a lot of query and report events
are received, it uses RTNL for a long time.
So it makes the control-plane bottleneck because of using RTNL.
In order to avoid this bottleneck, mc_lock is added.

mc_lock protect only per-interface mld data and per-interface mld
data is used in the query/report event handler logic.
So, no longer rtnl_lock is needed in the query/report event handler logic.
Therefore bottleneck will be disappeared by mc_lock.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:56 -07:00
Taehee Yoo
f185de28d9 mld: add new workqueues for process mld events
When query/report packets are received, mld module processes them.
But they are processed under BH context so it couldn't use sleepable
functions. So, in order to switch context, the two workqueues are
added which processes query and report event.

In the struct inet6_dev, mc_{query | report}_queue are added so it
is per-interface queue.
And mc_{query | report}_work are workqueue structure.

When the query or report event is received, skb is queued to proper
queue and worker function is scheduled immediately.
Workqueues and queues are protected by spinlock, which is
mc_{query | report}_lock, and worker functions are protected by RTNL.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:56 -07:00
Taehee Yoo
88e2ca3080 mld: convert ifmcaddr6 to RCU
The ifmcaddr6 has been protected by inet6_dev->lock(rwlock) so that
the critical section is atomic context. In order to switch this context,
changing locking is needed. The ifmcaddr6 actually already protected by
RTNL So if it's converted to use RCU, its control path context can be
switched to sleepable.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:56 -07:00
Taehee Yoo
4b200e3989 mld: convert ip6_sf_list to RCU
The ip6_sf_list has been protected by mca_lock(spin_lock) so that the
critical section is atomic context. In order to switch this context,
changing locking is needed. The ip6_sf_list actually already protected
by RTNL So if it's converted to use RCU, its control path context can
be switched to sleepable.
But It doesn't remove mca_lock yet because ifmcaddr6 isn't converted
to RCU yet. So, It's not fully converted to the sleepable context.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:56 -07:00
Taehee Yoo
882ba1f73c mld: convert ipv6_mc_socklist->sflist to RCU
The sflist has been protected by rwlock so that the critical section
is atomic context.
In order to switch this context, changing locking is needed.
The sflist actually already protected by RTNL So if it's converted
to use RCU, its control path context can be switched to sleepable.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:56 -07:00
Taehee Yoo
cf2ce339b4 mld: get rid of inet6_dev->mc_lock
The purpose of mc_lock is to protect inet6_dev->mc_tomb.
But mc_tomb is already protected by RTNL and all functions,
which manipulate mc_tomb are called under RTNL.
So, mc_lock is not needed.
Furthermore, it is spinlock so the critical section is atomic.
In order to reduce atomic context, it should be removed.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:55 -07:00
Taehee Yoo
2d9a93b490 mld: convert from timer to delayed work
mcast.c has several timers for delaying works.
Timer's expire handler is working under atomic context so it can't use
sleepable things such as GFP_KERNEL, mutex, etc.
In order to use sleepable APIs, it converts from timers to delayed work.
But there are some critical sections, which is used by both process
and BH context. So that it still uses spin_lock_bh() and rwlock.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:14:55 -07:00
Jakub Kicinski
d04feecaf1 ethtool: document the enum values not defines
kdoc does not have good support for documenting defines,
and we can't abuse the enum documentation because it
generates warnings.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:09:45 -07:00
Jakub Kicinski
ad1cd7856d ethtool: fec: add note about reuse of reserved
struct ethtool_fecparam::reserved can't be used in SET, because
ethtool user space doesn't zero-initialize the structure.
Make this clear.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26 15:09:45 -07:00
Eric Dumazet
4ecc1baf36 tcp: convert elligible sysctls to u8
Many tcp sysctls are either bools or small ints that can fit into u8.

Reducing space taken by sysctls can save few cache line misses
when sending/receiving data while cpu caches are empty,
for example after cpu idle period.

This is hard to measure with typical network performance tests,
but after this patch, struct netns_ipv4 has shrunk
by three cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:39:33 -07:00
Eric Dumazet
2932bcda07 inet: convert tcp_early_demux and udp_early_demux to u8
For these sysctls, their dedicated helpers have
to use proc_dou8vec_minmax().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:39:33 -07:00
Eric Dumazet
1c69dedc8f ipv4: convert ip_forward_update_priority sysctl to u8
This sysctl uses ip_fwd_update_priority() helper,
so the conversion needs to change it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:39:33 -07:00
Eric Dumazet
4b6bbf17d4 ipv4: shrink netns_ipv4 with sysctl conversions
These sysctls that can fit in one byte instead of one int
are converted to save space and thus reduce cache line misses.

 - icmp_echo_ignore_all, icmp_echo_ignore_broadcasts,
 - icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr
 - tcp_ecn, tcp_ecn_fallback
 - ip_default_ttl, ip_no_pmtu_disc, ip_fwd_use_pmtu
 - ip_nonlocal_bind, ip_autobind_reuse
 - ip_dynaddr, ip_early_demux, raw_l3mdev_accept
 - nexthop_compat_mode, fwmark_reflect

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:39:33 -07:00
Eric Dumazet
cb94441306 sysctl: add proc_dou8vec_minmax()
Networking has many sysctls that could fit in one u8.

This patch adds proc_dou8vec_minmax() for this purpose.

Note that the .extra1 and .extra2 fields are pointing
to integers, because it makes conversions easier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:39:33 -07:00
Wong, Vee Khee
6ccf12ae11 net: stmmac: use interrupt mode INTM=1 for multi-MSI
For interrupt mode INTM=0, TX/RX transfer complete will trigger signal
not only on sbd_perch_[tx|rx]_intr_o (Transmit/Receive Per Channel) but
also on the sbd_intr_o (Common).

As for multi-MSI implementation, setting interrupt mode INTM=1 is more
efficient as each TX intr and RX intr (TI/RI) will be handled by TX/RX ISR
without the need of calling the common MAC ISR.

Updated the TX/RX NORMAL interrupts status checking process as the
NIS status bit is not asserted for any RI/TI events for INTM=1.

Signed-off-by: Wong, Vee Khee <vee.khee.wong@intel.com>
Co-developed-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:37:30 -07:00
Ong Boon Leong
8532f613bc net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX
Now we introduce MSI interrupt service routines and hook these routines
up if stmmac_open() sees valid irq line being requested:-

stmmac_mac_interrupt()    :- MAC (dev->irq), WOL (wol_irq), LPI (lpi_irq)
stmmac_safety_interrupt() :- Safety Feat Correctible Error (sfty_ce_irq)
                             & Uncorrectible Error (sfty_ue_irq)
stmmac_msi_intr_rx()      :- For all RX MSI irq (rx_irq)
stmmac_msi_intr_tx()      :- For all TX MSI irq (tx_irq)

Each of IRQs will have its unique name so that we can differentiate
them easily under /proc/interrupts.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 17:37:30 -07:00
Jakub Kicinski
6dbf94b264 ethtool: clarify the ethtool FEC interface
The definition of the FEC driver interface is quite unclear.
Improve the documentation.

This is based on current driver and user space code, as well
as the discussions about the interface:

RFC v1 (24 Oct 2016): https://lore.kernel.org/netdev/1477363849-36517-1-git-send-email-vidya@cumulusnetworks.com/
 - this version has the autoneg field
 - no active_fec field
 - none vs off confusion is already present

RFC v2 (10 Feb 2017): https://lore.kernel.org/netdev/1486727004-11316-1-git-send-email-vidya@cumulusnetworks.com/
 - autoneg removed
 - active_fec added

v1 (10 Feb 2017): https://lore.kernel.org/netdev/1486751311-42019-1-git-send-email-vidya@cumulusnetworks.com/
 - no changes in the code

v1 (24 Jun 2017):  https://lore.kernel.org/netdev/1498331985-8525-1-git-send-email-roopa@cumulusnetworks.com/
 - include in tree user

v2 (27 Jul 2017): https://lore.kernel.org/netdev/1501199248-24695-1-git-send-email-roopa@cumulusnetworks.com/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:46:53 -07:00
Jakub Kicinski
d3b37fc805 ethtool: fec: sanitize ethtool_fecparam->active_fec
struct ethtool_fecparam::active_fec is a GET-only field,
all in-tree drivers correctly ignore it on SET. Clear
the field on SET to avoid any confusion. Again, we can't
reject non-zero now since ethtool user space does not
zero-init the param correctly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:46:53 -07:00
Jakub Kicinski
240e114411 ethtool: fec: sanitize ethtool_fecparam->reserved
struct ethtool_fecparam::reserved is never looked at by the core.
Make sure it's actually 0. Unfortunately we can't return an error
because old ethtool doesn't zero-initialize the structure for SET.
On GET we can be more verbose, there are no in tree (ab)users.

Fix up the kdoc on the structure. Remove the mention of FEC
bypass. Seems like a niche thing to configure in the first
place.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:46:53 -07:00
Jakub Kicinski
408386817a ethtool: fec: remove long structure description
Digging through the mailing list archive @autoneg was part
of the first version of the RFC, this left over comment was
pointed out twice in review but wasn't removed.

The sentence is an exact copy-paste from pauseparam.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:46:53 -07:00
Jakub Kicinski
ed3038158e ethtool: fec: fix typo in kdoc
s/porte/the port/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:46:53 -07:00
David S. Miller
241949e488 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).

The main changes are:

1) Static linking of multiple BPF ELF files, from Andrii.

2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.

3) Spelling fixes from various folks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:30:46 -07:00
David S. Miller
efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Linus Torvalds
002322402d Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
  selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
  gcov, and mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: update Andrey Konovalov's email address
  mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm: memblock: fix section mismatch warning again
  kfence: make compatible with kmemleak
  gcov: fix clang-11+ support
  ia64: fix format strings for err_inject
  ia64: mca: allocate early mca with GFP_ATOMIC
  squashfs: fix xattr id and id lookup sanity checks
  squashfs: fix inode lookup sanity checks
  z3fold: prevent reclaim/free race for headless pages
  selftests/vm: fix out-of-tree build
  mm/mmu_notifiers: ensure range_end() is paired with range_start()
  kasan: fix per-page tags for non-page_alloc pages
  hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25 11:43:43 -07:00
Mike Rapoport
a024b7c285 mm: memblock: fix section mismatch warning again
Commit 34dc2efb39 ("memblock: fix section mismatch warning") marked
memblock_bottom_up() and memblock_set_bottom_up() as __init, but they
could be referenced from non-init functions like
memblock_find_in_range_node() on architectures that enable
CONFIG_ARCH_KEEP_MEMBLOCK.

For such builds kernel test robot reports:

   WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up()
   The function memblock_find_in_range_node() references the function __init memblock_bottom_up().
   This is often because memblock_find_in_range_node lacks a __init  annotation or the annotation of memblock_bottom_up is wrong.

Replace __init annotations with __init_memblock annotations so that the
appropriate section will be selected depending on
CONFIG_ARCH_KEEP_MEMBLOCK.

Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com
Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org
Fixes: 34dc2efb39 ("memblock: fix section mismatch warning")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Sean Christopherson
c2655835fd mm/mmu_notifiers: ensure range_end() is paired with range_start()
If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers.  If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to
be paired, e.g.  KVM's mmu_notifier_count would become imbalanced.
Disallow notifiers that can fail _start() from implementing _end() so
that it's unnecessary to either track which notifiers rejected _start(),
or had already succeeded prior to a failed _start().

Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintented "feature".
Make it canon now that the behavior is depended on for correctness.

As of today, the bug is likely benign:

  1. The only caller of the non-blocking notifier is OOM kill.
  2. The only notifiers that can fail _start() are the i915 and Nouveau
     drivers.
  3. The only notifiers that utilize _end() are the SGI UV GRU driver
     and KVM.
  4. The GRU driver will never coincide with the i195/Nouveau drivers.
  5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
     _guest_, and the guest is already doomed due to being an OOM victim.

Fix the bug now to play nice with future usage, e.g.  KVM has a
potential use case for blocking memslot updates in KVM while an
invalidation is in-progress, and failure to unblock would result in said
updates being blocked indefinitely and hanging.

Found by inspection.  Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.

Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Andrey Konovalov
cf10bd4c4a kasan: fix per-page tags for non-page_alloc pages
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.

Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff...  address for pages that
were not allocated via page_alloc.

This might cause problems.  A particular case we encountered is a
conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags).  This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.

This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff.  This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.

With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.

This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.

Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c029 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Miaohe Lin
d85aecf284 hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
The current implementation of hugetlb_cgroup for shared mappings could
have different behavior.  Consider the following two scenarios:

 1.Assume initial css reference count of hugetlb_cgroup is 1:
  1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
      count is 2 associated with 1 file_region.
  1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
      count is 3 associated with 2 file_region.
  1.3 coalesce_file_region will coalesce these two file_regions into
      one. So css reference count is 3 associated with 1 file_region
      now.

 2.Assume initial css reference count of hugetlb_cgroup is 1 again:
  2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
      count is 2 associated with 1 file_region.

Therefore, we might have one file_region while holding one or more css
reference counts. This inconsistency could lead to imbalanced css_get()
and css_put() pair. If we do css_put one by one (i.g. hole punch case),
scenario 2 would put one more css reference. If we do css_put all
together (i.g. truncate case), scenario 1 will leak one css reference.

The imbalanced css_get() and css_put() pair would result in a non-zero
reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup
directory is removed __but__ associated resource is not freed. This
might result in OOM or can not create a new hugetlb cgroup in a busy
workload ultimately.

In order to fix this, we have to make sure that one file_region must
hold exactly one css reference. So in coalesce_file_region case, we
should release one css reference before coalescence. Also only put css
reference when the entire file_region is removed.

The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region.
But this area might be scattered when there are already some
file_regions reside in it. As a result, many file_regions may share only
one h_cg->css reference. In order to ensure that one file_region must
hold exactly one css reference, we should do css_get() for each
file_region and release the reference held by caller when they are done.

[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
  Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com

Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Linus Torvalds
e138138003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Various fixes, all over:

   1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.

   2) Always store the rx queue mapping in veth, from Maciej
      Fijalkowski.

   3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.

   4) Fix memory leak in octeontx2-af from Colin Ian King.

   5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
      Yonghong Song.

   6) Fix tx ptp stats in mlx5, from Aya Levin.

   7) Check correct ip version in tun decap, fropm Roi Dayan.

   8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.

   9) Work item memork leak in mlx5, from Shay Drory.

  10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.

  11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.

  12) Fix data race in pxa168_eth, from Pavel Andrianov.

  13) Range validate stab in red_check_params(), from Eric Dumazet.

  14) Inherit vlan filtering setting properly in b53 driver, from
      Florian Fainelli.

  15) Fix rtnl locking in igc driver, from Sasha Neftin.

  16) Pause handling fixes in igc driver, from Muhammad Husaini
      Zulkifli.

  17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.

  18) Use after free in qlcnic, from Lv Yunlong.

  19) fix crash in fritzpci mISDN, from Tong Zhang.

  20) Premature rx buffer reuse in igb, from Li RongQing.

  21) Missing termination of ip[a driver message handler arrays, from
      Alex Elder.

  22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
      driver, from Xie He.

  23) Use after free in c_can_pci_remove(), from Tong Zhang.

  24) Uninitialized variable use in nl80211, from Jarod Wilson.

  25) Off by one size calc in bpf verifier, from Piotr Krysiuk.

  26) Use delayed work instead of deferrable for flowtable GC, from
      Yinjun Zhang.

  27) Fix infinite loop in NPC unmap of octeontx2 driver, from
      Hariprasad Kelam.

  28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
      of fifo sizes, from Corentin Labbe.

  29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.

  30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.

  31) Fix psample UAPI breakage, from Ido Schimmel"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
  psample: Fix user API breakage
  math: Export mul_u64_u64_div_u64
  ch_ktls: fix enum-conversion warning
  octeontx2-af: Fix memory leak of object buf
  ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
  net: bridge: don't notify switchdev for local FDB addresses
  net/sched: act_ct: clear post_ct if doing ct_clear
  net: dsa: don't assign an error value to tag_ops
  isdn: capi: fix mismatched prototypes
  net/mlx5: SF, do not use ecpu bit for vhca state processing
  net/mlx5e: Fix division by 0 in mlx5e_select_queue
  net/mlx5e: Fix error path for ethtool set-priv-flag
  net/mlx5e: Offload tuple rewrite for non-CT flows
  net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
  net/mlx5: Add back multicast stats for uplink representor
  net: ipconfig: ic_dev can be NULL in ic_close_devs
  MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
  docs: networking: Fix a typo
  r8169: fix DMA being used after buffer free if WoL is enabled
  net: ipa: fix init header command validation
  ...
2021-03-24 18:16:04 -07:00
Ong Boon Leong
5a5586112b net: stmmac: support FPE link partner hand-shaking procedure
In order to discover whether remote station supports frame preemption,
local station sends verify mPacket and expects response mPacket in
return from the remote station.

So, we add the functions to send and handle event when verify mPacket
and response mPacket are exchanged between the networked stations.

The mechanism to handle different FPE states between local and remote
station (link partner) is implemented using workqueue which starts a
task each time there is some sign of verify & response mPacket exchange
as check in FPE IRQ event. The task retries couple of times to try to
spot the states that both stations are ready to enter FPE ON. This allows
different end points to enable FPE at different time and verify-response
mPacket can happen asynchronously. Ultimately, the task will only turn
FPE ON when local station have both exchange response in both directions.

Thanks to Voon Weifeng for implementing the core functions for detecting
FPE events and send mPacket and phylink related change.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Co-developed-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Co-developed-by: Tan Tee Min <tee.min.tan@intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 17:58:13 -07:00
Eric Dumazet
d1c5688087 tcp_metrics: tcpm_hash_bucket is strictly local
After commit 098a697b49 ("tcp_metrics: Use a single hash table
for all network namespaces."), tcpm_hash_bucket is local to
net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:45:40 -07:00
Ido Schimmel
e43accba9b psample: Fix user API breakage
Cited commit added a new attribute before the existing group reference
count attribute, thereby changing its value and breaking existing
applications on new kernels.

Before:

 # psample -l
 libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported

After:

 # psample -l
 Group Num       Refcount        Group Seq
 1               1               0

Fix by restoring the value of the old attribute and remove the
misleading comments from the enumerator to avoid future bugs.

Cc: stable@vger.kernel.org
Fixes: d8bed686ab ("net: psample: Add tunnel support")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Adiel Bidani <adielb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:44:31 -07:00
Cooper Lees
3c85a8b81c Add Open Routing Protocol ID to rtnetlink.h
- The Open Routing (Open/R) network protocol netlink handler uses ID 99
- Will also add to `/etc/iproute2/rt_protos` once this is accepted
- For more information: https://github.com/facebook/openr
Signed-off-by: From: Cooper Lees <me@cooperlees.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:31:23 -07:00
Wong Vee Khee
0ef25ed104 net: phy: add genphy_c45_loopback
Add generic code to enable C45 PHY loopback into the common phy-c45.c
file. This will allow C45 PHY drivers aceess this by setting
.set_loopback.

Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 15:20:07 -07:00
Tan Tee Min
341f67e424 net: stmmac: Add hardware supported cross-timestamp
Cross timestamping is supported on Integrated Ethernet Controller in
Intel SoC such as EHL and TGL with Always Running Timer.

The hardware cross-timestamp result is made available to
applications through the PTP_SYS_OFFSET_PRECISE ioctl which calls
stmmac_getcrosststamp().

Device time is stored in the MAC Auxiliary register. The 64-bit System
time (ART timestamp) is stored in registers that are only addressable
by using MDIO space.

Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 15:12:36 -07:00
Pablo Neira Ayuso
563ae557dd net: flow_offload: add FLOW_ACTION_PPPOE_PUSH
Add an action to represent the PPPoE hardware offload support that
includes the session ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00