Commit Graph

121393 Commits

Author SHA1 Message Date
Alexander Lobakin
be0cec6ffd qed: introduce qed_chain_get_elem_used{,u32}()
Add reverse-variants of qed_chain_get_elem_left{,u32}() to be able to
know current chain occupation. They will be used in the upcoming qede
XDP_REDIRECT code.
They share most of the logics with the mentioned ones, so were reused
to collapse the latters.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
f2aefd20b0 qed: optimize common chain accessors
Constify chain pointers and refactor qed_chain_get_elem_left{,u32}() a bit.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
155065866b qed: add support for different page sizes for chains
Extend current infrastructure to store chain page size in a struct
and use it in all functions instead of fixed QED_CHAIN_PAGE_SIZE.
Its value remains the default one, but can be overridden in
qed_chain_init_params before chain allocation.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
b6db3f71c9 qed: simplify chain allocation with init params struct
To simplify qed_chain_alloc() prototype and call sites, introduce struct
qed_chain_init_params to specify chain params, and pass a pointer to
filled struct to the actual qed_chain_alloc() instead of a long list
of separate arguments.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
5e776d8016 qed: move chain initialization inlines next to allocation functions
qed_chain_init*() are used in one file/place on "cold" path only, so they
can be uninlined and moved next to the call sites.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
9b6ee3cf95 qed: sanitize PBL chains allocation
PBL chain elements are actually DMA addresses stored in __le64, but
currently their size is hardcoded to 8, and DMA addresses are assigned
via cast to variable-sized dma_addr_t without any bitwise conversions.
Change the type of pbl_virt array to match the actual one, add a new
field to store the size of allocated DMA memory and sanitize elements
assignment.

Misc: give more logic names to the members of qed_chain::pbl_sp embedded
struct.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
2cf2f4f546 qed: reformat "qed_chain.h" a bit
Reformat structs and macros definitions a bit prior to making functional
changes.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:02 -07:00
David S. Miller
dee72f8a0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-21

The following pull-request contains BPF updates for your *net-next* tree.

We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).

The main changes are:

1) Run BPF program on socket lookup, from Jakub.

2) Introduce cpumap, from Lorenzo.

3) s390 JIT fixes, from Ilya.

4) teach riscv JIT to emit compressed insns, from Luke.

5) use build time computed BTF ids in bpf iter, from Yonghong.
====================

Purely independent overlapping changes in both filter.h and xdp.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 12:35:33 -07:00
Martin Varghese
4787dd582d bareudp: Reverted support to enable & disable rx metadata collection
The commit fe80536acf ("bareudp: Added attribute to enable & disable
rx metadata collection") breaks the the original(5.7) default behavior of
bareudp module to collect RX metadadata at the receive. It was added to
avoid the crash at the kernel neighbour subsytem when packet with metadata
from bareudp is processed. But it is no more needed as the
commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash.

Fixes: fe80536acf ("bareudp: Added attribute to enable & disable rx metadata collection")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 18:30:47 -07:00
Parav Pandit
336ce1c932 devlink: Add comment for devlink instance lock
Add comment to describe the purpose of devlink instance lock.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 16:14:58 -07:00
Russell King
93eaceb0fc net: phylink: add interface to configure clause 22 PCS PHY
Add an interface to configure the advertisement for a clause 22 PCS
PHY, and set the AN enable flag in the BMCR appropriately.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King
7137e18f6f net: phylink: add struct phylink_pcs
Add a way for MAC PCS to have private data while keeping independence
from struct phylink_config, which is used for the MAC itself. We need
this independence as we will have stand-alone code for PCS that is
independent of the MAC.  Introduce struct phylink_pcs, which is
designed to be embedded in a driver private data structure.

This structure does not include a mdio_device as there are PCS
implementations such as the Marvell DSA and network drivers where this
is not necessary.

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King
b7ad14c2fe net: phylink: re-implement interface configuration with PCS
With PCS support, how we implement interface reconfiguration (or other
major reconfiguration) is not up to the job; we end up reconfiguring
the PCS for an interface change while the link could potentially be up.
In order to solve this, add two additional MAC methods for major
configuration, one to prepare for the change, and one to finish the
change.

This allows mvneta and mvpp2 to shutdown what they require prior to the
MAC and PCS configuration calls, and then restart as appropriate.

This impacts ksettings_set(), which now needs to identify whether the
change is a minor tweak to the advertisement masks or whether the
interface mode has changed, and call the appropriate function for that
update.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:51 -07:00
Russell King
1571e700fd net: phylink: in-band pause mode advertisement update for PCS
Re-code the pause in-band advertisement update in light of the addition
of PCS support, so that we perform the minimum required; only the PCS
configuration function needs to be called in this case, followed by the
request to trigger a restart of negotiation if the programmed
advertisement changed.

We need to change the pcs_config() signature to pass whether resolved
pause should be passed to the MAC for setups such as mvneta and mvpp2
where doing so overrides the MAC manual flow controls.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 15:46:50 -07:00
Yonghong Song
951cf368bc bpf: net: Use precomputed btf_id for bpf iterators
One additional field btf_id is added to struct
bpf_ctx_arg_aux to store the precomputed btf_ids.
The btf_id is computed at build time with
BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions.
All existing bpf iterators are changed to used
pre-compute btf_ids.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
fce557bcef bpf: Make btf_sock_ids global
tcp and udp bpf_iter can reuse some socket ids in
btf_sock_ids, so make it global.

I put the extern definition in btf_ids.h as a central
place so it can be easily discovered by developers.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
0f12e584b2 bpf: Add BTF_ID_LIST_GLOBAL in btf_ids.h
Existing BTF_ID_LIST used a local static variable
to store btf_ids. This patch provided a new macro
BTF_ID_LIST_GLOBAL to store btf_ids in a global
variable which can be shared among multiple files.

The existing BTF_ID_LIST is still retained.
Two reasons. First, BTF_ID_LIST is also used to build
btf_ids for helper arguments which typically
is an array of 5. Since typically different
helpers have different signature, it makes
little sense to share them. Second, some
current computed btf_ids are indeed local.
If later those btf_ids are shared between
different files, they can use BTF_ID_LIST_GLOBAL then.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200720163401.1393159-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
bc4f0548f6 bpf: Compute bpf_skc_to_*() helper socket btf ids at build time
Currently, socket types (struct tcp_sock, udp_sock, etc.)
used by bpf_skc_to_*() helpers are computed when vmlinux_btf
is first built in the kernel.

Commit 5a2798ab32
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch did exactly this, removing in-kernel btf_id
computation and utilizing build-time btf_id computation.

If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will
define an array with size of 5, which is not enough for
btf_sock_ids. So define its own static array if
CONFIG_DEBUG_INFO_BTF is not defined.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Alexander Lobakin
99785a87fc qed: add support for the extended speed and FEC modes
Add all necessary code (NVM parsing, MFW and Ethtool reports etc.) to
support extended speed and FEC modes.
These new modes are supported by the new boards revisions and newer
MFW versions.

Misc: correct port type for MEDIA_KR.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:44 -07:00
Alexander Lobakin
98e675ec5a qed: add missing loopback modes
These modes are relevant only for several boards, but may be reported by
MFW as well as the others.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:44 -07:00
Alexander Lobakin
ae7e69379f qed: add support for Forward Error Correction
Add all necessary routines for reading supported FEC modes from NVM and
querying FEC control to the MFW (if the running version supports it).

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:44 -07:00
Alexander Lobakin
37237b5b71 qed: reformat several structures a bit
Prior to adding new fields and bitfields, reformat the related
structures according to the Linux style (spaces to tabs,
lowercase hex, indentation etc.).

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:44 -07:00
Alexander Lobakin
bdb5d8ec47 qed, qede, qedf: convert link mode from u32 to ETHTOOL_LINK_MODE
Currently qed driver already ran out of 32 bits to store link modes,
and this doesn't allow to add and support more speeds.
Convert custom link mode to generic Ethtool bitmap and definitions
(convenient Phylink shorthands are used for elegance and readability).
This allowed us to drop all conversions/mappings between the driver
and Ethtool.

This involves changes in qede and qedf as well, as they used definitions
from shared "qed_if.h".

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:43 -07:00
Alexander Lobakin
e812916d32 linkmode: introduce linkmode_intersects()
Add a new helper to find intersections between Ethtool link modes,
linkmode_intersects(), similar to the other linkmode helpers.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:59:43 -07:00
Christoph Hellwig
f1bfd71c86 arch, net: remove the last csum_partial_copy() leftovers
Most of the tree only uses and implements csum_partial_copy_nocheck,
but the c6x and lib/checksum.c implement a csum_partial_copy that
isn't used anywere except to define csum_partial_copy.  Get rid of
this pointless alias.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:45:32 -07:00
Jiri Pirko
a8b7b2d0b3 sched: sch_api: add missing rcu read lock to silence the warning
In case the qdisc_match_from_root function() is called from non-rcu path
with rtnl mutex held, a suspiciout rcu usage warning appears:

[  241.504354] =============================
[  241.504358] WARNING: suspicious RCU usage
[  241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted
[  241.504370] -----------------------------
[  241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!!
[  241.504382]
               other info that might help us debug this:
[  241.504388]
               rcu_scheduler_active = 2, debug_locks = 1
[  241.504394] 1 lock held by tc/1391:
[  241.504398]  #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
[  241.504431]
               stack backtrace:
[  241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32
[  241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  241.504453] Call Trace:
[  241.504465]  dump_stack+0x100/0x184
[  241.504482]  lockdep_rcu_suspicious+0x153/0x15d
[  241.504499]  qdisc_match_from_root+0x293/0x350

Fix this by passing the rtnl held lockdep condition down to
hlist_for_each_entry_rcu()

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 17:00:02 -07:00
Florian Fainelli
9c0c7014f3 net: dsa: Setup dsa_netdev_ops
Now that we have all the infrastructure in place for calling into the
dsa_ptr->netdev_ops function pointers, install them when we configure
the DSA CPU/management interface and tear them down. The flow is
unchanged from before, but now we preserve equality of tests when
network device drivers do tests like dev->netdev_ops == &foo_ops which
was not the case before since we were allocating an entirely new
structure.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 16:48:22 -07:00
Florian Fainelli
4cfab35667 net: dsa: Add wrappers for overloaded ndo_ops
Add definitions for the dsa_netdevice_ops structure which is a subset of
the net_device_ops structure for the specific operations that we care
about overlaying on top of the DSA CPU port net_device and provide
inline stubs that take core managing whether DSA code is reachable.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 16:48:22 -07:00
Vladimir Oltean
b6bd41363a ptp: introduce a phase offset in the periodic output request
Some PHCs like the ocelot/felix switch cannot emit generic periodic
output, but just PPS (pulse per second) signals, which:
- don't start from arbitrary absolute times, but are rather
  phase-aligned to the beginning of [the closest next] second.
- have an optional phase offset relative to that beginning of the
  second.

For those, it was initially established that they should reject any
other absolute time for the PTP_PEROUT_REQUEST than 0.000000000 [1].

But when it actually came to writing an application [2] that makes use
of this functionality, we realized that we can't really deal generically
with PHCs that support absolute start time, and with PHCs that don't,
without an explicit interface. Namely, in an ideal world, PHC drivers
would ensure that the "perout.start" value written to hardware will
result in a functional output. This means that if the PTP time has
become in the past of this PHC's current time, it should be
automatically fast-forwarded by the driver into a close enough future
time that is known to work (note: this is necessary only if the hardware
doesn't do this fast-forward by itself). But we don't really know what
is the status for PHC drivers in use today, so in the general sense,
user space would be risking to have a non-functional periodic output if
it simply asked for a start time of 0.000000000.

So let's introduce a flag for this type of reduced-functionality
hardware, named PTP_PEROUT_PHASE. The start time is just "soon", the
only thing we know for sure about this signal is that its rising edge
events, Rn, occur at:

Rn = perout.phase + n * perout.period

The "phase" in the periodic output structure is simply an alias to the
"start" time, since both cannot logically be specified at the same time.
Therefore, the binary layout of the structure is not affected.

[1]: https://patchwork.ozlabs.org/project/netdev/patch/20200320103726.32559-7-yangbo.lu@nxp.com/
[2]: https://www.mail-archive.com/linuxptp-devel@lists.sourceforge.net/msg04142.html

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Vladimir Oltean
f65b71aa25 ptp: add ability to configure duty cycle for periodic output
There are external event timestampers (PHCs with support for
PTP_EXTTS_REQUEST) that timestamp both event edges.

When those edges are very close (such as in the case of a short pulse),
there is a chance that the collected timestamp might be of the rising,
or of the falling edge, we never know.

There are also PHCs capable of generating periodic output with a
configurable duty cycle. This is good news, because we can space the
rising and falling edge out enough in time, that the risks to overrun
the 1-entry timestamp FIFO of the extts PHC are lower (example: the
perout PHC can be configured for a period of 1 second, and an "on" time
of 0.5 seconds, resulting in a duty cycle of 50%).

A flag is introduced for signaling that an on time is present in the
perout request structure, for preserving compatibility. Logically
speaking, the duty cycle cannot exceed 100% and the PTP core checks for
this.

PHC drivers that don't support this flag emit a periodic output of an
unspecified duty cycle, same as before.

The duty cycle is encoded as an "on" time, similar to the "start" and
"period" times, and reuses the reserved space while preserving overall
binary layout.

Pahole reported before:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        unsigned int               rsv[4];               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

And now:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        union {
                struct ptp_clock_time on;                /*    40    16 */
                unsigned int       rsv[4];               /*    40    16 */
        };                                               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Willem de Bruijn
eba75c587e icmp: support rfc 4884
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an
extension struct if present.

ICMP messages may include an extension structure after the original
datagram. RFC 4884 standardized this behavior. It stores the offset
in words to the extension header in u8 icmphdr.un.reserved[1].

The field is valid only for ICMP types destination unreachable, time
exceeded and parameter problem, if length is at least 128 bytes and
entire packet does not exceed 576 bytes.

Return the offset to the start of the extension struct when reading an
ICMP error from the error queue, if it matches the above constraints.

Do not return the raw u8 field. Return the offset from the start of
the user buffer, in bytes. The kernel does not return the network and
transport headers, so subtract those.

Also validate the headers. Return the offset regardless of validation,
as an invalid extension must still not be misinterpreted as part of
the original datagram. Note that !invalid does not imply valid. If
the extension version does not match, no validation can take place,
for instance.

For backward compatibility, make this optional, set by setsockopt
SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c

For forward compatibility, reserve only setsockopt value 1, leaving
other bits for additional icmp extensions.

Changes
  v1->v2:
  - convert word offset to byte offset from start of user buffer
    - return in ee_data as u8 may be insufficient
  - define extension struct and object header structs
  - return len only if constraints met
  - if returning len, also validate

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:20:22 -07:00
Lorenzo Bianconi
2f0bc54ba9 xdp: introduce xdp_get_shared_info_from_{buff, frame} utility routines
Introduce xdp_get_shared_info_from_{buff,frame} utility routines to get
skb_shared_info from xdp buffer/frame pointer.
xdp_get_shared_info_from_{buff,frame} will be used to implement xdp
multi-buffer support

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:28:33 -07:00
Christoph Hellwig
a44d9e7210 net: make ->{get,set}sockopt in proto_ops optional
Just check for a NULL method instead of wiring up
sock_no_{get,set}sockopt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:41 -07:00
Christoph Hellwig
3021ad5299 net/ipv6: remove compat_ipv6_{get,set}sockopt
Handle the few cases that need special treatment in-line using
in_compat_syscall().  This also removes all the now unused
compat_{get,set}sockopt methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:41 -07:00
Christoph Hellwig
b6238c04c0 net/ipv4: remove compat_ip_{get,set}sockopt
Handle the few cases that need special treatment in-line using
in_compat_syscall().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:41 -07:00
Christoph Hellwig
c34bc10d25 netfilter: remove the compat argument to xt_copy_counters_from_user
Lift the in_compat_syscall() from the callers instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Christoph Hellwig
77d4df41d5 netfilter: remove the compat_{get,set} methods
All instances handle compat sockopts via in_compat_syscall() now, so
remove the compat_{get,set} methods as well as the
compat_nf_{get,set}sockopt wrappers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Christoph Hellwig
55db9c0e85 net: remove compat_sys_{get,set}sockopt
Now that the ->compat_{get,set}sockopt proto_ops methods are gone
there is no good reason left to keep the compat syscalls separate.

This fixes the odd use of unsigned int for the compat_setsockopt
optlen and the missing sock_use_custom_sol_socket.

It would also easily allow running the eBPF hooks for the compat
syscalls, but such a large change in behavior does not belong into
a consolidation patch like this one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Christoph Hellwig
8c918ffbba net: remove compat_sock_common_{get,set}sockopt
Add the compat handling to sock_common_{get,set}sockopt instead,
keyed of in_compat_syscall().  This allow to remove the now unused
->compat_{get,set}sockopt methods from struct proto_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Christoph Hellwig
4d295e5461 net: simplify cBPF setsockopt compat handling
Add a helper that copies either a native or compat bpf_fprog from
userspace after verifying the length, and remove the compat setsockopt
handlers that now aren't required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Christoph Hellwig
a06d30ae7a net/atm: remove the atmdev_ops {get, set}sockopt methods
All implementations of these two methods are dummies that always
return -EINVAL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Michael Walle
c4471ad9a5 net: phy: add USXGMII link partner ability constants
The constants are taken from the USXGMII Singleport Copper Interface
specification. The naming are based on the SGMII ones, but with an MDIO_
prefix.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:05:49 -07:00
Jakub Sitnicki
1122702f02 inet6: Run SK_LOOKUP BPF program on socket lookup
Following ipv4 stack changes, run a BPF program attached to netns before
looking up a listening socket. Program can return a listening socket to use
as result of socket lookup, fail the lookup, or take no action.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com
2020-07-17 20:18:17 -07:00
Jakub Sitnicki
1559b4aa1d inet: Run SK_LOOKUP BPF program on socket lookup
Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning SK_PASS code. Program can
revert its decision by assigning a NULL socket with bpf_sk_assign().

Alternatively, BPF program can also fail the lookup by returning with
SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
no socket has been selected with bpf_sk_assign().

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series in the order
in which they were attached. The end result is determined from return codes
of all the programs according to following rules:

 1. If any program returned SK_PASS and selected a valid socket, the socket
    is used as result of socket lookup.
 2. If more than one program returned SK_PASS and selected a socket,
    last selection takes effect.
 3. If any program returned SK_DROP, and no program returned SK_PASS and
    selected a socket, socket lookup fails with -ECONNREFUSED.
 4. If all programs returned SK_PASS and none of them selected a socket,
    socket lookup continues to htable-based lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Jakub Sitnicki
e9ddbb7707 bpf: Introduce SK_LOOKUP program type with a dedicated attach point
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Jakub Sitnicki
ce3aa9cc51 bpf, netns: Handle multiple link attachments
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.

This let's us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
David S. Miller
d44a919a5c Merge tag 'mlx5-updates-2020-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2020-07-16

Fixes:
1) Fix build break when CONFIG_XPS is not set
2) Fix missing switch_id for representors

Updates:
1) IPsec XFRM RX offloads from Raed and Huy.
  - Added IPSec RX steering flow tables to NIC RX
  - Refactoring of the existing FPGA IPSec, to add support
    for ConnectX IPsec.
  - RX data path handling for IPSec traffic
  - Synchronize offloading device ESN with xfrm received SN

2) Parav allows E-Switch to siwtch to switchdev mode directly without
   the need to go through legacy mode first.

3) From Tariq, Misc updates including:
   3.1) indirect calls for RX and XDP handlers
   3.2) Make MLX5_EN_TLS non-prompt as it should always be enabled when
        TLS and MLX5_EN are selected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 13:04:17 -07:00
Priyaranjan Jha
e3a5a1e8b6 tcp: add SNMP counter for no. of duplicate segments reported by DSACK
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv,
which are incremented depending on whether the DSACKed range is below
the cumulative ACK sequence number or not. Unfortunately, these both
implicitly assume each DSACK covers only one segment. This makes these
counters unusable for estimating spurious retransmit rates,
or real/non-spurious loss rate.

This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
the estimated number of duplicate segments based on:
(DSACKed sequence range) / MSS. This counter is usable for estimating
spurious retransmit rates, or real/non-spurious loss rate.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:54:30 -07:00
Bjørn Mork
1ea2b748b5 net: usbnet: export usbnet_set_rx_mode()
This function can be reused by other usbnet minidrivers.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:42:47 -07:00
Miguel Rodríguez Pérez
e506addeff net: cdc_ether: export usbnet_cdc_update_filter
This makes the function available to other drivers, like cdc_ncm.

Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:42:47 -07:00