Commit Graph

116106 Commits

Author SHA1 Message Date
Mohit P. Tahiliani
5205ea00cd net: sched: pie: export symbols to be reused by FQ-PIE
This patch makes the drop_early(), calculate_probability() and
pie_process_dequeue() functions generic enough to be used by
both PIE and FQ-PIE (to be added in a future commit). The major
change here is in the way the functions take in arguments. This
patch exports these functions and makes FQ-PIE dependent on
sch_pie.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
b42a3d7c7c pie: improve comments and commenting style
Improve the comments along with the commenting style used to
describe the members of the structures and their initial values
in the init functions.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
2dfb1952a9 pie: rearrange structure members and their initializations
Rearrange the members of the structure such that closely
referenced members appear together and/or fit in the same
cacheline. Also, change the order of their initializations to
match the order in which they appear in the structure.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
1dbfc5e071 pie: use u8 instead of bool in pie_vars
Linux best practice recommends using u8 for true/false values in
structures.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
cf4eeee5ff pie: rearrange macros in order of length
Rearrange macros in order of length and align the values to
improve readability.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
805a5a23a4 pie: use U64_MAX to denote (2^64 - 1)
Use the U64_MAX macro to denote the constant (2^64 - 1).

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
Mohit P. Tahiliani
84bf557fb0 net: sched: pie: move common code to pie.h
This patch moves macros, structures and small functions common
to PIE and FQ-PIE (to be added in a future commit) from the file
net/sched/sch_pie.c to the header file include/net/pie.h.
All the moved functions are made inline.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:30 +01:00
David S. Miller
954b3c4397 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-01-22

The following pull-request contains BPF updates for your *net-next* tree.

We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).

The main changes are:

1) function by function verification and program extensions from Alexei.

2) massive cleanup of selftests/bpf from Toke and Andrii.

3) batched bpf map operations from Brian and Yonghong.

4) tcp congestion control in bpf from Martin.

5) bulking for non-map xdp_redirect form Toke.

6) bpf_send_signal_thread helper from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 08:10:16 +01:00
Martin KaFai Lau
5576b991e9 bpf: Add BPF_FUNC_jiffies64
This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
2020-01-22 16:30:10 -08:00
Alexei Starovoitov
be8704ff07 bpf: Introduce dynamic program extensions
Introduce dynamic program extensions. The users can load additional BPF
functions and replace global functions in previously loaded BPF programs while
these programs are executing.

Global functions are verified individually by the verifier based on their types only.
Hence the global function in the new program which types match older function can
safely replace that corresponding function.

This new function/program is called 'an extension' of old program. At load time
the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function
to be replaced. The BPF program type is derived from the target program into
extension program. Technically bpf_verifier_ops is copied from target program.
The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
The extension program can call the same bpf helper functions as target program.
Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
types. The verifier allows only one level of replacement. Meaning that the
extension program cannot recursively extend an extension. That also means that
the maximum stack size is increasing from 512 to 1024 bytes and maximum
function nesting level from 8 to 16. The programs don't always consume that
much. The stack usage is determined by the number of on-stack variables used by
the program. The verifier could have enforced 512 limit for combined original
plus extension program, but it makes for difficult user experience. The main
use case for extensions is to provide generic mechanism to plug external
programs into policy program or function call chaining.

BPF trampoline is used to track both fentry/fexit and program extensions
because both are using the same nop slot at the beginning of every BPF
function. Attaching fentry/fexit to a function that was replaced is not
allowed. The opposite is true as well. Replacing a function that currently
being analyzed with fentry/fexit is not allowed. The executable page allocated
by BPF trampoline is not used by program extensions. This inefficiency will be
optimized in future patches.

Function by function verification of global function supports scalars and
pointer to context only. Hence program extensions are supported for such class
of global functions only. In the future the verifier will be extended with
support to pointers to structures, arrays with sizes, etc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
2020-01-22 23:04:52 +01:00
Björn Töpel
43a825afc9 xsk, net: Make sock_def_readable() have external linkage
XDP sockets use the default implementation of struct sock's
sk_data_ready callback, which is sock_def_readable(). This function
is called in the XDP socket fast-path, and involves a retpoline. By
letting sock_def_readable() have external linkage, and being called
directly, the retpoline can be avoided.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com
2020-01-22 00:08:52 +01:00
David S. Miller
4f2c17e0f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-01-21

1) Add support for TCP encapsulation of IKE and ESP messages,
   as defined by RFC 8229. Patchset from Sabrina Dubroca.

Please note that there is a merge conflict in:

net/unix/af_unix.c

between commit:

3c32da19a8 ("unix: Show number of pending scm files of receive queue in fdinfo")

from the net-next tree and commit:

b50b0580d2 ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram")

from the ipsec-next tree.

The conflict can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 12:18:20 +01:00
Martin Schiller
f362e5fe0f wan/hdlc_x25: make lapb params configurable
This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via
sethdlc (which needs to be patched as well).

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 11:41:36 +01:00
Heiner Kallweit
bbbf8430af net: phy: add new version of phy_do_ioctl
Add a new version of phy_do_ioctl that doesn't check whether net_device
is running. It will typically be used if suitable drivers attach the
PHY in probe already.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:50:41 +01:00
Heiner Kallweit
3231e5d222 net: phy: rename phy_do_ioctl to phy_do_ioctl_running
We just added phy_do_ioctl, but it turned out that we need another
version of this function that doesn't check whether net_device is
running. So rename phy_do_ioctl to phy_do_ioctl_running.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:50:41 +01:00
David S. Miller
ad063075d4 Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:22:51 +01:00
Heiner Kallweit
2ab1d925aa net: phy: add generic ndo_do_ioctl handler phy_do_ioctl
A number of network drivers has the same glue code to use phy_mii_ioctl
as ndo_do_ioctl handler. So let's add such a generic ndo_do_ioctl
handler to phylib.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-20 10:43:24 +01:00
David S. Miller
b3f7e3f23a Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
Linus Torvalds
11a8272947 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix non-blocking connect() in x25, from Martin Schiller.

 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.

 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.

 4) Limit size of TSO packets properly in lan78xx driver, from Eric
    Dumazet.

 5) r8152 probe needs an endpoint sanity check, from Johan Hovold.

 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
    John Fastabend.

 7) hns3 needs short frames padded on transmit, from Yunsheng Lin.

 8) Fix netfilter ICMP header corruption, from Eyal Birger.

 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.

10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.

11) Fix memory leak in act_ctinfo, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  cxgb4: reject overlapped queues in TC-MQPRIO offload
  cxgb4: fix Tx multi channel port rate limit
  net: sched: act_ctinfo: fix memory leak
  bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
  bnxt_en: Fix ipv6 RFS filter matching logic.
  bnxt_en: Fix NTUPLE firmware command failures.
  net: systemport: Fixed queue mapping in internal ring map
  net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
  net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
  net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
  net: hns: fix soft lockup when there is not enough memory
  net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
  net/sched: act_ife: initalize ife->metalist earlier
  netfilter: nat: fix ICMP header corruption on ICMP errors
  net: wan: lapbether.c: Use built-in RCU list checking
  netfilter: nf_tables: fix flowtable list del corruption
  netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
  netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
  netfilter: nft_tunnel: ERSPAN_VERSION must not be null
  netfilter: nft_tunnel: fix null-attribute check
  ...
2020-01-19 12:03:53 -08:00
Amit Cohen
c3cae4916e devlink: Add overlay source MAC is multicast trap
Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19 16:23:52 +01:00
Amit Cohen
13c056ec7d devlink: Add tunnel generic packet traps
Add packet traps that can report packets that were dropped during tunnel
decapsulation.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19 16:23:52 +01:00
Amit Cohen
95f0ead8f0 devlink: Add non-routable packet trap
Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19 16:23:52 +01:00
David S. Miller
95ae2d1d11 Merge branch 'for-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
Mellanox, mlx5 E-Switch chains and prios

This series has two parts,

1) A merge commit with mlx5-next branch that include updates for mlx5
HW layouts needed for this and upcoming submissions.

2) From Paul, Increase the number of chains and prios

Currently the Mellanox driver supports offloading tc rules that
are defined on the first 4 chains and the first 16 priorities.
The restriction stems from the firmware flow level enforcement
requiring a flow table of a certain level to point to a flow
table of a higher level. This limitation may be ignored by setting
the ignore_flow_level bit when creating flow table entries.
Use unmanaged tables and ignore flow level to create more tables than
declared by fs_core steering. Manually manage the connections between the
tables themselves.

HW table is instantiated for every tc <chain,prio> tuple. The miss rule
of every table either jumps to the next <chain,prio> table, or continues
to slow_fdb. This logic is realized by following this sequence:

1. Create an auto-grouped flow table for the specified priority with
    reserved entries

Reserved entries are allocated at the end of the flow table.
Flow groups are evaluated in sequence and therefore it is guaranteed
that the flow group defined on the last FTEs will be the last to evaluate.

Define a "match all" flow group on the reserved entries, providing
the platform to add table miss actions.

2. Set the miss rule action to jump to the next <chain,prio> table
    or the slow_fdb.

3. Link the previous priority table to point to the new table by
    updating its miss rule.

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19 16:17:07 +01:00
David S. Miller
7f013edeba Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, they are:

1) Incorrect uapi header comment in bitwise, from Jeremy Sowden.

2) Fetch flow statistics if flow is still active.

3) Restrict flow matching on hardware based on input device.

4) Add nf_flow_offload_work_alloc() helper function.

5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown
   instead.

6) Use atomic bitwise operation to operate with flow flags.

7) Add nf_flowtable_hw_offload() helper function to check for the
   NF_FLOWTABLE_HW_OFFLOAD flag.

8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable
   software datapath.

9) Remove indirect calls in xt_hashlimit, from Florian Westphal.

10) Add nf_flow_offload_tuple() helper to consolidate code.

11) Add nf_flow_table_offload_cmd() helper function.

12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash
    set types, from Jeremy Sowden.

13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden.

14) Replace goto by return in error path of nft_bitwise_dump(), from
    Jeremy Sowden.

15) Add bitwise operation netlink attribute, also from Jeremy.

16) Add nft_bitwise_init_bool(), from Jeremy Sowden.

17) Add nft_bitwise_eval_bool(), also from Jeremy.

18) Add nft_bitwise_dump_bool(), from Jeremy Sowden.

19) Disallow hardware offload for other that NFT_BITWISE_BOOL,
    from Jeremy Sowden.

20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy.

21) Add support for bitwise shift operation, from Jeremy Sowden.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19 10:29:05 +01:00
Linus Torvalds
244dc26890 Merge tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Back from LCA2020, fixes wasn't too busy last week, seems to have
  quieten down appropriately, some amdgpu, i915, then a core mst fix and
  one fix for virtio-gpu and one for rockchip:

  core mst:
   - serialize down messages and clear timeslots are on unplug

  amdgpu:
   - Update golden settings for renoir
   - eDP fix

  i915:
   - uAPI fix: Remove dash and colon from PMU names to comply with
     tools/perf
   - Fix for include file that was indirectly included
   - Two fixes to make sure VMA are marked active for error capture

  virtio:
   - maintain obj reservation lock when submitting cmds

  rockchip:
   - increase link rate var size to accommodate rates"

* tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Reorder detect_edp_sink_caps before link settings read.
  drm/amdgpu: update goldensetting for renoir
  drm/dp_mst: Have DP_Tx send one msg at a time
  drm/dp_mst: clear time slots for ports invalid
  drm/i915/pmu: Do not use colons or dashes in PMU names
  drm/rockchip: fix integer type used for storing dp data rate
  drm/i915/gt: Mark ring->vma as active while pinned
  drm/i915/gt: Mark context->state vma as active while pinned
  drm/i915/gt: Skip trying to unbind in restore_ggtt_mappings
  drm/i915: Add missing include file <linux/math64.h>
  drm/virtio: add missing virtio_gpu_array_lock_resv call
2020-01-18 13:57:31 -08:00
Linus Torvalds
ba0f472203 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
 "Two rseq bugfixes:

   - CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end
     up corrupting the TLS of the parent. Technically a change in the
     ABI but the previous behavior couldn't resonably have been relied
     on by applications so this looks like a valid exception to the ABI
     rule.

   - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
     handling of other flags. This is not thought to impact any
     applications either"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Unregister rseq for clone CLONE_VM
  rseq: Reject unknown flags on rseq unregister
2020-01-18 12:29:13 -08:00
Dave Airlie
f66d84c8b4 Merge tag 'drm-misc-fixes-2020-01-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
virtio: maintain obj reservation lock when submitting cmds (Gerd)
rockchip: increase link rate var size to accommodate rates (Tobias)
mst: serialize down messages and clear timeslots are on unplug (Wayne)

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Tobias Schramm <t.schramm@manjaro.org>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116162856.GA11524@art_vandelay
2020-01-18 12:54:37 +10:00
Linus Torvalds
5ffdff81cf Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three fixes that should go into this release:

   - The 32-bit segment size fix that I mentioned last week (Ming)

   - Use uint for the block size (Mikulas)

   - A null_blk zone write handling fix (Damien)"

* tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block:
  block: fix an integer overflow in logical block size
  null_blk: Fix zone write handling
  block: fix get_max_segment_size() overflow on 32bit arch
2020-01-17 05:54:18 -08:00
Guillaume Nault
56f200c78c netns: Constify exported functions
Mark function parameters as 'const' where possible.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17 13:25:24 +01:00
Florian Fainelli
080bb352fa net: phy: Maintain MDIO device and bus statistics
We maintain global statistics for an entire MDIO bus, as well as broken
down, per MDIO bus address statistics. Given that it is possible for
MDIO devices such as switches to access MDIO bus addresses for which
there is not a mdio_device instance created (therefore not a a
corresponding device directory in sysfs either), we also maintain
per-address statistics under the statistics folder. The layout looks
like this:

/sys/class/mdio_bus/../statistics/
	transfers
	errrors
	writes
	reads
	transfers_<addr>
	errors_<addr>
	writes_<addr>
	reads_<addr>

When a mdio_device instance is registered, a statistics/ folder is
created with the tranfers, errors, writes and reads attributes which
point to the appropriate MDIO bus statistics structure.

Statistics are 64-bit unsigned quantities and maintained through the
u64_stats_sync.h helper functions.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17 11:12:44 +01:00
Jesper Dangaard Brouer
58aa94f922 devmap: Adjust tracepoint for map-less queue flush
Now that we don't have a reference to a devmap when flushing the device
bulk queue, let's change the the devmap_xmit tracepoint to remote the
map_id and map_index fields entirely. Rearrange the fields so 'drops' and
'sent' stay in the same position in the tracepoint struct, to make it
possible for the xdp_monitor utility to read both the old and the new
format.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
2020-01-16 20:03:34 -08:00
Toke Høiland-Jørgensen
1d233886dd xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exit (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a
separate function, so we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present. This
means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush()
functions to drop _map from their names. One of the renamed functions is
the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
keep from having to update all drivers, use a #define to keep the old name
working, and only update the virtual drivers in this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
2020-01-16 20:03:34 -08:00
Toke Høiland-Jørgensen
75ccae62cb xdp: Move devmap bulk queue into struct net_device
Commit 96360004b8 ("xdp: Make devmap flush_list common for all map
instances"), changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.  To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

After this patch, the relevant part of struct net_device looks like this,
according to pahole:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	unsigned int               tx_queue_len;         /*   920     4 */
	spinlock_t                 tx_global_lock;       /*   924     4 */
	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct hlist_head  qdisc_hash[16];               /*   960   128 */
	/* --- cacheline 17 boundary (1088 bytes) --- */
	struct timer_list  watchdog_timer;               /*  1088    40 */

	/* XXX last struct has 4 bytes of padding */

	int                        watchdog_timeo;       /*  1128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head   todo_list;                    /*  1136    16 */
	/* --- cacheline 18 boundary (1152 bytes) --- */

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
2020-01-16 20:03:34 -08:00
Linus Torvalds
575966e080 Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
 "I've been sitting on these longer than I meant, so the patch count is
  a bit higher than ideal for this part of the release. There's also
  some reverts of double-applied patches that brings the diffstat up a
  bit.

  With that said, the biggest changes are:

   - Revert of duplicate i2c device addition on two Aspeed (BMC)
     Devicetrees.

   - Move of two device nodes that got applied to the wrong part of the
     tree on ASpeed G6.

   - Regulator fix for Beaglebone X15 (adding 12/5V supplies)

   - Use interrupts for keys on Amlogic SM1 to avoid missed polls

  In addition to that, there is a collection of smaller DT fixes:

   - Power supply assignment fixes for i.MX6

   - Fix of interrupt line for magnetometer on i.MX8 Librem5 devkit

   - Build fixlets (selects) for davinci/omap2+

   - More interrupt number fixes for Stratix10, Amlogic SM1, etc.

   - ... and more similar fixes across different platforms

  And some non-DT stuff:

   - optee fix to register multiple shared pages properly

   - Clock calculation fixes for MMP3

   - Clock fixes for OMAP as well"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
  MAINTAINERS: Add myself as the co-maintainer for Actions Semi platforms
  ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support
  ARM: dts: imx6sll-evk: Remove incorrect power supply assignment
  ARM: dts: imx6sl-evk: Remove incorrect power supply assignment
  ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment
  ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
  ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL
  ARM: omap2plus: select RESET_CONTROLLER
  ARM: davinci: select CONFIG_RESET_CONTROLLER
  ARM: dts: aspeed: rainier: Fix fan fault and presence
  ARM: dts: aspeed: rainier: Remove duplicate i2c busses
  ARM: dts: aspeed: tacoma: Remove duplicate flash nodes
  ARM: dts: aspeed: tacoma: Remove duplicate i2c busses
  ARM: dts: aspeed: tacoma: Fix fsi master node
  ARM: dts: aspeed-g6: Fix FSI master location
  ARM: dts: mmp3: Fix the TWSI ranges
  clk: mmp2: Fix the order of timer mux parents
  ARM: mmp: do not divide the clock rate
  arm64: dts: rockchip: Fix IR on Beelink A1
  optee: Fix multi page dynamic shm pool alloc
  ...
2020-01-16 19:42:08 -08:00
Paul Blakey
79cdb0aaea net/mlx5: Allow creating autogroups with reserved entries
Exclude the last n entries for an autogrouped flow table.

Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:58 -08:00
Paul Blakey
ff189b4356 net/mlx5: Add ignore level support fwd to table rules
If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.

This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:58 -08:00
Paul Blakey
5281a0c909 net/mlx5: fs_core: Introduce unmanaged flow tables
Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and  levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.

Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.

This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:58 -08:00
Saeed Mahameed
12e9e0d0d9 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5e: Add discard counters per priority
  net/mlx5e: Expose FEC feilds and related capability bit
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Add copy header action struct layout
  net/mlx5: Expose resource dump register mapping
  net/mlx5: Add structures and defines for MIRC register
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Expose vDPA emulation device capabilities
  net/mlx5: Add Virtio Emulation related device capabilities

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:24 -08:00
Paul Blakey
61dc7b0141 net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:41:59 -08:00
Aharon Landau
827a8cb2dd net/mlx5e: Add discard counters per priority
Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:33 -08:00
Aya Levin
a58837f52d net/mlx5e: Expose FEC feilds and related capability bit
Introduce 50G per lane FEC modes capability bit and newly supported
fields in PPLM register which allow this configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:30 -08:00
Paul Blakey
822e114b50 net/mlx5: Add mlx5_ifc definitions for connection tracking support
Add the required hardware definitions to mlx5_ifc:
ignore_flow_level, registers, copy_header, and fwd_and_modify cap.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Sholomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:28 -08:00
Hamdan Igbaria
31d8bde1c8 net/mlx5: Add copy header action struct layout
Add definition for copy header action, copy action is used
to copy header fields from source to destination.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:26 -08:00
Aya Levin
609b82727f net/mlx5: Expose resource dump register mapping
Add new register enumeration for resource dump. Add layout mapping for
resource dump: access command and response.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:23 -08:00
Eran Ben Elisha
bab58ba10e net/mlx5: Add structures and defines for MIRC register
Add needed structures, layouts and defines for MIRC (Management Image
Re-activation Control) register. This structure will be used for the FSM
reactivation flow in the downstream patches.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:21 -08:00
Eran Ben Elisha
932ef15511 net/mlx5: Read MCAM register groups 1 and 2
On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:19 -08:00
Eran Ben Elisha
f397464eb7 net/mlx5: Add structures layout for new MCAM access reg groups
MCAM has 3 access_reg_groups (0-2). Defines data structures in order to
read and parse access_reg_groups #1 and #2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 14:11:16 -08:00
Jeremy Sowden
567d746b55 netfilter: bitwise: add support for shifts.
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR.  Extend it to do shifts as well.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:52:02 +01:00
Jeremy Sowden
779f725e14 netfilter: bitwise: add NFTA_BITWISE_DATA attribute.
Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift.  It is not used by boolean
operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:52:02 +01:00
Jeremy Sowden
9d1f979986 netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops.  It describes the type of
operation an expression contains.  Currently, it only has one value:
NFT_BITWISE_BOOL.  More values will be added later to implement shifts.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:51:57 +01:00