Commit Graph

84087 Commits

Author SHA1 Message Date
Alexander Duyck
b9adcd69bd vxlan: Add new UDP encapsulation offload type for VXLAN-GPE
The fact is VXLAN with Generic Protocol Extensions cannot be supported by
the same hardware parsers that support VXLAN.  The protocol extensions
allow for things like a Next Protocol field which in turn allows for things
other than Ethernet to be passed over the tunnel.  Most existing parsers
will not know how to interpret this.

To resolve this I am giving VXLAN-GPE its own UDP encapsulation offload
type.  This way hardware that does support GPE can simply add this type to
the switch statement for VXLAN, and if they don't support it then this will
fix any issues where headers might be interpreted incorrectly.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:32 -07:00
Alexander Duyck
1938ee1fd3 net: Remove deprecated tunnel specific UDP offload functions
Now that we have all the drivers using udp_tunnel_get_rx_ports,
ndo_add_udp_enc_rx_port, and ndo_del_udp_enc_rx_port we can drop the
function calls that were specific to VXLAN and GENEVE.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:32 -07:00
Alexander Duyck
7c46a640de net: Merge VXLAN and GENEVE push notifiers into a single notifier
This patch merges the notifiers for VXLAN and GENEVE into a single UDP
tunnel notifier.  The idea is that we will want to only have to make one
notifier call to receive the list of ports for VXLAN and GENEVE tunnels
that need to be offloaded.

In addition we add a new set of ndo functions named ndo_udp_tunnel_add and
ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data
such as port and address family as tunnels are added and removed.  The
tunnel meta-data is now transported in a structure named udp_tunnel_info
which for now carries the type, address family, and port number.  In the
future this could be updated so that we can include a tuple of values
including things such as the destination IP address and other fields.

I also ended up going with a naming scheme that consisted of using the
prefix udp_tunnel on function names.  I applied this to the notifier and
ndo ops as well so that it hopefully points to the fact that these are
primarily used in the udp_tunnel functions.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Alexander Duyck
e7b3db5e60 net: Combine GENEVE and VXLAN port notifiers into single functions
This patch merges the GENEVE and VXLAN code so that both functions pass
through a shared code path.  This way we can start the effort of using a
single function on the network device drivers to handle both of these
tunnel types.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Alexander Duyck
86a9805725 vxlan/geneve: Include udp_tunnel.h in vxlan/geneve.h and fixup includes
This patch makes it so that we add udp_tunnel.h to vxlan.h and geneve.h
header files.  This is useful as I plan to move the generic handlers for
the port offloads into the udp_tunnel header file and leave the vxlan and
geneve headers to be a bit more protocol specific.

I also went through and cleaned out a number of redundant includes that
where in the .h and .c files for these drivers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Vincent Palatin
cecbc5563a net: stmmac: allow to split suspend/resume from init/exit callbacks
Let the stmmac platform drivers provide dedicated suspend and resume
callbacks rather than always re-using the init and exits callbacks.
If the driver does not provide the suspend or resume callback, we fall
back to the old behavior trying to use exit or init.

This allows a specific platform to perform only a partial power-down on
suspend if Wake-on-Lan is enabled but always perform the full shutdown
sequence if the module is unloaded.

Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 14:14:58 -07:00
Fabien Siron
5fb384b066 netlink: Add comment to warn about deprecated netlink rings attribute request
Signed-off-by: Fabien Siron <fabien.siron@epita.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 14:00:50 -07:00
Daniel Borkmann
3b1efb196e bpf, maps: flush own entries on perf map release
The behavior of perf event arrays are quite different from all
others as they are tightly coupled to perf event fds, f.e. shown
recently by commit e03e7ee34f ("perf/bpf: Convert perf_event_array
to use struct file") to make refcounting on perf event more robust.
A remaining issue that the current code still has is that since
additions to the perf event array take a reference on the struct
file via perf_event_get() and are only released via fput() (that
cleans up the perf event eventually via perf_event_release_kernel())
when the element is either manually removed from the map from user
space or automatically when the last reference on the perf event
map is dropped. However, this leads us to dangling struct file's
when the map gets pinned after the application owning the perf
event descriptor exits, and since the struct file reference will
in such case only be manually dropped or via pinned file removal,
it leads to the perf event living longer than necessary, consuming
needlessly resources for that time.

Relations between perf event fds and bpf perf event map fds can be
rather complex. F.e. maps can act as demuxers among different perf
event fds that can possibly be owned by different threads and based
on the index selection from the program, events get dispatched to
one of the per-cpu fd endpoints. One perf event fd (or, rather a
per-cpu set of them) can also live in multiple perf event maps at
the same time, listening for events. Also, another requirement is
that perf event fds can get closed from application side after they
have been attached to the perf event map, so that on exit perf event
map will take care of dropping their references eventually. Likewise,
when such maps are pinned, the intended behavior is that a user
application does bpf_obj_get(), puts its fds in there and on exit
when fd is released, they are dropped from the map again, so the map
acts rather as connector endpoint. This also makes perf event maps
inherently different from program arrays as described in more detail
in commit c9da161c65 ("bpf: fix clearing on persistent program
array maps").

To tackle this, map entries are marked by the map struct file that
added the element to the map. And when the last reference to that map
struct file is released from user space, then the tracked entries
are purged from the map. This is okay, because new map struct files
instances resp. frontends to the anon inode are provided via
bpf_map_new_fd() that is called when we invoke bpf_obj_get_user()
for retrieving a pinned map, but also when an initial instance is
created via map_create(). The rest is resolved by the vfs layer
automatically for us by keeping reference count on the map's struct
file. Any concurrent updates on the map slot are fine as well, it
just means that perf_event_fd_array_release() needs to delete less
of its own entires.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
Daniel Borkmann
d056a78876 bpf, maps: extend map_fd_get_ptr arguments
This patch extends map_fd_get_ptr() callback that is used by fd array
maps, so that struct file pointer from the related map can be passed
in. It's safe to remove map_update_elem() callback for the two maps since
this is only allowed from syscall side, but not from eBPF programs for these
two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem()
needs to be called directly here due to the extra argument.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
Daniel Borkmann
61d1b6a42f bpf, maps: add release callback
Add a release callback for maps that is invoked when the last
reference to its struct file is gone and the struct file about
to be released by vfs. The handler will be used by fd array maps.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
Philip Prindeville
22a59be8b7 net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads
In the presence of firewalls which improperly block ICMP Unreachable
    (including Fragmentation Required) messages, Path MTU Discovery is
    prevented from working.

    A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as
    is done for other payloads like AppleTalk--and doing transparent
    fragmentation and reassembly.

    Redux includes the enforcement of mutual exclusion between this feature
    and Path MTU Discovery as suggested by Alexander Duyck.

    Cc: Alexander Duyck <alexander.duyck@gmail.com>
    Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
    Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 21:39:59 -07:00
David S. Miller
042ce72230 Merge tag 'shared' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Mellanox shared code between RDMA and net-next trees

This is Mellanox mlx5_core shared code for both net-next and RDMA
trees for 4.8 kernel cycle.
2016-06-15 21:37:10 -07:00
Alexander Aring
bbe5f5cefe 6lowpan: introduce 6lowpan-nd
This patch introduce different 6lowpan handling for receive and transmit
NS/NA messages for the ipv6 neighbour discovery. The first use-case is
for supporting 802.15.4 short addresses inside the option fields and
handling for RFC6775 6CO option field as userspace option.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:23 -07:00
Alexander Aring
cc84b3c6b4 ipv6: export several functions
This patch exports some neighbour discovery functions which can be used
by 6lowpan neighbour discovery ops functionality then.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:23 -07:00
Alexander Aring
f997c55c1d ipv6: introduce neighbour discovery ops
This patch introduces neighbour discovery ops callback structure. The
idea is to separate the handling for 6LoWPAN into the 6lowpan module.

These callback offers 6lowpan different handling, such as 802.15.4 short
address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
over 6LoWPANs).

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:23 -07:00
Alexander Aring
4f36ce84c5 ndisc: add __ndisc_opt_addr_data function
This patch adds __ndisc_opt_addr_data as low-level function for
ndisc_opt_addr_data which doesn't depend on net_device parameter.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:22 -07:00
Alexander Aring
1e82f961ac ndisc: add __ndisc_opt_addr_space function
This patch adds __ndisc_opt_addr_space as low-level function for
ndisc_opt_addr_space which doesn't depend on net_device parameter.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:22 -07:00
Alexander Aring
2ad3ed5919 6lowpan: add 802.15.4 short addr slaac
This patch adds the autoconfiguration if a valid 802.15.4 short address
is available for 802.15.4 6LoWPAN interfaces.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:22 -07:00
Alexander Aring
8626a0c83b 6lowpan: add private neighbour data
This patch will introduce a 6lowpan neighbour private data. Like the
interface private data we handle private data for generic 6lowpan and
for link-layer specific 6lowpan.

The current first use case if to save the short address for a 802.15.4
6lowpan neighbour.

Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 20:41:22 -07:00
Eric Dumazet
1b5c5493e3 net_sched: add the ability to defer skb freeing
qdisc are changed under RTNL protection and often
while blocking BH and root qdisc spinlock.

When lots of skbs need to be dropped, we free
them under these locks causing TX/RX freezes,
and more generally latency spikes.

This commit adds rtnl_kfree_skbs(), used to queue
skbs for deferred freeing.

Actual freeing happens right after RTNL is released,
with appropriate scheduling points.

rtnl_qdisc_drop() can also be used in place
of disc_drop() when RTNL is held.

qdisc_reset_queue() and __qdisc_reset_queue() get
the new behavior, so standard qdiscs like pfifo, pfifo_fast...
have their ->reset() method automatically handled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 14:08:34 -07:00
Michael S. Tsirkin
7d7072e3ba skb_array: resize support
Update skb_array after ptr_ring API changes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 13:58:27 -07:00
Michael S. Tsirkin
5d49de5320 ptr_ring: resize support
This adds ring resize support. Seems to be necessary as
users such as tun allow userspace control over queue size.

If resize is used, this costs us ability to peek at queue without
consumer lock - should not be a big deal as peek and consumer are
usually run on the same CPU.

If ring is made bigger, ring contents is preserved.  If ring is made
smaller, extra pointers are passed to an optional destructor callback.

Cleanup function also gains destructor callback such that
all pointers in queue can be cleaned up.

This changes some APIs but we don't have any users yet,
so it won't break bisect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 13:58:27 -07:00
Michael S. Tsirkin
ad69f35d1d skb_array: array based FIFO for skbs
A simple array based FIFO of pointers.  Intended for net stack so uses
skbs for type safety. Implemented as a set of wrappers around ptr_ring.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 13:58:27 -07:00
Michael S. Tsirkin
2e0ab8ca83 ptr_ring: array based FIFO for pointers
A simple array based FIFO of pointers.  Intended for net stack which
commonly has a single consumer/producer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 13:57:21 -07:00
WANG Cong
b2313077ed net_sched: make tcf_hash_check() boolean
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 12:43:35 -07:00
David Ahern
9ff7438460 net: vrf: Handle ipv6 multicast and link-local addresses
IPv6 multicast and link-local addresses require special handling by the
VRF driver:
1. Rather than using the VRF device index and full FIB lookups,
   packets to/from these addresses should use direct FIB lookups based on
   the VRF device table.

2. fail sends/receives on a VRF device to/from a multicast address
   (e.g, make ping6 ff02::1%<vrf> fail)

3. move the setting of the flow oif to the first dst lookup and revert
   the change in icmpv6_echo_reply made in ca254490c8 ("net: Add VRF
   support to IPv6 stack"). Linklocal/mcast addresses require use of the
   skb->dev.

With this change connections into and out of a VRF enslaved device work
for multicast and link-local addresses work (icmp, tcp, and udp)
e.g.,

1. packets into VM with VRF config:
    ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
    ping6 -c3 ff02::1%br1

    ssh -6 fe80::e0:f9ff:fe1c:b974%br1

2. packets going out a VRF enslaved device:
    ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
    ping6 -c3 ff02::1%eth1
    ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 12:34:34 -07:00
David Ahern
cd2a9e62c8 net: l3mdev: Remove const from flowi6 arg to get_rt6_dst
Allow drivers to pass flow arg to functions where the arg is not const
and allow the driver to make updates as needed (eg., setting oif).

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 12:34:34 -07:00
Florian Westphal
99860208bc sched: remove NET_XMIT_POLICED
sch_atm returns this when TC_ACT_SHOT classification occurs.

But all other schedulers that use tc_classify
(htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS
in this case so just do that in atm.

BATMAN uses it as an intermediate return value to signal
forwarding vs. buffering, but it did not return POLICED to
callers outside of BATMAN.

Reviewed-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-12 22:02:11 -04:00
Eric Dumazet
45f50bed1d net_sched: remove generic throttled management
__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.

I already removed it for sch_fq in commit f2600cf02b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.

When one ore more packets are stuck in one or more throttled
HTB class, a htb dequeue() performs two atomic operations
to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
lock is held.

Removing this pair of atomic operations bring me a 8 % performance
increase on 200 TCP_RR tests, in presence of throttled classes.

This patch has no side effect, since nothing actually uses
disc_is_throttled() anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:58:21 -07:00
Pramod Kumar
f20e6657a8 mdio: mux: Enhanced MDIO mux framework for integrated multiplexers
An integrated multiplexer uses same address space for
"muxed bus selection" and "generation of mdio transaction"
hence its good to register parent bus from mux driver.

Hence added a mechanism where mux driver could register a
parent bus and pass it down to framework via mdio_mux_init api.

Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:24:53 -07:00
David S. Miller
d6cf3a85b4 Merge tag 'mac80211-next-for-davem-2016-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
For the next cycle, we have the following:
 * the biggest change is Michał's work on integrating FQ/codel
   with the mac80211 internal software queues
 * cfg80211 connect result gets clarified for the
   "no connection at all" case
 * advertisement of per-interface type capabilities, in case
   they differ (which makes a lot of sense for some capabilities)
 * most of the nl80211 & hwsim unprivileged namespace operation
   changes
 * human-readable VHT capabilities in debugfs
 * some other cleanups, like spelling
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:13:32 -07:00
Zi Shen Lim
002245cc64 bpf: fix missing header inclusion
Commit 0fc174dea5 ("ebpf: make internal bpf API independent of
CONFIG_BPF_SYSCALL ifdefs") introduced usage of ERR_PTR() in
bpf_prog_get(), however did not include linux/err.h.

Without this patch, when compiling arm64 BPF without CONFIG_BPF_SYSCALL:
...
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h: In function 'bpf_prog_get':
include/linux/bpf.h:235:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
  return ERR_PTR(-EOPNOTSUPP);
         ^
include/linux/bpf.h:235:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
In file included from include/linux/rwsem.h:17:0,
                 from include/linux/mm_types.h:10,
                 from include/linux/sched.h:27,
                 from arch/arm64/include/asm/compat.h:25,
                 from arch/arm64/include/asm/stat.h:23,
                 from include/linux/stat.h:5,
                 from include/linux/compat.h:12,
                 from include/linux/filter.h:10,
                 from arch/arm64/net/bpf_jit_comp.c:22:
include/linux/err.h: At top level:
include/linux/err.h:23:35: error: conflicting types for 'ERR_PTR'
 static inline void * __must_check ERR_PTR(long error)
                                   ^
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h:235:9: note: previous implicit declaration of 'ERR_PTR' was here
  return ERR_PTR(-EOPNOTSUPP);
         ^
...

Fixes: 0fc174dea5 ("ebpf: make internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:11:49 -07:00
Lawrence Brakmo
6f094b9ec6 tcp: add in_flight to tcp_skb_cb
Add in_flight (bytes in flight when packet was sent) field
to tx component of tcp_skb_cb and make it available to
congestion modules' pkts_acked() function through the
ack_sample function argument.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:07:49 -07:00
Mike Rapoport
fd2a0437dc virtio_net: introduce virtio_net_hdr_{from,to}_skb
The code for conversion between virtio_net_hdr and skb GSO info is
duplicated at several places. Let's put it to a common place to allow
reuse.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:03:55 -07:00
Mike Rapoport
678ece3022 virtio_net: add _UAPI prefix to virtio_net header guards
This gives better namespacing and prevents conflicts with no-uapi version
of virtio_net header that will be introduced in the following patch.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:03:55 -07:00
Daniel Borkmann
a70b506efe bpf: enforce recursion limit on redirects
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
Currently, they are not handeled by the limiter when attached to clsact's
egress parent, for example, and a buggy program redirecting it to the
same device again could run into stack overflow eventually. It would be
good if we could notify an admin to give him a chance to react. We reuse
xmit_recursion instead of having one private to eBPF, so that the stack's
current recursion depth will be taken into account as well. Follow-up to
commit 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper") and
27b29f6305 ("bpf: add bpf_redirect() helper").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 18:00:57 -07:00
William Tu
f2a4d086ed openvswitch: Add packet truncation support.
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
truncate packets. A 'max_len' is added for setting up the maximum
packet size, and a 'cutlen' field is to record the number of bytes
to trim the packet when the packet is outputting to a port, or when
the packet is sent to userspace.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 17:58:03 -07:00
David S. Miller
1578b0a5e9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/sched/act_police.c
	net/sched/sch_drr.c
	net/sched/sch_hfsc.c
	net/sched/sch_prio.c
	net/sched/sch_red.c
	net/sched/sch_tbf.c

In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.

A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 11:52:24 -07:00
Linus Torvalds
698ea54dde Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal.

 2) Revert some msleep conversions in rtlwifi as these spots are in
    atomic context, from Larry Finger.

 3) Validate that NFTA_SET_TABLE attribute is actually specified when we
    call nf_tables_getset().  From Phil Turnbull.

 4) Don't do mdio_reset in stmmac driver with spinlock held as that can
    sleep, from Vincent Palatin.

 5) sk_filter() does things other than run a BPF filter, so we should
    not elide it's call just because sk->sk_filter is NULL.  Fix from
    Eric Dumazet.

 6) Fix missing backlog updates in several packet schedulers, from Cong
    Wang.

 7) bnx2x driver should allow VLAN add/remove while the interface is
    down, from Michal Schmidt.

 8) Several RDS/TCP race fixes from Sowmini Varadhan.

 9) fq_codel scheduler doesn't return correct queue length in dumps,
    from Eric Dumazet.

10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from
    Yuchung Cheng.

11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(),
    from Guillaume Nault.

12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian
    Westphal.

13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling,
    from Willem de Bruijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  vmxnet3: segCnt can be 1 for LRO packets
  packet: compat support for sock_fprog
  stmmac: fix parameter to dwmac4_set_umac_addr()
  net/mlx5e: Fix blue flame quota logic
  net/mlx5e: Use ndo_stop explicitly at shutdown flow
  net/mlx5: E-Switch, always set mc_promisc for allmulti vports
  net/mlx5: E-Switch, Modify node guid on vf set MAC
  net/mlx5: E-Switch, Fix vport enable flow
  net/mlx5: E-Switch, Use the correct error check on returned pointers
  net/mlx5: E-Switch, Use the correct free() function
  net/mlx5: Fix E-Switch flow steering capabilities check
  net/mlx5: Fix flow steering NIC capabilities check
  net/mlx5: Fix root flow table update
  net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
  net/mlx5: Fix masking of reserved bits in XRCD number
  net/mlx5: Fix the size of modify QP mailbox
  mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
  mlxsw: spectrum: Make split flow match firmware requirements
  wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
  cfg80211: remove get/set antenna and tx power warnings
  ...
2016-06-10 08:32:24 -07:00
Linus Torvalds
729d378479 Merge tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "We have only few, mainly HD-audio device-specific fixes.  Realtek
  codec driver got a slightly more LOC, but they are all for the new
  codec chip, and won't affect others at all"

* tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add PCI ID for Kabylake
  ALSA: hda/realtek: Add T560 docking unit fixup
  ALSA: hda - Fix headset mic detection problem for Dell machine
  ALSA: uapi: Add three missing header files to Kbuild file
  ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
  ALSA: hda/realtek - ALC256 speaker noise issue
2016-06-10 08:27:30 -07:00
Linus Torvalds
524a3f2ca2 Merge tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "Stable-candidate fixes for the intel_pstate driver and the cpuidle
  core.

  Specifics:

   - Fix two intel_pstate initialization issues, one of which was
     introduced during the 4.4 cycle (Srinivas Pandruvada)

   - Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset
     (Catalin Marinas)"

* tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
  cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
  cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
2016-06-10 08:09:12 -07:00
Saeed Mahameed
7486216b3a {net,IB}/mlx5: mlx5_ifc updates
Introducing mlx5_ifc updates for upcoming ConnectX-4 features.

Needed bits and hardware structures for mlx5e netdev:
	- MLX5_CQ_PERIOD_NUM_MODES for adaptive moderation
	  support
	- QoS rate limiting
	- SQ context rate limiting
	- Auto negotiation fields in PTYS register
	- Source SQN field in flow table entry match structure
	- DCBX parameters

Needed bits and hardware structures for IB:
	- New XRQ opcodes, commands and capabilities layout
	- Extend q counters definition to support IB.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-06-10 13:29:14 +03:00
Willem de Bruijn
719c44d340 packet: compat support for sock_fprog
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.

Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit 1957598840 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 23:41:03 -07:00
Aaron Conole
7d84e37e11 virtio_net: Update the feature bit to comply with spec
A draft version of the MTU Advice feature bit was specified as 25.  This
bit is not within the allowed range for network device feature bits, and
should be changed to be feature bit 3 to fully comply with the spec.

Fixes 14de9d114a ('virtio-net: Add initial MTU advice feature')
Signed-off-by: Aaron Conole <aconole@redhat.com>
Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 23:35:26 -07:00
David Ahern
e434863718 net: vrf: Fix crash when IPv6 is disabled at boot time
Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
disabled at boot using the kernel option ipv6.disable=1. Using
current net-next with the boot option:

$ ip link add red type vrf table 1001

Generates:
[12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
[12210.921341] IP: [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
[12210.923479] Oops: 0000 [#1] SMP
[12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
[12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
[12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
[12210.929328] RIP: 0010:[<ffffffff814b30e3>]  [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.930697] RSP: 0018:ffff8800bacaf888  EFLAGS: 00010202
[12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
[12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
[12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
[12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
[12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
[12210.937103] FS:  00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
[12210.938321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
[12210.940278] Stack:
[12210.940603]  ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
[12210.941818]  ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
[12210.943040]  ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
[12210.944288] Call Trace:
[12210.944688]  [<ffffffff814b3135>] fib6_new_table+0x24/0x8a
[12210.945516]  [<ffffffff81397c88>] vrf_dev_init+0xd4/0x162
[12210.946328]  [<ffffffff814091e1>] register_netdevice+0x100/0x396
[12210.947209]  [<ffffffff8139823d>] vrf_newlink+0x40/0xb3
[12210.948001]  [<ffffffff814187f0>] rtnl_newlink+0x5d3/0x6d5
...

The problem above is due to the fact that the fib hash table is not
allocated when IPv6 is disabled at boot.

As for the VRF driver it should not do any IPv6 initializations if IPv6
is disabled, so it needs to know if IPv6 is disabled at boot. The disable
parameter is private to the IPv6 module, so provide an accessor for
modules to determine if IPv6 was disabled at boot time.

Fixes: 35402e3136 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 23:34:42 -07:00
David Howells
2341e07757 rxrpc: Simplify connect() implementation and simplify sendmsg() op
Simplify the RxRPC connect() implementation.  It will just note the
destination address it is given, and if a sendmsg() comes along with no
address, this will be assigned as the address.  No transport struct will be
held internally, which will allow us to remove this later.

Simplify sendmsg() also.  Whilst a call is active, userspace refers to it
by a private unique user ID specified in a control message.  When sendmsg()
sees a user ID that doesn't map to an extant call, it creates a new call
for that user ID and attempts to add it.  If, when we try to add it, the
user ID is now registered, we now reject the message with -EEXIST.  We
should never see this situation unless two threads are racing, trying to
create a call with the same ID - which would be an error.

It also isn't required to provide sendmsg() with an address - provided the
control message data holds a user ID that maps to a currently active call.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 23:30:12 -07:00
Florian Fainelli
967dd82ffc net: dsa: b53: Add support for Broadcom RoboSwitch
This patch adds support for Broadcom's BCM53xx switch family, also known
as RoboSwitch. Some of these switches are ubiquituous, found in home
routers, Wi-Fi routers, DSL and cable modem gateways and other
networking related products.

This drivers adds the library driver (b53_common.c) as well as a few bus
glue drivers for MDIO, SPI, Switch Register Access Block (SRAB) and
memory-mapped I/O into a SoC's address space (Broadcom BCM63xx/33xx).

Basic operations are supported to bring the Layer 1/2 up and running,
but not much more at this point, subsequent patches add the remaining
features.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 22:21:29 -07:00
Noa Osherovich
23898c763f net/mlx5: E-Switch, Modify node guid on vf set MAC
In RoCE, the RDMA-CM needs the node guid to establish connection
between nodes.
Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore
RDMA-CM on the VF is broken.

Whenever the administrator sets a MAC for a VF, derive the node guid
from it and set it as well in the following way:
MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01

Fixes: 77256579c6 ('net/mlx5: E-Switch, Introduce Vport...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 22:06:26 -07:00
Maor Gottlieb
876d634d19 net/mlx5: Fix flow steering NIC capabilities check
Flow steering infrastructure is currently used only on link layer
ethernet, therefore the driver should initialize the flow steering
when the device link layer is ethernet.

In addition, add missing capability check before initializing the
namespace of NIC RX flow tables.

Fixes: 2530236303 ('net/mlx5_core: Flow steering tree initialization')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 22:06:25 -07:00
Shahar Klein
86d56a1a6b net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
Having MLX5_CMD_OP_MAX on another file causes us to repeatedly miss
accounting new commands added to the driver and hence there're no entries
for them in debugfs. To solve that, we integrate it into the commands enum
as the last entry.

Fixes: 34a40e6893 ('net/mlx5_core: Introduce modify flow table command')
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 22:06:25 -07:00