Commit Graph

89270 Commits

Author SHA1 Message Date
David Ahern
24045a03b8 net: mpls: Add support for netconf
Add netconf support to MPLS. Allows userpsace to learn and be notified
of changes to 'input' enable setting per interface.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20 11:13:37 -05:00
Michael S. Tsirkin
e716953071 ptr_ring: fix race conditions when resizing
Resizing currently drops consumer lock.  This can cause entries to be
reordered, which isn't good in itself.  More importantly, consumer can
detect a false ring empty condition and block forever.

Further, nesting of consumer within producer lock is problematic for
tun, since it produces entries in a BH, which causes a lock order
reversal:

       CPU0                    CPU1
       ----                    ----
  consume:
  lock(&(&r->consumer_lock)->rlock);
                               resize:
                               local_irq_disable();
                               lock(&(&r->producer_lock)->rlock);
                               lock(&(&r->consumer_lock)->rlock);
  <Interrupt>
  produce:
  lock(&(&r->producer_lock)->rlock);

To fix, nest producer lock within consumer lock during resize,
and keep consumer lock during the whole swap operation.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20 10:27:56 -05:00
Xin Long
4ea0c32f5f sctp: add support for MSG_MORE
This patch is to add support for MSG_MORE on sctp.

It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
creating datamsg according to the send flag. sctp_packet_can_append_data
then uses it to decide if the chunks of this msg will be sent at once or
delay it.

Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
in assoc. As sctp enqueues the chunks first, then dequeue them one by
one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
affect other chunks' bundling.

Since last patch, sctp flush out queue once assoc state falls into
SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
solved as well.

[1] https://patchwork.ozlabs.org/patch/372404/

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20 10:26:09 -05:00
Florian Fainelli
25149ef9d2 net: phy: Check phydev->drv
There are number of function calls, originating from user-space,
typically through the Ethernet driver that can make us crash by
dereferencing phydev->drv which will be NULL once we unbind the driver
from the PHY.

There are still functional issues that prevent an unbind then rebind to
work, but these will be addressed separately.

Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20 10:15:11 -05:00
Xin Long
a4d69a4c3c sctp: sctp_transport_dst_check should check if transport pmtu is dst mtu
Now when sending a packet, sctp_transport_dst_check will check if dst
is obsolete by calling ipv4/ip6_dst_check. But they return obsolete
only when adding a new cache, after that when the cache's pmtu is
updated again, it will not trigger transport->dst/pmtu's update.

It can be reproduced by reducing route's pmtu twice. At the 1st time
client will add a new cache, and transport->pathmtu gets updated as
sctp_transport_dst_check finds it's obsolete. But at the 2nd time,
cache's mtu is updated, sctp client will never send out any packet,
because transport->pmtu has no chance to update.

This patch is to fix this by also checking if transport pmtu is dst
mtu in sctp_transport_dst_check, so that transport->pmtu can be
updated on time.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:19:37 -05:00
Xin Long
d884aa635b sctp: add reconf chunk event
This patch is to add reconf chunk event based on the sctp event
frame in rx path, it will call sctp_sf_do_reconf to process the
reconf chunk.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
2040d3d7a3 sctp: add reconf chunk process
This patch is to add a function to process the incoming reconf chunk,
in which it verifies the chunk, and traverses the param and process
it with the right function one by one.

sctp_sf_do_reconf would be the process function of reconf chunk event.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
ea62504373 sctp: add a function to verify the sctp reconf chunk
This patch is to add a function sctp_verify_reconf to do some length
check and multi-params check for sctp stream reconf according to rfc6525
section 3.1.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
16e1a91965 sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter
This patch is to implement Receiver-Side Procedures for the Incoming
SSN Reset Request Parameter described in rfc6525 section 5.2.3.

It's also to move str_list endian conversion out of sctp_make_strreset_req,
so that sctp_make_strreset_req can be used more conveniently to process
inreq.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
8105447645 sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter
This patch is to implement Receiver-Side Procedures for the Outgoing
SSN Reset Request Parameter described in rfc6525 section 5.2.2.

Note that some checks must be after request_seq check, as even those
checks fail, strreset_inseq still has to be increase by 1.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
35ea82d611 sctp: add support for generating stream ssn reset event notification
This patch is to add Stream Reset Event described in rfc6525
section 6.1.1.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Xin Long
bd4b9f8b4a sctp: add support for generating stream reconf resp chunk
This patch is to define Re-configuration Response Parameter described
in rfc6525 section 4.4. As optional fields are only for SSN/TSN Reset
Request Parameter, it uses another function to make that.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:17:59 -05:00
Dmitry V. Levin
1786dbf370 uapi: fix linux/rds.h userspace compilation error
On the kernel side, sockaddr_storage is #define'd to
__kernel_sockaddr_storage.  Replacing struct sockaddr_storage with
struct __kernel_sockaddr_storage defined by <linux/socket.h> fixes
the following linux/rds.h userspace compilation error:

/usr/include/linux/rds.h:226:26: error: field 'dest_addr' has incomplete type
  struct sockaddr_storage dest_addr;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:15:23 -05:00
Dmitry V. Levin
feb0869d90 uapi: fix linux/rds.h userspace compilation errors
Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:

/usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
  uint8_t name[32];
/usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
  uint64_t value;
/usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
  uint64_t next_tx_seq;
/usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
  uint64_t next_rx_seq;
/usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
  uint8_t transport[TRANSNAMSIZ];  /* null term ascii */
/usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
  uint8_t flags;
/usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
  uint64_t seq;
/usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
  uint32_t len;
/usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
  uint8_t flags;
/usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
  uint32_t sndbuf;
/usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
  uint32_t rcvbuf;
/usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
  uint64_t inum;
/usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
  uint64_t       hdr_rem;
/usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
  uint64_t       data_rem;
/usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
  uint32_t       last_sent_nxt;
/usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
  uint32_t       last_expected_una;
/usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
  uint32_t       last_seen_una;
/usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
  uint8_t  src_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
  uint8_t  dst_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
  uint32_t max_send_wr;
/usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
  uint32_t max_recv_wr;
/usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
  uint32_t max_send_sge;
/usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
  uint32_t rdma_mr_max;
/usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
  uint32_t rdma_mr_size;
/usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
 typedef uint64_t rds_rdma_cookie_t;
/usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
  uint64_t addr;
/usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
  uint64_t bytes;
/usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
  uint64_t cookie_addr;
/usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
  uint64_t  cookie_addr;
/usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
  uint64_t  flags;
/usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
  uint64_t local_vec_addr;
/usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
  uint64_t nr_local;
/usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
  uint64_t  local_addr;
/usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
  uint64_t  remote_addr;
/usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
    uint64_t compare;
/usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
    uint64_t swap;
/usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
    uint64_t add;
/usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
    uint64_t compare;
/usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
    uint64_t swap;
/usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
    uint64_t compare_mask;
/usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
    uint64_t swap_mask;
/usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
    uint64_t add;
/usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
    uint64_t nocarry_mask;
/usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
  uint64_t flags;
/usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
  uint64_t user_token;
/usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
  int32_t  status;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:15:23 -05:00
Dmitry V. Levin
bcb41c6bce uapi: fix linux/mroute.h userspace compilation errors
Include <linux/in.h> to fix the following linux/mroute.h userspace
compilation errors:

/usr/include/linux/mroute.h:58:18: error: field 'vifc_lcl_addr' has incomplete type
  struct in_addr vifc_lcl_addr;     /* Local interface address */
/usr/include/linux/mroute.h:61:17: error: field 'vifc_rmt_addr' has incomplete type
  struct in_addr vifc_rmt_addr; /* IPIP tunnel addr */
/usr/include/linux/mroute.h:72:17: error: field 'mfcc_origin' has incomplete type
  struct in_addr mfcc_origin;  /* Origin of mcast */
/usr/include/linux/mroute.h:73:17: error: field 'mfcc_mcastgrp' has incomplete type
  struct in_addr mfcc_mcastgrp;  /* Group in question */
/usr/include/linux/mroute.h:84:17: error: field 'src' has incomplete type
  struct in_addr src;
/usr/include/linux/mroute.h:85:17: error: field 'grp' has incomplete type
  struct in_addr grp;
/usr/include/linux/mroute.h:109:17: error: field 'im_src' has incomplete type
  struct in_addr im_src,im_dst;
/usr/include/linux/mroute.h:109:24: error: field 'im_dst' has incomplete type
  struct in_addr im_src,im_dst;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:15:12 -05:00
Dmitry V. Levin
72aa107df6 uapi: fix linux/mroute6.h userspace compilation errors
Include <linux/in6.h> to fix the following linux/mroute6.h userspace
compilation errors:

/usr/include/linux/mroute6.h:80:22: error: field 'mf6cc_origin' has incomplete type
  struct sockaddr_in6 mf6cc_origin;  /* Origin of mcast */
/usr/include/linux/mroute6.h:81:22: error: field 'mf6cc_mcastgrp' has incomplete type
  struct sockaddr_in6 mf6cc_mcastgrp;  /* Group in question */
/usr/include/linux/mroute6.h:91:22: error: field 'src' has incomplete type
  struct sockaddr_in6 src;
/usr/include/linux/mroute6.h:92:22: error: field 'grp' has incomplete type
  struct sockaddr_in6 grp;
/usr/include/linux/mroute6.h:132:18: error: field 'im6_src' has incomplete type
  struct in6_addr im6_src, im6_dst;
/usr/include/linux/mroute6.h:132:27: error: field 'im6_dst' has incomplete type
  struct in6_addr im6_src, im6_dst;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:15:12 -05:00
Dmitry V. Levin
6c07ec0fa5 uapi: fix linux/ipv6_route.h userspace compilation errors
Include <linux/in6.h> to fix the following linux/ipv6_route.h userspace
compilation errors:

/usr/include/linux/ipv6_route.h:42:19: error: field 'rtmsg_dst' has incomplete type
  struct in6_addr  rtmsg_dst;
/usr/include/linux/ipv6_route.h:43:19: error: field 'rtmsg_src' has incomplete type
  struct in6_addr  rtmsg_src;
/ust/include/linux/ipv6_route.h:44:19: error: field 'rtmsg_gateway' has incomplete type
  struct in6_addr  rtmsg_gateway;

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:15:12 -05:00
Arun Easi
1e128c8129 qed: Add support for hardware offloaded FCoE.
This adds the backbone required for the various HW initalizations
which are necessary for the FCoE driver (qedf) for QLogic FastLinQ
4xxxx line of adapters - FW notification, resource initializations, etc.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19 18:10:42 -05:00
Daniel Borkmann
74451e66d5 bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.

This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.

Before:

  7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)

After:

  7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
  [...]
  7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:05 -05:00
Daniel Borkmann
9383191da4 bpf: remove stubs for cBPF from arch code
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:04 -05:00
Herbert Xu
da20420f83 rhashtable: Add nested tables
This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 12:28:35 -05:00
Or Gerlitz
e696028acc net/sched: Reflect HW offload status
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 12:08:05 -05:00
David S. Miller
99d5ceeea5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-02-16

1) Make struct xfrm_input_afinfo const, nothing writes to it.
   From Florian Westphal.

2) Remove all places that write to the afinfo policy backend
   and make the struct const then.
   From Florian Westphal.

3) Prepare for packet consuming gro callbacks and add
   ESP GRO handlers. ESP packets can be decapsulated
   at the GRO layer then. It saves a round through
   the stack for each ESP packet.

Please note that this has a merge coflict between commit

63fca65d08 ("net: add confirm_neigh method to dst_ops")

from net-next and

3d7d25a68e ("xfrm: policy: remove garbage_collect callback")
a2817d8b27 ("xfrm: policy: remove family field")

from ipsec-next.

The conflict can be solved as it is done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-16 21:25:49 -05:00
David S. Miller
3f64116a83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
Linus Torvalds
3c7a9f32f9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) In order to avoid problems in the future, make cgroup bpf overriding
    explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov.

 2) LLC sets skb->sk without proper skb->destructor and this explodes,
    fix from Eric Dumazet.

 3) Make sure when we have an ipv4 mapped source address, the
    destination is either also an ipv4 mapped address or
    ipv6_addr_any(). Fix from Jonathan T. Leighton.

 4) Avoid packet loss in fec driver by programming the multicast filter
    more intelligently. From Rui Sousa.

 5) Handle multiple threads invoking fanout_add(), fix from Eric
    Dumazet.

 6) Since we can invoke the TCP input path in process context, without
    BH being disabled, we have to accomodate that in the locking of the
    TCP probe. Also from Eric Dumazet.

 7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
    aren't even updating that sysctl value. From Marcus Huewe.

 8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.

[ This is the second version of the pull that reverts the nested
  rhashtable changes that looked a bit too scary for this late in the
  release  - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  rhashtable: Revert nested table changes.
  ibmvnic: Fix endian errors in error reporting output
  ibmvnic: Fix endian error when requesting device capabilities
  net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
  net: xilinx_emaclite: fix freezes due to unordered I/O
  net: xilinx_emaclite: fix receive buffer overflow
  bpf: kernel header files need to be copied into the tools directory
  tcp: tcp_probe: use spin_lock_bh()
  uapi: fix linux/if_pppol2tp.h userspace compilation errors
  packet: fix races in fanout_add()
  ibmvnic: Fix initial MTU settings
  net: ethernet: ti: cpsw: fix cpsw assignment in resume
  kcm: fix a null pointer dereference in kcm_sendmsg()
  net: fec: fix multicast filtering hardware setup
  ipv6: Handle IPv4-mapped src to in6addr_any dst.
  ipv6: Inhibit IPv4-mapped src address on the wire.
  net/mlx5e: Disable preemption when doing TC statistics upcall
  rhashtable: Add nested tables
  tipc: Fix tipc_sk_reinit race conditions
  gfs2: Use rhashtable walk interface in glock_hash_walk
  ...
2017-02-16 08:37:18 -08:00
David S. Miller
bf3f14d634 rhashtable: Revert nested table changes.
This reverts commits:

6a25478077
9dbbfb0ab6
40137906c5

It's too risky to put in this late in the release
cycle.  We'll put these changes into the next merge
window instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-15 22:29:51 -05:00
Sudarsana Reddy Kalluru
c78c70fa30 qed: Add infrastructure for PTP support
The patch adds the required qed interfaces for configuring/reading
the PTP clock on the adapter.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-15 12:42:52 -05:00
Jiri Pirko
8ae7003255 sched: have stub for tcf_destroy_chain in case NET_CLS is not configured
This fixes broken build for !NET_CLS:

net/built-in.o: In function `fq_codel_destroy':
/home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to `tcf_destroy_chain'

Fixes: cf1facda2f ("sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-15 12:22:48 -05:00
Steffen Klassert
7785bba299 esp: Add a software GRO codepath
This patch adds GRO ifrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected it into layer 2.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:11 +01:00
Steffen Klassert
54ef207ac8 xfrm: Extend the sec_path for IPsec offloading
We need to keep per packet offloading informations across
the layers. So we extend the sec_path to carry these for
the input and output offload codepath.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:10 +01:00
Steffen Klassert
1e29537034 xfrm: Export xfrm_parse_spi.
We need it in the ESP offload handlers, so export it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:49 +01:00
Steffen Klassert
25393d3fc0 net: Prepare gro for packet consuming gro callbacks
The upcomming IPsec ESP gro callbacks will consume the skb,
so prepare for that.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:44 +01:00
Steffen Klassert
5f114163f2 net: Add a skb_gro_flush_final helper.
Add a skb_gro_flush_final helper to prepare for  consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcomming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronous. The handler is used in all gro_receive functions
that can call the ESP gro handlers.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:39 +01:00
Steffen Klassert
b0fcee825c xfrm: Add a secpath_set helper.
Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:24 +01:00
Dmitry V. Levin
a725eb15db uapi: fix linux/if_pppol2tp.h userspace compilation errors
Because of <linux/libc-compat.h> interface limitations, <netinet/in.h>
provided by libc cannot be included after <linux/in.h>, therefore any
header that includes <netinet/in.h> cannot be included after <linux/in.h>.

Change uapi/linux/l2tp.h, the last uapi header that includes
<netinet/in.h>, to include <linux/in.h> and <linux/in6.h> instead of
<netinet/in.h> and use __SOCK_SIZE__ instead of sizeof(struct sockaddr)
the same way as uapi/linux/in.h does, to fix linux/if_pppol2tp.h userspace
compilation errors like this:

In file included from /usr/include/linux/l2tp.h:12:0,
                 from /usr/include/linux/if_pppol2tp.h:21,
/usr/include/netinet/in.h:31:8: error: redefinition of 'struct in_addr'

Fixes: 47c3e7783b ("net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14 22:18:05 -05:00
Linus Torvalds
747ae0a96f Merge tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "A colorspace regression fix in V4L2 core and a CEC core bug that makes
  it discard valid messages"

* tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] cec: initiator should be the same as the destination for, poll
  [media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB
2017-02-14 06:29:21 -08:00
David S. Miller
c3d8103bc0 Merge tag 'rxrpc-rewrite-20170210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
afs: Use system UUID generation

There is now a general function for generating a UUID and AFS should make
use of it.  It's also been recommended to me that I switch to using random
rather than time plus MAC address-based UUIDs which this function does.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-13 22:24:16 -05:00
Tobias Klauser
fb585b4438 net: make net_device members garp_port and mrp_port conditional
garp_port is only used in net/802/garp.c which is only compiled with
CONFIG_GARP enabled. Same goes for mrp_port which is only used in
net/802/mrp.c with CONFIG_MRP enabled.

Only include the two members in struct net_device if their respective
CONFIG_* is enabled. This saves a few bytes in struct net_device in case
CONFIG_GARP or CONFIG_MRP are not enabled.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-13 22:23:39 -05:00
Eric Dumazet
37fabbf4d4 net: busy-poll: remove LL_FLUSH_FAILED and LL_FLUSH_BUSY
Commit 79e7fff47b ("net: remove support for per driver
ndo_busy_poll()") made them obsolete.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-13 22:23:39 -05:00
Herbert Xu
40137906c5 rhashtable: Add nested tables
This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-13 22:17:05 -05:00
Hans Verkuil
35879ee476 [media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB
This reverts 'commit 7e0739cd9c ("[media] videodev2.h: fix
sYCC/AdobeYCC default quantization range").

The problem is that many drivers can convert R'G'B' content (often
from sensors) to Y'CbCr, but they all produce limited range Y'CbCr.

To stay backwards compatible the default quantization range for
sRGB and AdobeRGB Y'CbCr encoding should be limited range, not full
range, even though the corresponding standards specify full range.

Update the V4L2_MAP_QUANTIZATION_DEFAULT define accordingly and
also update the documentation.

Fixes: 7e0739cd9c ("[media] videodev2.h: fix sYCC/AdobeYCC default quantization range")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org>      # for v4.9 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-02-13 14:33:56 -02:00
David S. Miller
7c92d61eca Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, most relevantly they are:

1) Extend nft_exthdr to allow to match TCP options bitfields, from
   Manuel Messner.

2) Allow to check if IPv6 extension header is present in nf_tables,
   from Phil Sutter.

3) Allow to set and match conntrack zone in nf_tables, patches from
   Florian Westphal.

4) Several patches for the nf_tables set infrastructure, this includes
   cleanup and preparatory patches to add the new bitmap set type.

5) Add optional ruleset generation ID check to nf_tables and allow to
   delete rules that got no public handle yet via NFTA_RULE_ID. These
   patches add the missing kernel infrastructure to support rule
   deletion by description from userspace.

6) Missing NFT_SET_OBJECT flag to select the right backend when sets
   stores an object map.

7) A couple of cleanups for the expectation and SIP helper, from Gao
   feng.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12 22:11:43 -05:00
Alexei Starovoitov
7f67763337 bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12 21:52:19 -05:00
Pablo Neira Ayuso
1a94e38d25 netfilter: nf_tables: add NFTA_RULE_ID attribute
This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12 14:45:13 +01:00
Pablo Neira Ayuso
8c4d4e8b56 netfilter: nfnetlink: allow to check for generation ID
This patch allows userspace to specify the generation ID that has been
used to build an incremental batch update.

If userspace specifies the generation ID in the batch message as
attribute, then nfnetlink compares it to the current generation ID so
you make sure that you work against the right baseline. Otherwise, bail
out with ERESTART so userspace knows that its changeset is stale and
needs to respin. Userspace can do this transparently at the cost of
taking slightly more time to refresh caches and rework the changeset.

This check is optional, if there is no NFNL_BATCH_GENID attribute in the
batch begin message, then no check is performed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12 14:45:11 +01:00
Julian Anastasov
c16ec18599 net: rename dst_neigh_output back to neigh_output
After the dst->pending_confirm flag was removed, we do not
need anymore to provide dst arg to dst_neigh_output.
So, rename it to neigh_output as before commit 5110effee8
("net: Do delayed neigh confirmation.").

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 21:25:18 -05:00
Sainath Grandhi
9a393b5d59 tap: tap as an independent module
This patch makes tap a separate module for other types of virtual interfaces, for example,
ipvlan to use.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
d9f1f61c08 tap: Extending tap device create/destroy APIs
Extending tap APIs get/free_minor and create/destroy_cdev to handle more than one
type of virtual interface.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
6fe3faf867 tap: Abstract type of virtual interface from tap implementation
macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. Few of the tap functions are modified to
accepted tap_dev as argument. tap_dev object includes callbacks to be used by
underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
ebc05ba7e8 tap: Tap character device creation/destroy API
This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00