Commit Graph

104164 Commits

Author SHA1 Message Date
John Fastabend
7a3dd8c897 tls: async support causes out-of-bounds access in crypto APIs
When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.

So this breaks (with general protection fault and KASAN error, if
enabled) because the request sent to decrypt is shorter than required
causing the crypto API out-of-bounds errors. Also it seems unlikely the
sk is even valid by the time it gets to the callback because of memset
in crypto layer.

Anyways, fix this by holding the sk in the skb->sk field when the
callback is set up and because the skb is already passed through to
the callback handler via void* we can access it in the handler. Then
in the handler we need to be careful to NULL the pointer again before
kfree_skb. I added comments on both the setup (in tls_do_decryption)
and when we clear it from the crypto callback handler
tls_decrypt_done(). After this selftests pass again and fixes KASAN
errors/warnings.

Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:01:36 -07:00
Li RongQing
52bb6677d5 net: move definition of pcpu_lstats to header file
pcpu_lstats is defined in several files, so unify them as one
and move to header file

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:32:23 -07:00
YueHaibing
293681f149 vxlan: Remove duplicated include from vxlan.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:07:56 -07:00
Jason Wang
fe8dd45bb7 tun: switch to new type of msg_control
This patch introduces to a new tun/tap specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2
struct tun_msg_ctl {
       int type;
       void *ptr;
};

This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
be used by the existed vhost_net zerocopy code. The second is XDP
buff, which allows vhost_net to pass XDP buff to TUN. This could be
used to implement accepting an array of XDP buffs from vhost_net in
the following patches.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
e4a2a3048e net: sock: introduce SOCK_XDP
This patch introduces a new sock flag - SOCK_XDP. This will be used
for notifying the upper layer that XDP program is attached on the
lower socket, and requires for extra headroom.

TUN will be the first user.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Cong Wang
9708d2b5b7 llc: avoid blocking in llc_sap_close()
llc_sap_close() is called by llc_sap_put() which
could be called in BH context in llc_rcv(). We can't
block in BH.

There is no reason to block it here, kfree_rcu() should
be sufficient.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:04:58 -07:00
Andre Naujoks
15033f0457 ipv6: Add sockopt IPV6_MULTICAST_ALL analogue to IP_MULTICAST_ALL
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.

A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.

Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.

The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:17:27 -07:00
Hauke Mehrtens
7969119293 net: dsa: Add Lantiq / Intel GSWIP tag support
This handles the tag added by the PMAC on the VRX200 SoC line.

The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.

This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:14:33 -07:00
David S. Miller
aaf9253025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-12 22:22:42 -07:00
Hangbin Liu
52d0d404d3 geneve: add ttl inherit support
Similar with commit 72f6d71e49 ("vxlan: add ttl inherit support"),
currently ttl == 0 means "use whatever default value" on geneve instead
of inherit inner ttl. To respect compatibility with old behavior, let's
add a new IFLA_GENEVE_TTL_INHERIT for geneve ttl inherit support.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:38:22 -07:00
Linus Torvalds
67b076095d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix up several Kconfig dependencies in netfilter, from Martin Willi
    and Florian Westphal.

 2) Memory leak in be2net driver, from Petr Oros.

 3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem.

 4) mlx5_attach_interface needs to check for errors, from Huy Nguyen.

 5) tipc_release() needs to orphan the sock, from Cong Wang.

 6) Need to program TxConfig register after TX/RX is enabled in r8169
    driver, not beforehand, from Maciej S. Szmigiero.

 7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal.

 8) Fix crash regression in ip_do_fragment(), from Taehee Yoo.

 9) syzbot can create conditions where kernel log is flooded with
    synflood warnings due to creation of many listening sockets, fix
    that. From Willem de Bruijn.

10) Fix RCU issues in rds socket layer, from Cong Wang.

11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
  nfp: flower: fix vlan match by checking both vlan id and vlan pcp
  tipc: check return value of __tipc_dump_start()
  s390/qeth: don't dump past end of unknown HW header
  s390/qeth: use vzalloc for QUERY OAT buffer
  s390/qeth: switch on SG by default for IQD devices
  s390/qeth: indicate error when netdev allocation fails
  rds: fix two RCU related problems
  r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
  erspan: fix error handling for erspan tunnel
  erspan: return PACKET_REJECT when the appropriate tunnel is not found
  tcp: rate limit synflood warnings further
  MIPS: lantiq: dma: add dev pointer
  netfilter: xt_hashlimit: use s->file instead of s->private
  netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
  netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
  netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
  netfilter: conntrack: reset tcp maxwin on re-register
  qmi_wwan: Support dynamic config on Quectel EP06
  ethernet: renesas: convert to SPDX identifiers
  ...
2018-09-12 17:32:50 -10:00
Nikolay Aleksandrov
435f2e7cc0 net: bridge: add support for sticky fdb entries
Add support for entries which are "sticky", i.e. will not change their port
if they show up from a different one. A new ndm flag is introduced for that
purpose - NTF_STICKY. We allow to set it only to non-local entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:30:03 -07:00
Andrew Lunn
22b7d29926 net: ethernet: Add helper to determine if pause configuration is supported
Rather than have MAC drivers open code the test, add a helper in
phylib. This will help when we change the type of phydev->supported.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
Andrew Lunn
0c122405d4 net: ethernet: Add helper for set_pauseparam for Pause
ethtool can be used to enable/disable pause. Add a helper to configure
the PHY when Pause is supported.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
Andrew Lunn
70814e819c net: ethernet: Add helper for set_pauseparam for Asym Pause
ethtool can be used to enable/disable pause. Add a helper to configure
the PHY when asym pause is supported.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
Andrew Lunn
c306ad3618 net: ethernet: Add helper for MACs which support pause
Rather than have the MAC drivers manipulate phydev members, add a
helper function for MACs supporting Pause, but not Asym Pause.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
Andrew Lunn
af8d9bb2f2 net: ethernet: Add helper for MACs which support asym pause
Rather than have the MAC drivers manipulate phydev members to indicate
they support Asym Pause, add a helper function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
Andrew Lunn
41124fa64d net: ethernet: Add helper to remove a supported link mode
Some MAC hardware cannot support a subset of link modes. e.g. often
1Gbps Full duplex is supported, but Half duplex is not. Add a helper
to remove such a link mode.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:24:21 -07:00
David Ahern
67edf21e5a scsi: libcxgbi: fib6_ino reference in rt6_info is rcu protected
The fib6_info reference in rt6_info is rcu protected. Add a helper
to extract prefsrc from and update cxgbi_check_route6 to use it.

Fixes: 0153167aeb ("net/ipv6: Remove rt6i_prefsrc")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:08:00 -07:00
David S. Miller
4ecdf77091 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for you net tree:

1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing.

2) Restore conntrack dependency on xt_cluster, from Martin Willi.

3) Fix splat with GSO skbs from the checksum target, from Florian Westphal.

4) Rework ct timeout support, the template strategy to attach custom timeouts
   is not correct since it will not work in conjunction with conntrack zones
   and we have a possible free after use when removing the rule due to missing
   refcounting. To fix these problems, do not use conntrack template at all
   and set custom timeout on the already valid conntrack object. This
   fix comes with a preparation patch to simplify timeout adjustment by
   initializating the first position of the timeout array for all of the
   existing trackers. Patchset from Florian Westphal.

5) Fix missing dependency on from IPv4 chain NAT type, from Florian.

6) Release chain reference counter from the flush path, from Taehee Yoo.

7) After flushing an iptables ruleset, conntrack hooks are unregistered
   and entries are left stale to be cleaned up by the timeout garbage
   collector. No TCP tracking is done on established flows by this time.
   If ruleset is reloaded, then hooks are registered again and TCP
   tracking is restored, which considers packets to be invalid. Clear
   window tracking to exercise TCP flow pickup from the middle given that
   history is lost for us. Again from Florian.

8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y
   and CONFIG_NF_CT_NETLINK_TIMEOUT=n.

9) Broken CT target due to returning incorrect type from
   ctnl_timeout_find_get().

10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner.

11) Missing conversion of hashlimit sysctl interface to new API, from
    Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 21:17:30 -07:00
Linus Torvalds
5e335542de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:

 - functional regression fix for sensor-hub driver from Hans de Goede

 - stop doing device reset for i2c-hid devices, which unbreaks some of
   them (and is in line with the specification), from Kai-Heng Feng

 - error handling fix for hid-core from Gustavo A. R. Silva

 - functional regression fix for some Elan panels from Benjamin
   Tissoires

 - a few new device ID additions and misc small fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: i2c-hid: Don't reset device upon system resume
  HID: sensor-hub: Restore fixup for Lenovo ThinkPad Helix 2 sensor hub report
  HID: core: fix NULL pointer dereference
  HID: core: fix grouping by application
  HID: multitouch: fix Elan panels with 2 input modes declaration
  HID: hid-saitek: Add device ID for RAT 7 Contagion
  HID: core: fix memory leak on probe
  HID: input: fix leaking custom input node name
  HID: add support for Apple Magic Keyboards
  HID: i2c-hid: Fix flooded incomplete report after S3 on Rayd touchscreen
  HID: intel-ish-hid: Enable Sunrise Point-H ish driver
2018-09-11 16:23:21 -10:00
Vlad Buslov
86c55361e5 net: sched: cls_flower: dump offload count value
Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:35:15 -07:00
David S. Miller
992cba7e27 net: Add and use skb_list_del_init().
It documents what is happening, and eliminates the spurious list
pointer poisoning.

In the long term, in order to get proper list head debugging, we
might want to use the list poison value as the indicator that
an SKB is a singleton and not on a list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:54 -07:00
David S. Miller
a8305bff68 net: Add and use skb_mark_not_on_list().
An SKB is not on a list if skb->next is NULL.

Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:54 -07:00
David S. Miller
8b69bd7d8a ppp: Remove direct skb_queue_head list pointer access.
Add a helper, __skb_peek(), and use it in ppp_mp_reconstruct().

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:53 -07:00
David S. Miller
596977300a sch_netem: Move private queue handler to generic location.
By hand copies of SKB list handlers do not belong in individual packet
schedulers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:53 -07:00
David S. Miller
aea890b8b2 sch_htb: Remove local SKB queue handling code.
Instead, adjust __qdisc_enqueue_tail() such that HTB can use it
instead.

The only other caller of __qdisc_enqueue_tail() is
qdisc_enqueue_tail() so we can move the backlog and return value
handling (which HTB doesn't need/want) to the latter.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:52 -07:00
David Ahern
0153167aeb net/ipv6: Remove rt6i_prefsrc
After the conversion to fib6_info, rt6i_prefsrc has a single user that
reads the value and otherwise it is only set. The one reader can be
converted to use rt->from so rt6i_prefsrc can be removed, reducing
rt6_info by another 20 bytes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:02:25 -07:00
Linus Torvalds
3567994a05 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping fixes from Thomas Gleixner:
 "Two fixes for timekeeping:

   - Revert to the previous kthread based update, which is unfortunately
     required due to lock ordering issues. The removal caused boot
     failures on old Core2 machines. Add a proper comment why the thread
     needs to stay to prevent accidental removal in the future.

   - Fix a silly typo in a function declaration"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Revert "Remove kthread"
  timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()
2018-09-09 06:55:27 -07:00
David S. Miller
ddc9cc0131 Merge tag 'mlx5e-updates-2018-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5e-updates-2018-09-05

This series provides updates to mlx5 ethernet driver.

1) Starting with a four patches series to optimize flow counters updates,
From Vlad Buslov:
==============================================

By default mlx5 driver updates cached counters each second. Update function
consumes noticeable amount of CPU resources. The goal of this patch series
is to optimize update function.

Investigation revealed following bottlenecks in fs counters
implementation:
 1) Update code(scheduled each second) iterates over all counters twice.
 (first for finding and deleting counters that are marked for deletion,
 second iteration is for actually updating the counters)
 2) Counters are stored in rb tree. Linear iteration over all rb tree
 elements(rb_next in profiling data) consumed ~65% of time spent in
 update function.

Following optimizations were implemented:
 1) Instead of just marking counters for deletion, store them in
 standalone list. This removes first iteration over whole counters tree.
 2) Store counters in sorted list to optimize traversing them and remove
 calls to rb_next.

First implementation of these changes caused degradation of performance,
instead of improving it. Investigation revealed that there first cache
line of struct mlx5_fc is full and adding anything to it causes amount
of cache misses to double. To mitigate that, following refactorings were
implemented:
 - Change 'addlist' list type from double linked to single linked. This
 allowes to get free space for one additional pointer that is used to
 store deletion list(optimization 1)
 - Substitute rb tree with idr. Idr is non-intrusive data structure and
 doesn't require adding any new members to struct mlx5_fc. Use free
 space that became available for double linked sorted list that is used
 for traversing all counters. (optimization 2)

Described changes reduced CPU time spent in mlx5_fc_stats_work from 70%
to 44%. (global perf profile mode)
============================================

The rest of the series are misc updates:

2) From Kamal, Move mlx5e_priv_flags into en_ethtool.c, to avoid a
compilation warning.

3) From Roi Dayan, Move Q counters allocation and drop RQ to init_rx profile
function to avoid allocating Q counters when not required.

4) From Shay Agroskin, Replace PTP clock lock from RW lock to seq lock.
Almost double the packet rate when timestamping is active on multiple TX
queues.

5) From: Natali Shechtman, set ECN for received packets using CQE indication.

6) From: Alaa Hleihel, don't set CHECKSUM_COMPLETE on SCTP packets.
CHECKSUM_COMPLETE is not applicable to SCTP protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 15:42:04 -07:00
Linus Torvalds
ca16eb342e Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Small collection of fixes that should go into this release. This
  contains:

   - Small series that fixes a race between blkcg teardown and writeback
     (Dennis Zhou)

   - Fix disallowing invalid block size settings from the nbd ioctl (me)

   - BFQ fix for a use-after-free on last release of a bfqg (Konstantin
     Khlebnikov)

   - Fix for the "don't warn for flush" fix (Mikulas)"

* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
  block: bfq: swap puts in bfqg_and_blkg_put
  block: don't warn when doing fsync on read-only devices
  nbd: don't allow invalid blocksize settings
  blkcg: use tryget logic when associating a blkg with a bio
  blkcg: delay blkg destruction until after writeback has finished
  Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
2018-09-06 14:01:15 -07:00
Linus Torvalds
be65e2595b Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This fixes two annoying bugs:

   - The first one is a side effect caused by using SRCU for rcuidle
     tracepoints. It seems that the perf was depending on the rcuidle
     tracepoints to make RCU watch when it wasn't.

     The real fix will be to have perf use SRCU instead of depending on
     RCU watching, but that can't be done until SRCU is safe to use in
     NMI context (Paul's working on that).

   - The second bug fix is for a bug that's been periodically making my
     tests fail randomly for some time. I haven't had time to track it
     down, but finally have. It has to do with stressing NMIs (via perf)
     while enabling or disabling ftrace function handling with lockdep
     enabled.

     If an interrupt happens and just as it returns, it sets lockdep
     back to "interrupts enabled" but before it returns an NMI is
     triggered, and if this happens while printk_nmi_enter has a
     breakpoint attached to it (because ftrace is converting it to or
     from nop to call fentry), the breakpoint trap also calls into
     lockdep, and since returning from the NMI to a interrupt handler,
     interrupts were disabled when the NMI went off, lockdep keeps its
     state as interrupts disabled when it returns back from the
     interrupt handler where interrupts are enabled.

     This causes lockdep_assert_irqs_enabled() to trigger a false
     positive"

* tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk/tracing: Do not trace printk_nmi_enter()
  tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06 09:06:49 -07:00
Denis Bolotin
a3f723079d qed*: Utilize FW 8.37.7.0
This patch adds a new qed firmware with fixes and support for new features.

Fixes:
- Fix a rare case of device crash with iWARP, iSCSI or FCoE offload.
- Fix GRE tunneled traffic when iWARP offload is enabled.
- Fix RoCE failure in ib_send_bw when using inline data.
- Fix latency optimization flow for inline WQEs.
- BigBear 100G fix

RDMA:
- Reduce task context size.
- Application page sizes above 2GB support.
- Performance improvements.

ETH:
- Tenant DCB support.
- Replace RSS indirection table update interface.

Misc:
- Debug Tools changes.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 07:44:35 -07:00
Christian Brauner
19d8f1ad12 if_link: add IFLA_TARGET_NETNSID alias
This adds IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID for
RTM_*LINK requests.
The new name is clearer and also aligns with the newly introduced
IFA_TARGET_NETNSID propert for RTM_*ADDR requests.

Signed-off-by: Christian Brauner <christian@brauner.io>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
9f3c057c14 if_addr: add IFA_TARGET_NETNSID
This adds a new IFA_TARGET_NETNSID property to be used by address
families such as PF_INET and PF_INET6.
The IFA_TARGET_NETNSID property can be used to send a network namespace
identifier as part of a request. If a IFA_TARGET_NETNSID property is
identified it will be used to retrieve the target network namespace in
which the request is to be made.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
c383edc424 rtnetlink: add rtnl_get_net_ns_capable()
get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static declaration and export in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.
Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as
a module.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Vincent Whitchurch
fa788d986a packet: add sockopt to ignore outgoing packets
Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them.  So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this.  Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:09:37 -07:00
Shay Agroskin
64109f1dc4 net/mlx5e: Replace PTP clock lock from RW lock to seq lock
Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock'
in order to improve packet rate performance.

Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
Sent 64b packets between two peers connected by ConnectX-5,
and measured packet rate for the receiver in three modes:
	no time-stamping (base rate)
	time-stamping using rw_lock (old lock) for critical region
	time-stamping using seq_lock (new lock) for critical region
Only the receiver time stamped its packets.

The measured packet rate improvements are:

	Single flow (multiple TX rings to single RX ring):
		without timestamping:	  4.26 (M packets)/sec
		with rw-lock (old lock):  4.1  (M packets)/sec
		with seq-lock (new lock): 4.16 (M packets)/sec
		1.46% improvement

	Multiple flows (multiple TX rings to six RX rings):
		without timestamping: 	  22   (M packets)/sec
		with rw-lock (old lock):  11.7 (M packets)/sec
		with seq-lock (new lock): 21.3 (M packets)/sec
		82.05% improvement

The packet rate improvement is due to the lack of atomic operations
for the 'readers' by the seq-lock.
Since there are much more 'readers' than 'writers' contention
on this lock, almost all atomic operations are saved.
this results in a dramatic decrease in overall
cache misses.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
12d6066c3b net/mlx5: Add flow counters idr
Previous patch in series changed flow counter storage structure from
rb_tree to linked list in order to improve flow counter traversal
performance. The drawback of such solution is that flow counter lookup by
id becomes linear in complexity.

Store pointers to flow counters in idr in order to improve lookup
performance to logarithmic again. Idr is non-intrusive data structure and
doesn't require extending flow counter struct with new elements. This means
that idr can be used for lookup, while linked list from previous patch is
used for traversal, and struct mlx5_fc size is <= 2 cache lines.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
9aff93d7d0 net/mlx5: Store flow counters in a list
In order to improve performance of flow counter stats query loop that
traverses all configured flow counters, replace rb_tree with double-linked
list. This change improves performance of traversing flow counters by
removing the tree traversal. (profiling data showed that call to rb_next
was most top CPU consumer)

However, lookup of flow flow counter in list becomes linear, instead of
logarithmic. This problem is fixed by next patch in series, which adds idr
for fast lookup. Idr is to be used because it is not an intrusive data
structure and doesn't require adding any new members to struct mlx5_fc,
which allows its control data part to stay <= 1 cache line in size.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
6e5e228391 net/mlx5: Add new list to store deleted flow counters
In order to prevent flow counters stats work function from traversing whole
flow counters tree while searching for deleted flow counters, new list to
store deleted flow counters is added to struct mlx5_fc_stats. Lockless
NULL-terminated single linked list data type is used due to following
reasons:
 - This use case only needs to add single element to list and
 remove/iterate whole list. Lockless list doesn't require any additional
 synchronization for these operations.
 - First cache line of flow counter data structure only has space to store
 single additional pointer, which precludes usage of double linked list.

Remove flow counter 'deleted' flag that is no longer needed.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
83033688b7 net/mlx5: Change flow counters addlist type to single linked list
In order to prevent flow counters stats work function from traversing whole
flow counters tree while searching for deleted flow counters, new list to
store deleted flow counters will be added to struct mlx5_fc_stats. However,
the flow counter structure itself has no space left to store any more data
in first cache line. To free space that is needed to store additional list
node, convert current addlist double linked list (two pointers per node) to
atomic single linked list (one pointer per node).

Lockless NULL-terminated single linked list data type doesn't require any
additional external synchronization for operations used by flow counters
module (add single new element, remove all elements from list and traverse
them). Remove addlist_lock that is no longer needed.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:56 -07:00
Tariq Toukan
a090362210 net/mlx5: Use u16 for Work Queue buffer strides offset
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.

Fixes: d7037ad73d ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Tariq Toukan
8d71e81850 net/mlx5: Use u16 for Work Queue buffer fragment size
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.

Fixes: 388ca8be00 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Jack Morgenstein
76d5581c87 net/mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Steven Rostedt (VMware)
865e63b04e tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
Borislav reported the following splat:

 =============================
 WARNING: suspicious RCU usage
 4.19.0-rc1+ #1 Not tainted
 -----------------------------
 ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 RCU used illegally from extended quiescent state!
 1 lock held by swapper/0/0:
  #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
 Call Trace:
  dump_stack+0x85/0xcb
  perf_event_output_forward+0xf6/0x130
  __perf_event_overflow+0x52/0xe0
  perf_swevent_overflow+0x91/0xb0
  perf_tp_event+0x11a/0x350
  ? find_held_lock+0x2d/0x90
  ? __lock_acquire+0x2ce/0x1350
  ? __lock_acquire+0x2ce/0x1350
  ? retint_kernel+0x2d/0x2d
  ? find_held_lock+0x2d/0x90
  ? tick_nohz_get_sleep_length+0x83/0xb0
  ? perf_trace_cpu+0xbb/0xd0
  ? perf_trace_buf_alloc+0x5a/0xa0
  perf_trace_cpu+0xbb/0xd0
  cpuidle_enter_state+0x185/0x340
  do_idle+0x1eb/0x260
  cpu_startup_entry+0x5f/0x70
  start_kernel+0x49b/0x4a6
  secondary_startup_64+0xa4/0xb0

This is due to the tracepoints moving to SRCU usage which does not require
RCU to be "watching". But perf uses these tracepoints with RCU and expects
it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
working in NMI context, and then perf can be converted to use that instead
of normal RCU.

Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home

Cc: x86-ml <x86@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-05 11:23:21 -04:00
Sara Sharon
9739fe29a2 mac80211: add an option for drivers to check if packets can be aggregated
Some hardwares have limitations on the packets' type in AMSDU.
Add an optional driver callback to determine if two skbs can
be used in the same AMSDU or not.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:11:50 +02:00
Sara Sharon
edba6bdad6 mac80211: allow AMSDU size limitation per-TID
Some drivers may have AMSDU size limitation per TID, due to
HW constrains. Add an option to set this limit.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:26 +02:00
Sara Sharon
0eeb2b674f mac80211: add an option for station management TXQ
We have a TXQ abstraction for non-data packets that need
powersave buffering. Since the AP cannot sleep, in case
of station we can use this TXQ for all management frames,
regardless if they are bufferable. Add HW flag to allow
that.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:11 +02:00
Shaul Triebitz
add7453ad6 wireless: align to draft 11ax D3.0
Align to new 11ax draft D3.0.  Change/add new MAC and PHY capabilities
and update drivers' 11ax capabilities and mac80211's debugfs
accordingly.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:09:50 +02:00