Commit Graph

84362 Commits

Author SHA1 Message Date
Jiri Pirko
bf3994d2ed net/sched: introduce Match-all classifier
The matchall classifier matches every packet and allows the user to apply
actions on it. This filter is very useful in usecases where every packet
should be matched, for example, packet mirroring (SPAN) can be setup very
easily using that filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
David S. Miller
c42d7121fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
they are:

1) Count pre-established connections as active in "least connection"
   schedulers such that pre-established connections to avoid overloading
   backend servers on peak demands, from Michal Kubecek via Simon Horman.

2) Address a race condition when resizing the conntrack table by caching
   the bucket size when fulling iterating over the hashtable in these
   three possible scenarios: 1) dump via /proc/net/nf_conntrack,
   2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
   From Liping Zhang.

3) Revisit early_drop() path to perform lockless traversal on conntrack
   eviction under stress, use del_timer() as synchronization point to
   avoid two CPUs evicting the same entry, from Florian Westphal.

4) Move NAT hlist_head to nf_conn object, this simplifies the existing
   NAT extension and it doesn't increase size since recent patches to
   align nf_conn, from Florian.

5) Use rhashtable for the by-source NAT hashtable, also from Florian.

6) Don't allow --physdev-is-out from OUTPUT chain, just like
   --physdev-out is not either, from Hangbin Liu.

7) Automagically set on nf_conntrack counters if the user tries to
   match ct bytes/packets from nftables, from Liping Zhang.

8) Remove possible_net_t fields in nf_tables set objects since we just
   simply pass the net pointer to the backend set type implementations.

9) Fix possible off-by-one in h323, from Toby DiPasquale.

10) early_drop() may be called from ctnetlink patch, so we must hold
    rcu read size lock from them too, this amends Florian's patch #3
    coming in this batch, from Liping Zhang.

11) Use binary search to validate jump offset in x_tables, this
    addresses the O(n!) validation that was introduced recently
    resolve security issues with unpriviledge namespaces, from Florian.

12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.

13) Three updates for nft_log: Fix log prefix leak in error path. Bail
    out on loglevel larger than debug in nft_log and set on the new
    NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.

14) Allow to filter rule dumps in nf_tables based on table and chain
    names.

15) Simplify connlabel to always use 128 bits to store labels and
    get rid of unused function in xt_connlabel, from Florian.

16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
    helper, by Gao Feng.

17) Put back x_tables module reference in nft_compat on error, from
    Liping Zhang.

18) Add a reference count to the x_tables extensions cache in
    nft_compat, so we can remove them when unused and avoid a crash
    if the extensions are rmmod, again from Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 22:02:36 -07:00
David S. Miller
de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Linus Torvalds
107df03203 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix memory leak in nftables, from Liping Zhang.

 2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
    risk NULL skb derefs, from Sven Eckelmann.

 3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
    Greenman.

 4) Handle properly when we have ppp_unregister_channel() happening in
    parallel with ppp_connect_channel(), from WANG Cong.

 5) Fix DCCP deadlock, from Eric Dumazet.

 6) Bail out properly in UDP if sk_filter() truncates the packet to be
    smaller than even the space that the protocol headers need.  From
    Michal Kubecek.

 7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.

 8) Make TCP challenge ACKs less predictable, from Eric Dumazet.

 9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  packet: propagate sock_cmsg_send() error
  net/mlx5e: Fix del vxlan port command buffer memset
  packet: fix second argument of sock_tx_timestamp()
  net: switchdev: change ageing_time type to clock_t
  Update maintainer for EHEA driver.
  net/mlx4_en: Add resilience in low memory systems
  net/mlx4_en: Move filters cleanup to a proper location
  sctp: load transport header after sk_filter
  net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
  net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
  net: nb8800: Fix SKB leak in nb8800_receive()
  et131x: Fix logical vs bitwise check in et131x_tx_timeout()
  vlan: use a valid default mtu value for vlan over macsec
  net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
  mlxsw: spectrum: Prevent invalid ingress buffer mapping
  mlxsw: spectrum: Prevent overwrite of DCB capability fields
  mlxsw: spectrum: Don't emit errors when PFC is disabled
  mlxsw: spectrum: Indicate support for autonegotiation
  mlxsw: spectrum: Force link training according to admin state
  r8152: add MODULE_VERSION
  ...
2016-07-23 15:44:31 +09:00
Andrey Ryabinin
3cb9185c67 radix-tree: fix radix_tree_iter_retry() for tagged iterators.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:

  RIP: radix_tree_next_slot include/linux/radix-tree.h:473
    find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
  ....
  Call Trace:
    pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
    mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
    ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
    do_writepages+0x97/0x100 mm/page-writeback.c:2364
    __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
    filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
    ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
    vfs_fsync_range+0x10a/0x250 fs/sync.c:195
    vfs_fsync fs/sync.c:209
    do_fsync+0x42/0x70 fs/sync.c:219
    SYSC_fdatasync fs/sync.c:232
    SyS_fdatasync+0x19/0x20 fs/sync.c:230
    entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207

We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().

Fixes: 46437f9a55 ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23 10:25:54 +09:00
Johannes Weiner
73f576c04b mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears.  At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild.  Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.

Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.

Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later.  They pose no hurdle.

Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages.  And those
references are under the user's control, so they are manageable.

This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that.  This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.

This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:

  set -e
  mkdir -p pages
  for x in `seq 128000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /cgroup/foo
    echo $$ >/cgroup/foo/cgroup.procs
    echo trex >pages/$x
    echo $$ >/cgroup/cgroup.procs
    rmdir /cgroup/foo
  done

When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:

  [root@ham ~]# ./cssidstress.sh
  [...]
  65000
  mkdir: cannot create directory '/cgroup/foo': No space left on device

After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.

[hannes@cmpxchg.org: init the IDR]
  Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e6 ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org>	[3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23 10:25:54 +09:00
Florian Westphal
857ed310c0 netfilter: connlabels: move set helper to xt_connlabel
xt_connlabel is the only user so move it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22 17:05:10 +02:00
Florian Westphal
23014011ba netfilter: conntrack: support a fixed size of 128 distinct labels
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.

We track size of each label storage area in the 'words' member.

But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.

As most arches are 64bit we need to allocate 24 bytes in this case:

struct nf_conn_labels {
    u8            words;   /*     0     1 */
    /* XXX 7 bytes hole, try to pack */
    long unsigned bits[2]; /*     8     24 */

Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.

We still only allocate the extension if its needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22 17:04:55 +02:00
Paolo Abeni
5f652bb2eb gro_cells: gro_cells_receive now return error code
so that the caller can update stats accordingly, if needed

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22 00:50:41 -04:00
David S. Miller
276b8c77c3 Merge tag 'nfc-next-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:

====================
NFC 4.8 pull request

This is the first NFC pull request for 4.8. We have:

- A fairly large NFC digital stack patchset:
  * RTOX fixes.
  * Proper DEP RWT support.
  * ACK and NACK PDUs handling fixes, in both initiator
    and target modes.
  * A few memory leak fixes.

- A conversion of the nfcsim driver to use the digital stack.
  The driver supports the DEP protocol in both NFC-A and NFC-F.

- Error injection through debugfs for the nfcsim driver.

- Improvements to the port100 driver for the Sony USB chipset, in
  particular to the command abort and cancellation code paths.

- A few minor fixes for the pn533, trf7970a and fdp drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 23:39:36 -07:00
Gao Feng
82de0be686 netfilter: Add helper array register/unregister functions
Add nf_ct_helper_init(), nf_conntrack_helpers_register() and
nf_conntrack_helpers_unregister() functions to avoid repetitive
opencoded initialization in helpers.

This patch keeps an id parameter for nf_ct_helper_init() not to break
helper matching by name that has been inconsistently exposed to
userspace through ports, eg. ftp-2121, and through an incremental id,
eg. tftp-1.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-21 02:31:53 +02:00
David S. Miller
4599f772e7 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-07-19

Here's likely the last bluetooth-next pull request for the 4.8 kernel:

 - Fix for L2CAP setsockopt
 - Fix for is_suspending flag handling in btmrvl driver
 - Addition of Bluetooth HW & FW info fields to debugfs
 - Fix to use int instead of char for callback status.

The last one (from Geert Uytterhoeven) is actually not purely a
Bluetooth (or 802.15.4) patch, but it was agreed with other maintainers
that we take it through the bluetooth-next tree.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:42:28 -07:00
Daniel Borkmann
b02b94b331 bpf, elf: add official ELF machine define for eBPF
Add the official BPF ELF e_machine value that was assigned recently [1,2]
and will be propagated to glibc, et al. LLVM is switching to it in 3.9
release.

  [1] 36b9c09330
  [2] http://lists.iovisor.org/pipermail/iovisor-dev/2016-June/000266.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 12:37:39 -07:00
Brenden Blanco
cc2e0b3fbc bpf: fix implicit declaration of bpf_prog_add
For the ifndef case of CONFIG_BPF_SYSCALL, an inline version of
bpf_prog_add needs to exist otherwise the build breaks on some configs.

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2544:10: error: implicit declaration of function 'bpf_prog_add'
       prog = bpf_prog_add(prog, priv->rx_ring_num - 1);

The function is introduced in
59d3656d5b ("bpf: add bpf_prog_add api for bulk prog refcnt")
and first used in
47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program").

Fixes: 47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 11:59:21 -07:00
Brenden Blanco
224e92e02a net/mlx4_en: break out tx_desc write into separate function
In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco
6ce96ca348 bpf: add XDP_TX xdp_action for direct forwarding
XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Brenden Blanco
d1fdd91386 rtnl: add option for setting link xdp prog
Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible, therefore reading of this value merely
provides a bool whether the program is valid on the link or not.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Brenden Blanco
a7862b4584 net: add ndo to setup/query xdp prog in adapter rx
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:31 -07:00
Brenden Blanco
6a773a15a1 bpf: add XDP prog type for early driver filter
Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:31 -07:00
Brenden Blanco
59d3656d5b bpf: add bpf_prog_add api for bulk prog refcnt
A subsystem may need to store many copies of a bpf program, each
deserving its own reference. Rather than requiring the caller to loop
one by one (with possible mid-loop failure), add a bulk bpf_prog_add
api.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:31 -07:00
Gavin Shan
e6f44ed6d0 net/ncsi: Package and channel management
This manages NCSI packages and channels:

 * The available packages and channels are enumerated in the first
   time of calling ncsi_start_dev(). The channels' capabilities are
   probed in the meanwhile. The NCSI network topology won't change
   until the NCSI device is destroyed.
 * There in a queue in every NCSI device. The element in the queue,
   channel, is waiting for configuration (bringup) or suspending
   (teardown). The channel's state (inactive/active) indicates the
   futher action (configuration or suspending) will be applied on the
   channel. Another channel's state (invisible) means the requested
   action is being applied.
 * The hardware arbitration will be enabled if all available packages
   and channels support it. All available channels try to provide
   service when hardware arbitration is enabled. Otherwise, one channel
   is selected as the active one at once.
 * When channel is in active state, meaning it's providing service, a
   timer started to retrieve the channe's link status. If the channel's
   link status fails to be updated in the determined period, the channel
   is going to be reconfigured. It's the error handling implementation
   as defined in NCSI spec.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 20:49:17 -07:00
Gavin Shan
6389eaa7fa net/ncsi: NCSI command packet handler
The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.

This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 20:49:16 -07:00
Gavin Shan
2d283bdd07 net/ncsi: Resource management
NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.

   * The user (e.g. netdev driver) dereference NCSI device by
     "struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
     The later one is used by NCSI stack internally.
   * Every NCSI device can have multiple packages simultaneously, up
     to 8 packages. It's represented by "struct ncsi_package" and
     identified by 3-bits ID.
   * Every NCSI package can have multiple channels, up to 32. It's
     represented by "struct ncsi_channel" and identified by 5-bits ID.
   * Every NCSI channel has version, statistics, various modes and
     filters. They are represented by "struct ncsi_channel_version",
     "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
     "struct ncsi_channel_filter" separately.
   * Apart from AEN (Asynchronous Event Notification), the NCSI stack
     works in terms of command and response. This introduces "struct
     ncsi_req" to represent a complete NCSI transaction made of NCSI
     request and response.

link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 20:49:16 -07:00
Vivien Didelot
34a79f63bb net: dsa: support switchdev ageing time attr
Add a new function for DSA drivers to handle the switchdev
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.

The ageing time is passed as milliseconds.

Also because we can have multiple logical bridges on top of a physical
switch and ageing time are switch-wide, call the driver function with
the fastest ageing time in use on the chip instead of the requested one.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 19:42:01 -07:00
Vivien Didelot
eabfdda934 net: switchdev: change ageing_time type to clock_t
The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME
attribute is a clock_t and requires to use helpers such as
clock_t_to_jiffies() to convert to milliseconds.

Change ageing_time type from u32 to clock_t to make it explicit.

Fixes: f55ac58ae6 ("switchdev: add bridge ageing_time attribute")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:49:20 -07:00
Shmulik Ladkani
359ebda25a net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags
This flag indicates whether fragmentation of segments is allowed.

Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
either ip_forward or ipmr_forward).

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:40:22 -07:00
Florian Westphal
f4dc77713f netfilter: x_tables: speed up jump target validation
The dummy ruleset I used to test the original validation change was broken,
most rules were unreachable and were not tested by mark_source_chains().

In some cases rulesets that used to load in a few seconds now require
several minutes.

sample ruleset that shows the behaviour:

echo "*filter"
for i in $(seq 0 100000);do
        printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 100000);do
   printf -- "-A INPUT -j chain_%06x\n" $i
   printf -- "-A INPUT -j chain_%06x\n" $i
   printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT

[ pipe result into iptables-restore ]

This ruleset will be about 74mbyte in size, with ~500k searches
though all 500k[1] rule entries. iptables-restore will take forever
(gave up after 10 minutes)

Instead of always searching the entire blob for a match, fill an
array with the start offsets of every single ipt_entry struct,
then do a binary search to check if the jump target is present or not.

After this change ruleset restore times get again close to what one
gets when reverting 3647234101 (~3 seconds on my workstation).

[1] every user-defined rule gets an implicit RETURN, so we get
300k jumps + 100k userchains + 100k returns -> 500k rule entries

Fixes: 3647234101 ("netfilter: x_tables: validate targets of jumps")
Reported-by: Jeff Wu <wujiafu@gmail.com>
Tested-by: Jeff Wu <wujiafu@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-18 21:35:23 +02:00
Marcel Holtmann
5177a83827 Bluetooth: Add debugfs fields for hardware and firmware info
Some Bluetooth controllers allow for reading hardware and firmware
related vendor specific infos. If they are available, then they can be
exposed via debugfs now.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-18 09:33:28 +03:00
Marcel Holtmann
f962fe32f2 Bluetooth: Move hci_recv_frame and hci_recv_diag prototypes
The protoypes for hci_recv_frame and hci_recv_diag are in the wrong
location in the header file. Move them close to all the other hci_dev
related exported functions.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-18 09:33:24 +03:00
Geert Uytterhoeven
4360fa22ad drivers: misc: ti-st: Use int instead of fuzzy char for callback status
On mips and parisc:

    drivers/bluetooth/btwilink.c: In function 'ti_st_open':
    drivers/bluetooth/btwilink.c:174:21: warning: overflow in implicit constant conversion [-Woverflow]
       hst->reg_status = -EINPROGRESS;

    drivers/nfc/nfcwilink.c: In function 'nfcwilink_open':
    drivers/nfc/nfcwilink.c:396:31: warning: overflow in implicit constant conversion [-Woverflow]
      drv->st_register_cb_status = -EINPROGRESS;

There are actually two issues:
  1. Whether "char" is signed or unsigned depends on the architecture.
     As the completion callback data is used to pass a (negative) error
     code, it should always be signed.
  2. EINPROGRESS is 150 on mips, 245 on parisc.
     Hence -EINPROGRESS doesn't fit in a signed 8-bit number.

Change the callback status from "char" to "int" to fix these.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-17 19:59:26 +02:00
Dongpo Li
b786241253 of_mdio: Abstract a general interface for phy connect
Abstract a general interface "of_phy_get_and_connect"
for PHY connect. User will have no bother with getting
"phy-mode" and "phy-handle" any more.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16 21:32:58 -07:00
Nikolay Aleksandrov
43b9e12740 net: ipmr/ip6mr: add support for keeping an entry age
In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
CC: Satish Ashok <sashok@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16 20:19:43 -07:00
Paolo Abeni
18d3df3eab vlan: use a valid default mtu value for vlan over macsec
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16 20:15:02 -07:00
Jesper Dangaard Brouer
02a198777e net: fixup for tracepoint napi:napi_poll
The recent change to tracepoint napi:napi_poll changed the order of
the parameters that perf scripts sees, the printk was correct.  The
problem was that the new parameters (work and budget) were pushed
in front of dev_name.

The new parameters obviously need to be appended to keep backward
compatible.

Fixes: 1db19db7f5 ("net: tracepoint napi:napi_poll add work and budget")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 15:55:01 -07:00
Linus Torvalds
631517032f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few last-minute updates for the input subsystem"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ts4800-ts - add missing of_node_put after calling of_parse_phandle
  Input: synaptics-rmi4 - use of_get_child_by_name() to fix refcount
  Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
  Input: xpad - validate USB endpoint count during probe
  Input: add SW_PEN_INSERTED define
2016-07-16 07:04:12 +09:00
Daniel Borkmann
555c8a8623 bpf: avoid stack copy and use skb ctx for event output
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Daniel Borkmann
7e3f977edd perf, events: add non-linear data support for raw records
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

  [1] http://thread.gmane.org/gmane.linux.network/421294

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Philipp Zabel
77501a79ce net: phy: micrel: Add KSZ8041FTL fiber mode support
We can't detect the FXEN (fiber mode) bootstrap pin, so configure
it via a boolean device tree property "micrel,fiber-mode".
If it is enabled, auto-negotiation is not supported.
The only available modes are 100base-fx (full duplex and half duplex).

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 11:53:02 -07:00
Linus Torvalds
fa3a9f5744 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  m32r: fix build warning about putc
  mm: workingset: printk missing log level, use pr_info()
  mm: thp: refix false positive BUG in page_move_anon_rmap()
  mm: rmap: call page_check_address() with sync enabled to avoid racy check
  mm: thp: move pmd check inside ptl for freeze_page()
  vmlinux.lds: account for destructor sections
  gcov: add support for gcc version >= 6
  mm, meminit: ensure node is online before checking whether pages are uninitialised
  mm, meminit: always return a valid node from early_pfn_to_nid
  kasan/quarantine: fix bugs on qlist_move_cache()
  uapi: export lirc.h header
  madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
  Revert "scripts/gdb: add documentation example for radix tree"
  Revert "scripts/gdb: add a Radix Tree Parser"
  scripts/gdb: Perform path expansion to lx-symbol's arguments
  scripts/gdb: add constants.py to .gitignore
  scripts/gdb: rebuild constants.py on dependancy change
  scripts/gdb: silence 'nothing to do' message
  kasan: add newline to messages
  mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
2016-07-15 16:00:18 +09:00
Hugh Dickins
5a49973d71 mm: thp: refix false positive BUG in page_move_anon_rmap()
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
worth: the syzkaller fuzzer hit it again.  It's still wrong for some THP
cases, because linear_page_index() was never intended to apply to
addresses before the start of a vma.

That's easily fixed with a signed long cast inside linear_page_index();
and Dmitry has tested such a patch, to verify the false positive.  But
why extend linear_page_index() just for this case? when the avoidance in
page_move_anon_rmap() has already grown ugly, and there's no reason for
the check at all (nothing else there is using address or index).

Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
remove CONFIG_DEBUG_VM PageTransHuge adjustment.

And one more thing: should the compound_head(page) be done inside or
outside page_move_anon_rmap()? It's usually pushed down to the lowest
level nowadays (and mm/memory.c shows no other explicit use of it), so I
think it's better done in page_move_anon_rmap() than by caller.

Fixes: 0798d3c022 ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>	[4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Naoya Horiguchi
33f4751e99 mm: thp: move pmd check inside ptl for freeze_page()
I found a race condition triggering VM_BUG_ON() in freeze_page(), when
running a testcase with 3 processes:
  - process 1: keep writing thp,
  - process 2: keep clearing soft-dirty bits from virtual address of process 1
  - process 3: call migratepages for process 1,

The kernel message is like this:

  kernel BUG at /src/linux-dev/mm/huge_memory.c:3096!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio
  CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000
  RIP: 0010:[<ffffffff811f8e06>]  [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590
  RSP: 0018:ffff88007cdd3b70  EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000
  RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000
  R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000
  R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080
  FS:  00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000             CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0
  Call Trace:
    ? list_del+0xd/0x30
    queue_pages_pte_range+0x4d1/0x590
    __walk_page_range+0x204/0x4e0
    walk_page_range+0x71/0xf0
    queue_pages_range+0x75/0x90
    ? queue_pages_hugetlb+0x190/0x190
    ? new_node_page+0xc0/0xc0
    ? change_prot_numa+0x40/0x40
    migrate_to_node+0x71/0xd0
    do_migrate_pages+0x1c3/0x210
    SyS_migrate_pages+0x261/0x290
    entry_SYSCALL_64_fastpath+0x1a/0xa4
  Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff
  RIP   split_huge_page_to_list+0x496/0x590

I'm not sure of the full scenario of the reproduction, but my debug
showed that split_huge_pmd_address(freeze=true) returned without running
main code of pmd splitting because pmd_present(*pmd) in precheck somehow
returned 0.  If this happens, the subsequent try_to_unmap() fails and
returns non-zero (because page_mapcount() still > 0), and finally
VM_BUG_ON() fires.  This patch tries to fix it by prechecking pmd state
inside ptl.

Link: http://lkml.kernel.org/r/1466990929-7452-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Dmitry Vyukov
e41f501d39 vmlinux.lds: account for destructor sections
If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array.  .dtors section is not explicitly accounted in
the linker script and messes vvar/percpu layout.

We want:
  ffffffff822bfd80 D _edata
  ffffffff822c0000 D __vvar_beginning_hack
  ffffffff822c0000 A __vvar_page
  ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
  ffffffff822c1000 A __init_begin
  ffffffff822c1000 D init_per_cpu__irq_stack_union
  ffffffff822c1000 A __per_cpu_load
  ffffffff822d3000 D init_per_cpu__gdt_page

We got:
  ffffffff8279a600 D _edata
  ffffffff8279b000 A __vvar_page
  ffffffff8279c000 A __init_begin
  ffffffff8279c000 D init_per_cpu__irq_stack_union
  ffffffff8279c000 A __per_cpu_load
  ffffffff8279e000 D __vvar_beginning_hack
  ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
  ffffffff827ae000 D init_per_cpu__gdt_page

This happens because __vvar_page and .vvar get different addresses in
arch/x86/kernel/vmlinux.lds.S:

	. = ALIGN(PAGE_SIZE);
	__vvar_page = .;

	.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
		/* work around gold bug 13023 */
		__vvar_beginning_hack = .;

Discard .dtors/.fini_array/.text.exit, since we don't call dtors.
Merge .text.startup into init text.

Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Mauro Carvalho Chehab
12cb22bb8a uapi: export lirc.h header
This header contains the userspace API for lirc.

This is a fixup for commit b7be755733 ("[media] bz#75751: Move
internal header file lirc.h to uapi/").  It moved the header to the
right place, but it forgot to add it at Kbuild.  So, despite being at
uapi, it is not copied to the right place.

Fixes: b7be755733 ("[media] bz#75751: Move internal header file lirc.h to uapi/")
Link: http://lkml.kernel.org/r/320c765d32bfc82c582e336d52ffe1026c73c644.1468439021.git.mchehab@s-opensource.com
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alec Leamas <leamas.alec@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Arnd Bergmann
caeccd5180 devlink: fix trace format string
Including devlink.h on ARM and probably other 32-bit architectures results in
a harmless warning:

In file included from ../include/trace/define_trace.h:95:0,
                 from ../include/trace/events/devlink.h:51,
                 from ../net/core/devlink.c:30:
include/trace/events/devlink.h: In function 'trace_raw_output_devlink_hwmsg':
include/trace/events/devlink.h:42:12: error: format '%lu' expects argument of type 'long unsigned int', but argument 10 has type 'size_t {aka unsigned int}' [-Werror=format=]

The correct format string for 'size_t' is %zu, not %lu, this works on all
architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: e5224f0fe2 ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 22:16:05 -07:00
Jiri Pirko
0e1824c98a tracing: change owner name to driver name for devlink hwmsg tracepoint
Turned on that driver->owner which is struct module is not available when
modules are disabled. Better to depend on a driver name which is
always available.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: e5224f0fe2 ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 22:16:05 -07:00
Dave Airlie
d2e1204f89 Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
A bunch of vmwgfx fixes that fix a black screen issue on latest distros/hw combos.

* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
  drm/vmwgfx: Fix error paths when mapping framebuffer
  drm/vmwgfx: Fix corner case screen target management
  drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
  drm/vmwgfx: Check pin count before attempting to move a buffer
  drm/ttm: Make ttm_bo_mem_compat available
  drm/vmwgfx: Add an option to change assumed FB bpp
  drm/vmwgfx: Work around mode set failure in 2D VMs
  drm/vmwgfx: Add a check to handle host message failure
2016-07-15 13:51:55 +10:00
Or Gerlitz
8438884d4a net/switchdev: Export the same parent ID service function
This helper serves to know if two switchdev port netdevices belong to the
same HW ASIC, e.g to figure out if forwarding offload is possible between them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Amir Vadai
a351a1b03b net/mlx5: Introduce bulk reading of flow counters
This commit utilize the ability of ConnectX-4 to bulk read flow counters.
Few bulk counter queries could be done instead of issuing thousands of
firmware commands per second to get statistics of all flows set to HW,
such as those programmed when we offload tc filters.

Counters are stored sorted by hardware id, and queried in blocks (id +
number of counters).

Due to hardware requirement, start of block and number of counters in a
block must be four aligned.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Amir Vadai
29cc667907 net/mlx5: Store counters in rbtree instead of list
In order to use bulk counters, we need to have counters sorted by id.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Marcelo Ricardo Leitner
e7487c86dc sctp: avoid identifying address family many times for a chunk
Identifying address family operations during rx path is not something
expensive but it's ugly to the eye to have it done multiple times,
specially when we already validated it during initial rx processing.

This patch takes advantage of the now shared sctp_input_cb and make the
pointer to the operations readily available.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 18:10:14 -07:00