Commit Graph

82326 Commits

Author SHA1 Message Date
Jiri Benc
54bfd872bf vxlan: keep flags and vni in network byte order
Prevent repeated conversions from and to network order in the fast path.

To achieve this, define all flag constants in big endian order and store VNI
as __be32. To prevent confusion between the actual VNI value and the VNI
field from the header (which contains additional reserved byte), strictly
distinguish between "vni" and "vni_field".

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:11 -05:00
Jiri Benc
d4ac05ff36 vxlan: introduce vxlan_hdr
Currently, pointer to the vxlan header is kept in a local variable. It has
to be reloaded whenever the pskb pull operations are performed which usually
happens somewhere deep in called functions.

Create a vxlan_hdr function and use it to reference the vxlan header
instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:11 -05:00
John Fastabend
e014860e31 net: pack tc_cls_u32_knode struct slighter better
By packing the structure we can remove a few holes as Jamal
suggests.

before:

struct tc_cls_u32_knode {
	struct tcf_exts *          exts;                 /*     0     8 */
	u8                         fshift;               /*     8     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        handle;               /*    12     4 */
	u32                        val;                  /*    16     4 */
	u32                        mask;                 /*    20     4 */
	u32                        link_handle;          /*    24     4 */

	/* XXX 4 bytes hole, try to pack */

	struct tc_u32_sel *        sel;                  /*    32     8 */

	/* size: 40, cachelines: 1, members: 7 */
	/* sum members: 33, holes: 2, sum holes: 7 */
	/* last cacheline: 40 bytes */
};

after:

struct tc_cls_u32_knode {
	struct tcf_exts *          exts;                 /*     0     8 */
	struct tc_u32_sel *        sel;                  /*     8     8 */
	u32                        handle;               /*    16     4 */
	u32                        val;                  /*    20     4 */
	u32                        mask;                 /*    24     4 */
	u32                        link_handle;          /*    28     4 */
	u8                         fshift;               /*    32     1 */

	/* size: 40, cachelines: 1, members: 7 */
	/* padding: 7 */
	/* last cacheline: 40 bytes */
};

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 21:44:12 -05:00
Bjorn Helgaas
67b4eab91c Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
Revert 811a4e6fce ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").

This is part of reverting 991de2e590 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
2016-02-17 17:23:36 -06:00
Ben Hutchings
7bbf3cae65 ipv4: Remove inet_lro library
There are no longer any in-tree drivers that use it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 16:15:46 -05:00
Jessica Yu
7dcd182bec ftrace/module: remove ftrace module notifier
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.

This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.

Fixes: 5156dca34a ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-02-17 22:14:06 +01:00
William Breathitt Gray
3347a0656e iio: Fix typos in the struct iio_event_spec documentation comments
This patch fixes a few minor typos in the documentation comments for the
scan_type member of the iio_event_spec structure. The sign member name
was improperly capitalized as "Sign" in the comments. The storagebits
member name was improperly listed as "storage_bits" in the comments. The
endianness member entry in the comments was moved after the repeat
member entry in order to maintain consistency with the actual struct
iio_event_spec layout.

Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2016-02-17 21:11:08 +00:00
Yuval Mintz
fc48b7a614 qed/qede: use 8.7.3.0 FW.
This patch moves the qed* driver into utilizing the 8.7.3.0 FW.
This new FW is required for a lot of new SW features, including:
  - Vlan filtering offload
  - Encapsulation offload support
  - HW ingress aggregations
As well as paving the way for the possibility of adding storage protocols
in the future.

V2:
 - Fix kbuild test robot error/warnings.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 16:04:42 -05:00
Xin Long
1cd4d5c432 sctp: remove the unused sctp_datamsg_free()
Since commit 8b570dc9f7 ("sctp: only drop the reference on the datamsg
after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of
sctp_datamsg_free, this function has no use in sctp.

So we will remove it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 15:41:54 -05:00
Linus Torvalds
2850713576 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes from the past few weeks that should go into 4.5.
  This contains:

   - Overflow fix for sysfs discard show function from Alan.

   - A stacking limit init fix for max_dev_sectors, so we don't end up
     artificially capping some use cases.  From Keith.

   - Have blk-mq proper end unstarted requests on a dying queue, instead
     of pushing that to the driver.  From Keith.

   - NVMe:
        - Update to Kconfig description for NVME_SCSI, since it was
          vague and having it on is important for some SUSE distros.
          From Christoph.
        - Set of fixes from Keith, around surprise removal. Also kills
          the no-merge flag, so it supports merging.

   - Set of fixes for lightnvm from Matias, Javier, and Wenwei.

   - Fix null_blk oops when asked for lightnvm, but not available.  From
     Matias.

   - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
     if interrupted by a signal.

   - Two floppy fixes from Jiri, fixing signal handling and blocking
     open.

   - A use-after-free fix for O_DIRECT, from Mike Krinkin.

   - A block module ref count fix from Roman Pen.

   - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.

   - Smaller reallo fix for xen-blkfront from Bob Liu.

   - Removal of an unused struct member in the deadline IO scheduler,
     from Tahsin.

   - Also from Tahsin, properly initialize inode struct members
     associated with cgroup writeback, if enabled.

   - From Tejun, ensure that we keep the superblock pinned during cgroup
     writeback"

* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
  blk: fix overflow in queue_discard_max_hw_show
  writeback: initialize inode members that track writeback history
  writeback: keep superblock pinned during cgroup writeback association switches
  bio: return EINTR if copying to user space got interrupted
  NVMe: Rate limit nvme IO warnings
  NVMe: Poll device while still active during remove
  NVMe: Requeue requests on suspended queues
  NVMe: Allow request merges
  NVMe: Fix io incapable return values
  blk-mq: End unstarted requests on dying queue
  block: Initialize max_dev_sectors to 0
  null_blk: oops when initializing without lightnvm
  block: fix module reference leak on put_disk() call for cgroups throttle
  nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
  kernel/fs: fix I/O wait not accounted for RW O_DSYNC
  floppy: refactor open() flags handling
  lightnvm: allow to force mm initialization
  lightnvm: check overflow and correct mlc pairs
  lightnvm: fix request intersection locking in rrpc
  lightnvm: warn if irqs are disabled in lock laddr
  ...
2016-02-17 11:59:23 -08:00
Linus Torvalds
a9f70bd4e7 Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This includes two fixes.

  The first is something that has come up a few times and has been
  worked out individually, but it's come up now enough that the problem
  should be generic.  Tracepoints are protected by RCU sched.  There are
  several tracepoints within core infrastructure like kfree().  If a
  tracepoint is called when the CPU is going down, or when it's coming
  up but has yet to be recognized by RCU, a RCU warning is triggered.

  This is a true bug as that tracepoint is not protected by RCU.
  Usually, this is taken care of by testing for cpu online as a
  tracepoint condition.  But as this is happening more often, moving it
  from a individual tracepoint to a check in the tracepoint
  infrastructure is more robust.

  Note, there is now a duplicate of a cpu online test, because this
  update does not remove the individual checks.  But the overhead is
  small enough that the removal can be done in another release.

  The second change is strange linker breakage due to the branch
  tracer's builtin_constant_p() check failing, and treating the
  condition as a variable instead of a constant.  Arnd Bergmann found
  that this can be fixed by testing !!(cond) instead of just (cond)"

* tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix freak link error caused by branch tracer
  tracepoints: Do not trace when cpu is offline
2016-02-17 11:35:41 -08:00
Alexandre Belloni
af719c1807 clk: at91: pmc: drop at91_pmc_base
at91_pmc_base is not used anymore, remove it along with at91_pmc_read and
at91_pmc_write.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
2016-02-17 17:53:03 +01:00
Kinglong Mee
c89757061a pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
unreferenced object 0xffffc90000abf000 (size 16900):
  comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
    [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
    [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
    [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
    [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
    [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
    [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
    [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
    [<ffffffff81228f1d>] do_fsync+0x3d/0x70
    [<ffffffff812291d0>] SyS_fsync+0x10/0x20
    [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff

v2, add missing include header

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17 11:44:45 -05:00
Huy Nguyen
85743f1eb3 net/mlx4_core: Set UAR page size to 4KB regardless of system page size
problem description:

The current code sets UAR page size equal to system page size.
The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.

solution:

Always set UAR page to 4KB. This allows more UAR pages if the OS
has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
system page size, with 4MB uar region, there are 4MB/2/64KB = 32
uars (half for uar, half for blueflame). This does not meet minimum 128
UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
which meet the minimum requirement.

Note that only codes in mlx4_core that deal with firmware know that uar
page size is 4KB. Codes that deal with usr page in cq and qp context
(mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
that uar page size equals to system page size.

Note that with this implementation, on 64KB system page size kernel, there
are 16 uars per system page but only one uars is used. The other 15
uars are ignored because of the above assumption.

Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
firmware.

Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
the virtual OS must be updated. If hypervisor has old code, and the virtual
OS has this new code, the new code will be backward compatible with the
old code. If the uar size is big enough, this new code in VF continues to
work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
meet 128 uars requirement, this new code not loaded in VF and print the same
error message as the old code in Hypervisor.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
John Fastabend
3b01cf56da net: tc: helper functions to query action types
This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
1c78c64e9c net: add tc offload feature flag
Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
a1b7c5fd7f net: sched: add cls_u32 offload hooks for netdevs
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
16e5cc6471 net: rework setup_tc ndo op to consume general tc operand
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
John Fastabend
e4c6734eaa net: rework ndo tc op to consume additional qdisc handle parameter
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
Boqun Feng
e1ab7f39d7 atomics: Allow architectures to define their own __atomic_op_* helpers
Some architectures may have their special barriers for acquire, release
and fence semantics, so that general memory barriers(smp_mb__*_atomic())
in the default __atomic_op_*() may be too strong, so allow architectures
to define their own helpers which can overwrite the default helpers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-18 00:11:02 +11:00
Nikolay Borisov
0fbf4cb27e ipv4: namespacify ip fragment max dist sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:42:54 -05:00
Nikolay Borisov
e21145a987 ipv4: namespacify ip_early_demux sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:42:54 -05:00
Nikolay Borisov
287b7f38fd ipv4: Namespacify ip_dynaddr sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:42:54 -05:00
Nikolay Borisov
fa50d974d1 ipv4: Namespaceify ip_default_ttl sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:42:54 -05:00
David S. Miller
6cd21d7941 Merge tag 'wireless-drivers-next-for-davem-2016-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:

====================
Major changes:

wl12xx

* add device tree support for SPI

mwifiex

* add debugfs file to read chip information
* add MSIx support for newer pcie chipsets (8997 onwards)
* add schedule scan support
* add WoWLAN net-detect support
* firmware dump support for w8997 chipset

iwlwifi

* continue the work on multiple Rx queues
* add support for beacon storing used in low power states
* use the regular firmware image of WoWLAN
* fix 8000 devices for Big Endian machines
* more firmware debug hooks
* add support for P2P Client snoozing
* make the beacon filtering for AP mode configurable
* fix transmit queues overflow with LSO

libertas

* add support for setting power save via cfg80211
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:38:29 -05:00
Eric Dumazet
cd9b266095 tcp: add tcpi_min_rtt and tcpi_notsent_bytes to tcp_info
tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow,
in usec unit. Might be ~0U if not yet known.

tcpi_notsent_bytes reports the amount of bytes in the write queue that
were not yet sent.

This is done in a single patch to not add a temporary 32bit padding hole
in tcp_info.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:27:35 -05:00
Paolo Abeni
d71785ffc7 net: add dst_cache to ovs vxlan lwtunnel
In case of UDP traffic with datagram length
below MTU this give about 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Paolo Abeni
0c1d70af92 net: use dst_cache for vxlan device
In case of UDP traffic with datagram length
below MTU this give about 3% performance increase
when tunneling over ipv4 and about 70% when
tunneling over ipv6.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Paolo Abeni
e09acddf87 ip_tunnel: replace dst_cache with generic implementation
The current ip_tunnel cache implementation is prone to a race
that will cause the wrong dst to be cached on cuncurrent dst cache
miss and ip tunnel update via netlink.

Replacing with the generic implementation fix the issue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Paolo Abeni
607f725f6f net: replace dst_cache ip6_tunnel implementation with the generic one
This also fix a potential race into the existing tunnel code, which
could lead to the wrong dst to be permanenty cached:

CPU1:					CPU2:
  <xmit on ip6_tunnel>
  <cache lookup fails>
  dst = ip6_route_output(...)
					<tunnel params are changed via nl>
					dst_cache_reset() // no effect,
							// the cache is empty
  dst_cache_set() // the wrong dst
	// is permanenty stored
	// into the cache

With the new dst implementation the above race is not possible
since the first cache lookup after dst_cache_reset will fail due
to the timestamp check

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Paolo Abeni
911362c70d net: add dst_cache support
This patch add a generic, lockless dst cache implementation.
The need for lock is avoided updating the dst cache fields
only in per cpu scope, and requiring that the cache manipulation
functions are invoked with the local bh disabled.

The refresh_ts and reset_ts fields are used to ensure the cache
consistency in case of cuncurrent cache update (dst_cache_set*) and
reset operation (dst_cache_reset).

Consider the following scenario:

CPU1:                                   	CPU2:
  <cache lookup with emtpy cache: it fails>
  <get dst via uncached route lookup>
						<related configuration changes>
                                        	dst_cache_reset()
  dst_cache_set()

The dst entry set passed to dst_cache_set() should not be used
for later dst cache lookup, because it's obtained using old
configuration values.

Since the refresh_ts is updated only on dst_cache lookup, the
cached value in the above scenario will be discarded on the next
lookup.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Jake Oshins
92016ba5c1 PCI: Add fwnode_handle to x86 pci_sysdata
Add an fwnode_handle to the x86 struct pci_sysdata, which will be used to
locate an IRQ domain associated with a root PCI bus.

[bhelgaas: changelog]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-02-16 16:48:11 -06:00
Matan Barak
b4ff3a36d3 net/mlx5: Use offset based reserved field names in the IFC header file
mlx5_ifc.h is a header file representing the API and ABI between
the driver to the firmware and hardware. This file is used from
both the mlx5_ib and mlx5_core drivers.

Previously, this file used incrementing counter to indicate
reserved fields, for example:

struct mlx5_ifc_odp_per_transport_service_cap_bits {
        u8         send[0x1];
        u8         receive[0x1];
        u8         write[0x1];
        u8         read[0x1];
        u8         reserved_0[0x1];
        u8         srq_receive[0x1];
        u8         reserved_1[0x1a];
};

If one developer implements through net-next feature A that uses
reserved_0, they replace it with featureA and renames reserved_1 to
reserved_0. In the same kernel cycle, a 2nd developer could implement
feature B through the rdma tree, that uses reserved_1 and split it to
featureB and a smaller reserved_1 field. This will cause a conflict
when the two trees are merged.

The source of this conflict is that the 1st developer changed *all*
reserved fields.

As Linus suggested, we change the layout of structs to:

struct mlx5_ifc_odp_per_transport_service_cap_bits {
	u8         send[0x1];
	u8         receive[0x1];
	u8         write[0x1];
	u8         read[0x1];
	u8         reserved_at_4[0x1];
	u8         srq_receive[0x1];
	u8         reserved_at_6[0x1a];
};

This makes the conflicts much more rare and preserves the locality of
changes.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 15:21:47 -05:00
Keller, Jacob E
d4ab428627 ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 15:19:49 -05:00
Stephan Mueller
3981d37ff3 crypto: doc - update AEAD AD handling
The associated data handling with the kernel crypto API has been
updated. This needs to be reflected in the documentation.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-17 04:07:54 +08:00
Stephan Mueller
e1eabc057a crypto: doc - add akcipher API
Reference the new akcipher API calls in the kernel crypto API DocBook.

Also, fix the comments in the akcipher.h file: double dashes do not look
good in the DocBook; fix a typo.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-17 04:07:53 +08:00
Stephan Mueller
28856a9e52 crypto: xts - consolidate sanity check for keys
The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
a sanity check for the XTS keys similar to the other arches.

In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-17 04:07:51 +08:00
Aditya Kali
fb3c831565 kernfs: define kernfs_node_dentry
Add a new kernfs api is added to lookup the dentry for a particular
kernfs path.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-16 13:04:58 -05:00
Aditya Kali
a79a908fd2 cgroup: introduce cgroup namespaces
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
 they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-16 13:04:58 -05:00
Aditya Kali
5e2bec7c22 sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
CLONE_NEWCGROUP will be used to create new cgroup namespace.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-16 13:04:58 -05:00
Aditya Kali
9f6df573a4 kernfs: Add API to generate relative kernfs path
The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-16 13:04:58 -05:00
Andrey Smetanin
83326e43f2 kvm/x86: Hyper-V VMBus hypercall userspace exit
The patch implements KVM_EXIT_HYPERV userspace exit
functionality for Hyper-V VMBus hypercalls:
HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.

Changes v3:
* use vcpu->arch.complete_userspace_io to setup hypercall
result

Changes v2:
* use KVM_EXIT_HYPERV for hypercalls

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 18:48:44 +01:00
Christian Borntraeger
6b6de68c63 KVM: halt_polling: improve grow/shrink settings
Right now halt_poll_ns can be change during runtime. The
grow and shrink factors can only be set during module load.
Lets fix several aspects of grow shrink:
- make grow/shrink changeable by root
- make all variables unsigned int
- read the variables once to prevent races

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 18:48:29 +01:00
Kees Cook
416dd13ad6 ARM: 8503/1: clk_register_clkdev: remove format string interface
Many callers either use NULL or const strings for the third argument of
clk_register_clkdev. For those that do not and use a non-const string,
this is a risk for format strings being accidentally processed (for
example in device names). As this interface is already used as if it
weren't a format string (prints nothing when NULL), and there are zero
users of the format strings, remove the format string interface to make
sure format strings will not leak into the clkdev.

$ git grep '\bclk_register_clkdev\b' | grep % | wc -l
0

Unfortunately, all the internals expect a va_list even though they treat
a NULL format string as special. To deal with this, we must pass either
(..., "%s", string) or (..., NULL) so that a the va_list will be created
correctly (passing the name as an argument, not as a format string).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-16 16:34:18 +00:00
Tomeu Vizoso
656b8035b0 ARM: 8524/1: driver cohandle -EPROBE_DEFER from bus_type.match()
Allow implementations of the match() callback in struct bus_type to
return errors and if it's -EPROBE_DEFER then queue the device for
deferred probing.

This is useful to buses such as AMBA in which devices are registered
before their matching information can be retrieved from the HW
(typically because a clock driver hasn't probed yet).

[changed if-else code structure, adjusted documentation to match the code,
extended comments]

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-16 16:28:51 +00:00
Linus Torvalds
87bbcfdecc Merge tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu
Pull IOMMU SVM fixes from David Woodhouse:
 "Minor register size and interrupt acknowledgement fixes which only
  showed up in testing on newer hardware, but mostly a fix to the MM
  refcount handling to prevent a recursive refcount issue when mmap() is
  used on the file descriptor associated with a bound PASID"

* tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu:
  iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
  iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
  iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
2016-02-16 08:04:06 -08:00
Mark Rutland
3694bd7678 asm-generic: Fix local variable shadow in __set_fixmap_offset
Currently __set_fixmap_offset is a macro function which has a local
variable called 'addr'. If a caller passes a 'phys' parameter which is
derived from a variable also called 'addr', the local variable will
shadow this, and the compiler will complain about the use of an
uninitialized variable. To avoid the issue with namespace clashes,
'addr' is prefixed with a liberal sprinkling of underscores.

Turning __set_fixmap_offset into a static inline breaks the build for
several architectures. Fixing this properly requires updates to a number
of architectures to make them agree on the prototype of __set_fixmap (it
could be done as a subsequent patch series).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
[catalin.marinas@arm.com: squashed the original function patch and macro fixup]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:10:44 +00:00
Linus Walleij
143b65d677 gpio: create an API to detect open drain/source on lines
My left hand merges code to privatize the descriptor handling
while my right hand merges drivers that poke around and
disrespect with the same gpiolib internals.

So let's expose the proper APIs for drivers to ask the gpiolib
core if a line is marked as open drain or open source and
get some order around things so this driver compiles again.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nicolas Saenz Julienne <nicolassaenzj@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-02-16 15:41:42 +01:00
Ingo Molnar
4682c211a8 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:

 * Prevent accidental deletion of EFI variables through efivarfs that
   may brick machines. We use a whitelist of known-safe variables to
   allow things like installing distributions to work out of the box, and
   instead restrict vendor-specific variable deletion by making
   non-whitelist variables immutable (Peter Jones)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 13:14:57 +01:00
Mauro Carvalho Chehab
8c755c2910 Merge branch 'fixes' into patchwork
Some macros were changed/removed at the material for v4.5. We need
to sync with those changes here, in order to avoid troubles.

* v4l_for_linus:
  [media] media.h: get rid of MEDIA_ENT_F_CONN_TEST
  [media] [for,v4.5] media.h: increase the spacing between function ranges
  [media] media: i2c/adp1653: probe: fix erroneous return value
  [media] media: davinci_vpfe: fix missing unlock on error in vpfe_prepare_pipeline()
2016-02-16 09:20:45 -02:00