Commit Graph

114223 Commits

Author SHA1 Message Date
David S. Miller
446bf64b61 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge conflict of mlx5 resolved using instructions in merge
commit 9566e650bf.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 11:54:03 -07:00
Maxime Jourdan
60a039eb27 media: videodev2.h: add V4L2_FMT_FLAG_DYN_RESOLUTION
Add an enum_fmt format flag to specifically tag coded formats where
dynamic resolution switching is supported by the device.

This is useful for some codec drivers that can support dynamic
resolution switching for one or more of their listed coded formats. It
allows userspace to know whether it should extract the video parameters
itself, or if it can rely on the device to send V4L2_EVENT_SOURCE_CHANGE
when such changes are detected.

Signed-off-by: Maxime Jourdan <mjourdan@baylibre.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Reviewed-by: Alexandre Courbot <acourbot@chromium.org>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 14:56:31 -03:00
Hans Verkuil
2b770bee78 media: videodev2.h: add V4L2_FMT_FLAG_CONTINUOUS_BYTESTREAM
Add an enum_fmt format flag to specifically tag coded formats where
full bytestream parsing is supported by the device.

Some stateful decoders are capable of fully parsing a bytestream,
but others require that userspace pre-parses the bytestream into
frames or fields (see the corresponding pixelformat descriptions
for details).

If this flag is set, then this pre-parsing step is not required
(but still possible, of course).

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Reviewed-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 14:41:45 -03:00
Linus Torvalds
06821504fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

  1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.

  2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan.

  3) Fix severe performance regression due to lack of SKB coalescing of
     fragments during local delivery, from Guillaume Nault.

  4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.

  5) Fix batched events in skbedit packet action, from Roman Mashak.

  6) Propagate VLAN TX offload to hw_enc_features in bond and team
     drivers, from Yue Haibing.

  7) RXRPC local endpoint refcounting fix and read after free in
     rxrpc_queue_local(), from David Howells.

  8) Fix endian bug in ibmveth multicast list handling, from Thomas
     Falcon.

  9) Oops, make nlmsg_parse() wrap around the correct function,
     __nlmsg_parse not __nla_parse(). Fix from David Ahern.

 10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin.

 11) Fix memory leak in cxgb4, from Wenwen Wang.

 12) Yet another race in AF_PACKET, from Eric Dumazet.

 13) Fix false detection of retransmit failures in tipc, from Tuong
     Lien.

 14) Use after free in ravb_tstamp_skb, from Tho Vu.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  ravb: Fix use-after-free ravb_tstamp_skb
  netfilter: nf_tables: map basechain priority to hardware priority
  net: sched: use major priority number as hardware priority
  wimax/i2400m: fix a memory leak bug
  net: cavium: fix driver name
  ibmvnic: Unmap DMA address of TX descriptor buffers after use
  bnxt_en: Fix to include flow direction in L2 key
  bnxt_en: Use correct src_fid to determine direction of the flow
  bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
  bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
  bnxt_en: Improve RX doorbell sequence.
  bnxt_en: Fix VNIC clearing logic for 57500 chips.
  net: kalmia: fix memory leaks
  cx82310_eth: fix a memory leak bug
  bnx2x: Fix VF's VLAN reconfiguration in reload.
  Bluetooth: Add debug setting for changing minimum encryption key size
  tipc: fix false detection of retransmit failures
  lan78xx: Fix memory leaks
  MAINTAINERS: r8169: Update path to the driver
  MAINTAINERS: PHY LIBRARY: Update files in the record
  ...
2019-08-19 10:00:01 -07:00
David Howells
555df336c7 keys: Fix description size
The maximum key description size is 4095.  Commit f771fde820 ("keys:
Simplify key description management") inadvertantly reduced that to 255
and made sizes between 256 and 4095 work weirdly, and any size whereby
size & 255 == 0 would cause an assertion in __key_link_begin() at the
following line:

	BUG_ON(index_key->desc_len == 0);

This can be fixed by simply increasing the size of desc_len in struct
keyring_index_key to a u16.

Note the argument length test in keyutils only checked empty
descriptions and descriptions with a size around the limit (ie.  4095)
and not for all the values in between, so it missed this.  This has been
addressed and

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa

now exhaustively tests all possible lengths of type, description and
payload and then some.

The assertion failure looks something like:

 kernel BUG at security/keys/keyring.c:1245!
 ...
 RIP: 0010:__key_link_begin+0x88/0xa0
 ...
 Call Trace:
  key_create_or_update+0x211/0x4b0
  __x64_sys_add_key+0x101/0x200
  do_syscall_64+0x5b/0x1e0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

It can be triggered by:

	keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s

Fixes: f771fde820 ("keys: Simplify key description management")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-19 09:43:57 -07:00
Boris Brezillon
c3adb85745 media: uapi: h264: Get rid of the p0/b0/b1 ref-lists
Those lists can be extracted from the dpb, let's simplify userspace
life and build that list kernel-side (generic helpers will be provided
for drivers that need this list).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 13:24:04 -03:00
Ezequiel Garcia
8cae93e090 media: uapi: h264: Add the concept of start code
Stateless decoders have different expectations about the
start code that is prepended on H264 slices. Add a
menu control to express the supported start code types
(including no start code).

Drivers are allowed to support only one start code type,
but they can support both too.

Note that this is independent of the H264 decoding mode,
which specifies the granularity of the decoding operations.
Either in frame-based or slice-based mode, this new control
will allow to define the start code expected on H264 slices.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 13:23:12 -03:00
Boris Brezillon
5604be66a5 media: uapi: h264: Add the concept of decoding mode
Some stateless decoders don't support per-slice decoding granularity
(or at least not in a way that would make them efficient or easy to use).

Expose a menu to control the supported decoding modes. Drivers are
allowed to support only one decoding but they can support both too.

To fully specify the decoding operation, we need to introduce
a start_byte_offset, to indicate where slices start.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 13:21:51 -03:00
Ezequiel Garcia
7bb3c32abd media: uapi: h264: Rename pixel format
The V4L2_PIX_FMT_H264_SLICE_RAW name was originally suggested
because the pixel format would represent H264 slices without any
start code.

However, as we will now introduce a start code menu control,
give the pixel format a more meaningful name, while it's
still early enough to do so.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 13:16:08 -03:00
Rasmus Villemoes
4333fb96ca media: lib/sort.c: implement sort() variant taking context argument
Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for the similar thing for sort().

This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is

static int cmp_wrapper(const void *a, const void *b, const void *ctx)
{
  int (*real_cmp)(const void*, const void*) = ctx;
  return real_cmp(a, b);
}

sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }

but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds a cost of a
single easily predicted branch to each comparison call.

Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.

Requested-by: Boris Brezillon <boris.brezillon@collabora.com>

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19 13:14:53 -03:00
Trond Myklebust
b72679ee89 notify: export symbols for use by the knfsd file cache
The knfsd file cache will need to detect when files are unlinked, so that
it can close the associated cached files. Export a minimal set of notifier
functions to allow it to do so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 11:00:39 -04:00
Jeff Layton
18f6622ebb locks: create a new notifier chain for lease attempts
With the new file caching infrastructure in nfsd, we can end up holding
files open for an indefinite period of time, even when they are still
idle. This may prevent the kernel from handing out leases on the file,
which is something we don't want to block.

Fix this by running a SRCU notifier call chain whenever on any
lease attempt. nfsd can then purge the cache for that inode before
returning.

Since SRCU is only conditionally compiled in, we must only define the
new chain if it's enabled, and users of the chain must ensure that
SRCU is enabled.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 11:00:39 -04:00
Jeff Layton
f69d6d8eef sunrpc: add a new cache_detail operation for when a cache is flushed
When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 11:00:39 -04:00
Chuck Lever
4866073e6d svcrdma: Use llist for managing cache of recv_ctxts
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free
list. Subsequently, sc_recv_lock can be eliminated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 10:59:28 -04:00
Chuck Lever
d6dfe43ec6 svcrdma: Remove svc_rdma_wq
Clean up: the system workqueue will work just as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 10:59:28 -04:00
Junxiao Bi
988721db93 block: remove struct request_queue queue_head
The dispatch list is not used any more, as the legacy block IO stack
has been removed.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-19 08:55:10 -06:00
Thomas Gleixner
b6a32bbd87 genirq: Force interrupt threading on RT
Switch force_irqthreads from a boot time modifiable variable to a compile
time constant when CONFIG_PREEMPT_RT is enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190816160923.12855-1-bigeasy@linutronix.de
2019-08-19 15:45:48 +02:00
Masahiro Yamada
38a429c898 netfilter: add include guard to nf_conntrack_h323_types.h
Add a header include guard just in case.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19 13:59:57 +02:00
Leonard Crestez
be378b6007 clk: imx8mn: Add GIC clock
This is enabled by default but if it's not explicitly defined and marked
as critical then its parent might get turned off.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-08-19 13:54:40 +02:00
Eric W. Biederman
33da8e7c81 signal: Allow cifs and drbd to receive their terminating signals
My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd.  I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN.  So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.

Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig.  As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL.  Which after the first signal will allow userspace
to send signals to these kernel threads.  At least for
wake_ack_receiver in drbd that does not appear actively wrong.

So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.

Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.

This change was partly inspired by similar cifs and drbd patches that
added allow_signal.

Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf0 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-08-19 06:34:13 -05:00
Juliana Rodrigueiro
89a26cd4b5 netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.

    kernel: sizeof(struct xt_nfacct_match_info) : 40
    iptables: sizeof(struct xt_nfacct_match_info)) : 36

Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.

    # iptables -A <chain> -m nfacct --nfacct-name <acct-object>
    iptables: Invalid argument. Run `dmesg' for more information.

This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.

Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19 09:34:21 +02:00
Greg Kroah-Hartman
7ffc95e90e Merge 5.3-rc5 into usb-next
We need the usb fixes in here as well for other patches to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19 07:15:42 +02:00
Greg Kroah-Hartman
c6d6832ce3 Merge 5.3-rc5 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19 07:13:42 +02:00
Greg Kroah-Hartman
e70c971d7d Merge 5.3-rc5 into char-misc-next
We need the char/misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19 07:11:53 +02:00
Pablo Neira Ayuso
3bc158f8d0 netfilter: nf_tables: map basechain priority to hardware priority
This patch adds initial support for offloading basechains using the
priority range from 1 to 65535. This is restricting the netfilter
priority range to 16-bit integer since this is what most drivers assume
so far from tc. It should be possible to extend this range of supported
priorities later on once drivers are updated to support for 32-bit
integer priorities.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18 14:13:23 -07:00
Pablo Neira Ayuso
ef01adae0e net: sched: use major priority number as hardware priority
tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18 14:13:23 -07:00
Marc Zyngier
24cab82c34 KVM: arm/arm64: vgic: Add LPI translation cache definition
Add the basic data structure that expresses an MSI to LPI
translation as well as the allocation/release hooks.

The size of the cache is arbitrarily defined as 16*nr_vcpus.

Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18 18:38:35 +01:00
Linus Torvalds
359334caf7 Merge tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are number of small USB fixes for 5.3-rc5.

  Syzbot has been on a tear recently now that it has some good USB
  debugging hooks integrated, so there's a number of fixes in here found
  by those tools for some _very_ old bugs. Also a handful of gadget
  driver fixes for reported issues, some hopefully-final dma fixes for
  host controller drivers, and some new USB serial gadget driver ids.

  All of these have been in linux-next this week with no reported issues
  (the usb-serial ones were in linux-next in its own branch, but merged
  into mine on Friday)"

* tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: add a hcd_uses_dma helper
  usb: don't create dma pools for HCDs with a localmem_pool
  usb: chipidea: imx: fix EPROBE_DEFER support during driver probe
  usb: host: fotg2: restart hcd after port reset
  USB: CDC: fix sanity checks in CDC union parser
  usb: cdc-acm: make sure a refcount is taken early enough
  USB: serial: option: add the BroadMobi BM818 card
  USB: serial: option: Add Motorola modem UARTs
  USB: core: Fix races in character device registration and deregistraion
  usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt
  usb: gadget: composite: Clear "suspended" on reset/disconnect
  usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
  USB: serial: option: add D-Link DWM-222 device ID
  USB: serial: option: Add support for ZTE MF871A
2019-08-18 09:11:29 -07:00
Linus Torvalds
8fde2832bd Merge tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes that should go into this series. This contains:

   - Revert of the REQ_NOWAIT_INLINE and associated dio changes. There
     were still corner cases there, and even though I had a solution for
     it, it's too involved for this stage. (me)

   - Set of NVMe fixes (via Sagi)

   - io_uring fix for fixed buffers (Anthony)

   - io_uring defer issue fix (Jackie)

   - Regression fix for queue sync at exit time (zhengbin)

   - xen blk-back memory leak fix (Wenwen)"

* tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block:
  io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list
  block: remove REQ_NOWAIT_INLINE
  io_uring: fix manual setup of iov_iter for fixed buffers
  xen/blkback: fix memory leaks
  blk-mq: move cancel of requeue_work to the front of blk_exit_queue
  nvme-pci: Fix async probe remove race
  nvme: fix controller removal race with scan work
  nvme-rdma: fix possible use-after-free in connect error flow
  nvme: fix a possible deadlock when passthru commands sent to a multipath device
  nvme-core: Fix extra device_put() call on error path
  nvmet-file: fix nvmet_file_flush() always returning an error
  nvmet-loop: Flush nvme_delete_wq when removing the port
  nvmet: Fix use-after-free bug when a port is removed
  nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
2019-08-17 19:39:54 -07:00
Björn Töpel
0402acd683 xsk: remove AF_XDP socket from map when the socket is released
When an AF_XDP socket is released/closed the XSKMAP still holds a
reference to the socket in a "released" state. The socket will still
use the netdev queue resource, and block newly created sockets from
attaching to that queue, but no user application can access the
fill/complete/rx/tx queues. This results in that all applications need
to explicitly clear the map entry from the old "zombie state"
socket. This should be done automatically.

In this patch, the sockets tracks, and have a reference to, which maps
it resides in. When the socket is released, it will remove itself from
all maps.

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:24:45 +02:00
Stanislav Fomichev
8f51dfc73b bpf: support cloning sk storage on accept()
Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from sk_clone_lock.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:18:54 +02:00
Stanislav Fomichev
b0e4701ce1 bpf: export bpf_map_inc_not_zero
Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to
indicate that it's caller's responsibility to do proper locking.
Create and export bpf_map_inc_not_zero wrapper that properly
locks map_idr_lock. Will be used in the next commit to
hold a map while cloning a socket.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:18:54 +02:00
Magnus Karlsson
77cd0d7b3f xsk: add support for need_wakeup flag in AF_XDP rings
This commit adds support for a new flag called need_wakeup in the
AF_XDP Tx and fill rings. When this flag is set, it means that the
application has to explicitly wake up the kernel Rx (for the bit in
the fill ring) or kernel Tx (for bit in the Tx ring) processing by
issuing a syscall. Poll() can wake up both depending on the flags
submitted and sendto() will wake up tx processing only.

The main reason for introducing this new flag is to be able to
efficiently support the case when application and driver is executing
on the same core. Previously, the driver was just busy-spinning on the
fill ring if it ran out of buffers in the HW and there were none on
the fill ring. This approach works when the application is running on
another core as it can replenish the fill ring while the driver is
busy-spinning. Though, this is a lousy approach if both of them are
running on the same core as the probability of the fill ring getting
more entries when the driver is busy-spinning is zero. With this new
feature the driver now sets the need_wakeup flag and returns to the
application. The application can then replenish the fill queue and
then explicitly wake up the Rx processing in the kernel using the
syscall poll(). For Tx, the flag is only set to one if the driver has
no outstanding Tx completion interrupts. If it has some, the flag is
zero as it will be woken up by a completion interrupt anyway.

As a nice side effect, this new flag also improves the performance of
the case where application and driver are running on two different
cores as it reduces the number of syscalls to the kernel. The kernel
tells user space if it needs to be woken up by a syscall, and this
eliminates many of the syscalls.

This flag needs some simple driver support. If the driver does not
support this, the Rx flag is always zero and the Tx flag is always
one. This makes any application relying on this feature default to the
old behaviour of not requiring any syscalls in the Rx path and always
having to call sendto() in the Tx path.

For backwards compatibility reasons, this feature has to be explicitly
turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
that you always turn it on as it so far always have had a positive
performance impact.

The name and inspiration of the flag has been taken from io_uring by
Jens Axboe. Details about this feature in io_uring can be found in
http://kernel.dk/io_uring.pdf, section 8.3.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:07:32 +02:00
Magnus Karlsson
9116e5e2b1 xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specifiy if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.

This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:07:31 +02:00
Ido Schimmel
f3047ca01f Documentation: Add devlink-trap documentation
Add initial documentation of the devlink-trap mechanism, explaining the
background, motivation and the semantics of the interface.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
391203ab11 devlink: Add generic packet traps and groups
Add generic packet traps and groups that can report dropped packets as
well as exceptions such as TTL error.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
0f420b6c52 devlink: Add packet trap infrastructure
Add the basic packet trap infrastructure that allows device drivers to
register their supported packet traps and trap groups with devlink.

Each driver is expected to provide basic information about each
supported trap, such as name and ID, but also the supported metadata
types that will accompany each packet trapped via the trap. The
currently supported metadata type is just the input port, but more will
be added in the future. For example, output port and traffic class.

Trap groups allow users to set the action of all member traps. In
addition, users can retrieve per-group statistics in case per-trap
statistics are too narrow. In the future, the trap group object can be
extended with more attributes, such as policer settings which will limit
the amount of traffic generated by member traps towards the CPU.

Beside registering their packet traps with devlink, drivers are also
expected to report trapped packets to devlink along with relevant
metadata. devlink will maintain packets and bytes statistics for each
packet trap and will potentially report the trapped packet with its
metadata to user space via drop monitor netlink channel.

The interface towards the drivers is simple and allows devlink to set
the action of the trap. Currently, only two actions are supported:
'trap' and 'drop'. When set to 'trap', the device is expected to provide
the sole copy of the packet to the driver which will pass it to devlink.
When set to 'drop', the device is expected to drop the packet and not
send a copy to the driver. In the future, more actions can be added,
such as 'mirror'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
8e94c3bc92 drop_monitor: Allow user to start monitoring hardware drops
Drop monitor has start and stop commands, but so far these were only
used to start and stop monitoring of software drops.

Now that drop monitor can also monitor hardware drops, we should allow
the user to control these as well.

Do that by adding SW and HW flags to these commands. If no flag is
specified, then only start / stop monitoring software drops. This is
done in order to maintain backward-compatibility with existing user
space applications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
d40e1deb93 drop_monitor: Add support for summary alert mode for hardware drops
In summary alert mode a notification is sent with a list of recent drop
reasons and a count of how many packets were dropped due to this reason.

To avoid expensive operations in the context in which packets are
dropped, each CPU holds an array whose number of entries is the maximum
number of drop reasons that can be encoded in the netlink notification.
Each entry stores the drop reason and a count. When a packet is dropped
the array is traversed and a new entry is created or the count of an
existing entry is incremented.

Later, in process context, the array is replaced with a newly allocated
copy and the old array is encoded in a netlink notification. To avoid
breaking user space, the notification includes the ancillary header,
which is 'struct net_dm_alert_msg' with number of entries set to '0'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
5e58109b1e drop_monitor: Add support for packet alert mode for hardware drops
In a similar fashion to software drops, extend drop monitor to send
netlink events when packets are dropped by the underlying hardware.

The main difference is that instead of encoding the program counter (PC)
from which kfree_skb() was called in the netlink message, we encode the
hardware trap name. The two are mostly equivalent since they should both
help the user understand why the packet was dropped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
Ido Schimmel
edd3d0074c drop_monitor: Add basic infrastructure for hardware drops
Export a function that can be invoked in order to report packets that
were dropped by the underlying hardware along with metadata.

Subsequent patches will add support for the different alert modes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:40:08 -07:00
David S. Miller
42eb455470 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2019-08-17

Here's a set of Bluetooth fixes for the 5.3-rc series:

 - Multiple fixes for Qualcomm (btqca & hci_qca) drivers
 - Minimum encryption key size debugfs setting (this is required for
   Bluetooth Qualification)
 - Fix hidp_send_message() to have a meaningful return value
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:37:47 -07:00
Heiner Kallweit
4b9cb2a5ce net: phy: remove genphy_config_init
Now that all users have been removed we can remove genphy_config_init.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:34:50 -07:00
Chris Wilson
f2cb60e9a3 dma-fence: Store the timestamp in the same union as the cb_list
The timestamp and the cb_list are mutually exclusive, the cb_list can
only be added to prior to being signaled (and once signaled we drain),
while the timestamp is only valid upon being signaled. Both the
timestamp and the cb_list are only valid while the fence is alive, and
as soon as no references are held can be replaced by the rcu_head.

By reusing the union for the timestamp, we squeeze the base dma_fence
struct to 64 bytes on x86-64.

v2: Sort the union chronologically

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>.
Link: https://patchwork.freedesktop.org/patch/msgid/20190817153022.5749-1-chris@chris-wilson.co.uk
2019-08-17 18:46:33 +01:00
Chris Wilson
4fe3997a68 dma-fence: Shrink size of struct dma_fence
Rearrange the couple of 32-bit atomics hidden amongst the field of
pointers that unnecessarily caused the compiler to insert some padding,
shrinks the size of the base struct dma_fence from 80 to 72 bytes on
x86-64.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817144736.7826-3-chris@chris-wilson.co.uk
2019-08-17 18:02:49 +01:00
Marcel Holtmann
58a96fc353 Bluetooth: Add debug setting for changing minimum encryption key size
For testing and qualification purposes it is useful to allow changing
the minimum encryption key size value that the host stack is going to
enforce. This adds a new debugfs setting min_encrypt_key_size to achieve
this functionality.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-08-17 13:54:40 +03:00
Christoph Hellwig
f7bc6e42bf drivers: remove the SGI SN2 IOC4 base support
The IOC4 is a multi-function chip seen on SGI SN2 and some SGI MIPS
systems.  This removes the base driver, which while not having an SN2
Kconfig dependency was only for sub-drivers that had one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20190813072514.23299-15-hch@lst.de
Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-08-16 11:33:57 -07:00
Stephen Boyd
0214f33c4e clk: Overwrite clk_hw::init with NULL during clk_register()
We don't want clk provider drivers to use the init structure after clk
registration time, but we leave a dangling reference to it by means of
clk_hw::init. Let's overwrite the member with NULL during clk_register()
so that this can't be used anymore after registration time.

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190731193517.237136-10-sboyd@kernel.org
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
2019-08-16 10:27:29 -07:00
Linus Torvalds
2d63ba3e41 Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These add a check to avoid recent suspend-to-idle power regression on
  systems with NVMe drives where the PCIe ASPM policy is "performance"
  (or when the kernel is built without ASPM support), fix an issue
  related to frequency limits in the schedutil cpufreq governor and fix
  a mistake related to the PM QoS usage in the cpufreq core introduced
  recently.

  Specifics:

   - Disable NVMe power optimization related to suspend-to-idle added
     recently on systems where PCIe ASPM is not able to put PCIe links
     into low-power states to prevent excess power from being drawn by
     the system while suspended (Rafael Wysocki).

   - Make the schedutil governor handle frequency limits changes
     properly in all cases (Viresh Kumar).

   - Prevent the cpufreq core from treating positive values returned by
     dev_pm_qos_update_request() as errors (Viresh Kumar)"

* tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  PCI/ASPM: Add pcie_aspm_enabled()
  cpufreq: schedutil: Don't skip freq update when limits change
  cpufreq: dev_pm_qos_update_request() can return 1 on success
2019-08-16 09:13:16 -07:00
Jason Gunthorpe
2c7933f53f mm/mmu_notifiers: add a get/put scheme for the registration
Many places in the kernel have a flow where userspace will create some
object and that object will need to connect to the subsystem's
mmu_notifier subscription for the duration of its lifetime.

In this case the subsystem is usually tracking multiple mm_structs and it
is difficult to keep track of what struct mmu_notifier's have been
allocated for what mm's.

Since this has been open coded in a variety of exciting ways, provide core
functionality to do this safely.

This approach uses the struct mmu_notifier_ops * as a key to determine if
the subsystem has a notifier registered on the mm or not. If there is a
registration then the existing notifier struct is returned, otherwise the
ops->alloc_notifiers() is used to create a new per-subsystem notifier for
the mm.

The destroy side incorporates an async call_srcu based destruction which
will avoid bugs in the callers such as commit 6d7c3cde93 ("mm/hmm: fix
use after free with struct hmm in the mmu notifiers").

Since we are inside the mmu notifier core locking is fairly simple, the
allocation uses the same approach as for mmu_notifier_mm, the write side
of the mmap_sem makes everything deterministic and we only need to do
hlist_add_head_rcu() under the mm_take_all_locks(). The new users count
and the discoverability in the hlist is fully serialized by the
mmu_notifier_mm->lock.

Link: https://lore.kernel.org/r/20190806231548.25242-4-jgg@ziepe.ca
Co-developed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-16 12:02:52 -03:00