Commit Graph

109044 Commits

Author SHA1 Message Date
Linus Torvalds
1a9df9e29c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Fixes here and there, a couple new device IDs, as usual:

   1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.

   2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.

   3) Fix documentation for some eBPF helpers, from Quentin Monnet.

   4) Some UAPI bpf header sync with tools, also from Quentin Monnet.

   5) Set descriptor ownership bit at the right time for jumbo frames in
      stmmac driver, from Aaro Koskinen.

   6) Set IFF_UP properly in tun driver, from Eric Dumazet.

   7) Fix load/store doubleword instruction generation in powerpc eBPF
      JIT, from Naveen N. Rao.

   8) nla_nest_start() return value checks all over, from Kangjie Lu.

   9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
      merge window. From Marcelo Ricardo Leitner and Xin Long.

  10) Fix memory corruption with large MTUs in stmmac, from Aaro
      Koskinen.

  11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
      Dumazet.

  12) Fix topology subscription cancellation in tipc, from Erik Hugne.

  13) Memory leak in genetlink error path, from Yue Haibing.

  14) Valid control actions properly in packet scheduler, from Davide
      Caratti.

  15) Even if we get EEXIST, we still need to rehash if a shrink was
      delayed. From Herbert Xu.

  16) Fix interrupt mask handling in interrupt handler of r8169, from
      Heiner Kallweit.

  17) Fix leak in ehea driver, from Wen Yang"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
  dpaa2-eth: fix race condition with bql frame accounting
  chelsio: use BUG() instead of BUG_ON(1)
  net: devlink: skip info_get op call if it is not defined in dumpit
  net: phy: bcm54xx: Encode link speed and activity into LEDs
  tipc: change to check tipc_own_id to return in tipc_net_stop
  net: usb: aqc111: Extend HWID table by QNAP device
  net: sched: Kconfig: update reference link for PIE
  net: dsa: qca8k: extend slave-bus implementations
  net: dsa: qca8k: remove leftover phy accessors
  dt-bindings: net: dsa: qca8k: support internal mdio-bus
  dt-bindings: net: dsa: qca8k: fix example
  net: phy: don't clear BMCR in genphy_soft_reset
  bpf, libbpf: clarify bump in libbpf version info
  bpf, libbpf: fix version info and add it to shared object
  rxrpc: avoid clang -Wuninitialized warning
  tipc: tipc clang warning
  net: sched: fix cleanup NULL pointer exception in act_mirr
  r8169: fix cable re-plugging issue
  net: ethernet: ti: fix possible object reference leak
  net: ibm: fix possible object reference leak
  ...
2019-03-27 12:22:57 -07:00
Vladimir Oltean
450895d04b net: phy: bcm54xx: Encode link speed and activity into LEDs
Previously the green and amber LEDs on this quad PHY were solid, to
indicate an encoding of the link speed (10/100/1000).

This keeps the LEDs always on just as before, but now they flash on
Rx/Tx activity.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-26 11:24:47 -07:00
Linus Torvalds
a3ac7917b7 Revert "parport: daisy: use new parport device model"
This reverts commit 1aec421120.

Steven Rostedt reports that it causes a hang at bootup and bisected it
to this commit.

The troigger is apparently a module alias for "parport_lowlevel" that
points to "parport_pc", which causes a hang with

    modprobe -q -- parport_lowlevel

blocking forever with a backtrace like this:

    wait_for_completion_killable+0x1c/0x28
    call_usermodehelper_exec+0xa7/0x108
    __request_module+0x351/0x3d8
    get_lowlevel_driver+0x28/0x41 [parport]
    __parport_register_driver+0x39/0x1f4 [parport]
    daisy_drv_init+0x31/0x4f [parport]
    parport_bus_init+0x5d/0x7b [parport]
    parport_default_proc_register+0x26/0x1000 [parport]
    do_one_initcall+0xc2/0x1e0
    do_init_module+0x50/0x1d4
    load_module+0x1c2e/0x21b3
    sys_init_module+0xef/0x117

Supid says:
 "Due to the new device model daisy driver will now try to find the
  parallel ports while trying to register its driver so that it can bind
  with them. Now, since daisy driver is loaded while parport bus is
  initialising the list of parport is still empty and it tries to load
  the lowlevel driver, which has an alias set to parport_pc, now causes
  a deadlock"

But I don't think the daisy driver should be loaded by the parport
initialization in the first place, so let's revert the whole change.

If the daisy driver can just initialize separately on its own (like a
driver should), instead of hooking into the parport init sequence
directly, this issue probably would go away.

Reported-and-bisected-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-25 14:49:00 -07:00
Linus Torvalds
19caf581ba Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of x86 fixes:

   - Prevent potential NULL pointer dereferences in the HPET and HyperV
     code

   - Exclude the GART aperture from /proc/kcore to prevent kernel
     crashes on access

   - Use the correct macros for Cyrix I/O on Geode processors

   - Remove yet another kernel address printk leak

   - Announce microcode reload completion as requested by quite some
     people. Microcode loading has become popular recently.

   - Some 'Make Clang' happy fixlets

   - A few cleanups for recently added code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/gart: Exclude GART aperture from kcore
  x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
  x86/mm/pti: Make local symbols static
  x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
  x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
  x86/microcode: Announce reload operation's completion
  x86/hyperv: Prevent potential NULL pointer dereference
  x86/hpet: Prevent potential NULL pointer dereference
  x86/lib: Fix indentation issue, remove extra tab
  x86/boot: Restrict header scope to make Clang happy
  x86/mm: Don't leak kernel addresses
  x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
2019-03-24 11:12:27 -07:00
Linus Torvalds
e08fef881d Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of fixes for the interrupt subsystem:

   - Remove secondary GIC support on systems w/o device-tree support

   - A set of small fixlets in various irqchip drivers

   - static and fall-through annotations

   - Kernel doc and typo fixes"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Mark expected switch case fall-through
  genirq/devres: Remove excess parameter from kernel doc
  irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
  irqchip/mbigen: Don't clear eventid when freeing an MSI
  irqchip/stm32: Don't set rising configuration registers at init
  irqchip/stm32: Don't clear rising/falling config registers at init
  dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
  irqchip/mmp: Make mmp_irq_domain_ops static
  irqchip/brcmstb-l2: Make two init functions static
  genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
  irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
  irqchip/gic: Drop support for secondary GIC in non-DT systems
  irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
2019-03-24 10:51:23 -07:00
Linus Torvalds
e0046bb302 Merge tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linux
Pull auxdisplay updates from Miguel Ojeda:
 "A few fixes and improvements for auxdisplay:

   - Series to fix a memory leak in hd44780 while introducing
     charlcd_free(). From Andy Shevchenko

   - Series to clean up the Kconfig menus and a couple of improvements
     for charlcd. From Mans Rullgard"

* tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linux:
  auxdisplay: charlcd: make backlight initial state configurable
  auxdisplay: charlcd: simplify init message display
  auxdisplay: deconfuse configuration
  auxdisplay: hd44780: Convert to use charlcd_free()
  auxdisplay: panel: Convert to use charlcd_free()
  auxdisplay: charlcd: Introduce charlcd_free() helper
  auxdisplay: charlcd: Move to_priv() to charlcd namespace
  auxdisplay: hd44780: Fix memory leak on ->remove()
2019-03-24 09:51:55 -07:00
Linus Torvalds
1bdd3dbfff Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-block
Pull io_uring fixes and improvements from Jens Axboe:
 "The first five in this series are heavily inspired by the work Al did
  on the aio side to fix the races there.

  The last two re-introduce a feature that was in io_uring before it got
  merged, but which I pulled since we didn't have a good way to have
  BVEC iters that already have a stable reference. These aren't
  necessarily related to block, it's just how io_uring pins fixed
  buffers"

* tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
  block: add BIO_NO_PAGE_REF flag
  iov_iter: add ITER_BVEC_FLAG_NO_REF flag
  io_uring: mark me as the maintainer
  io_uring: retry bulk slab allocs as single allocs
  io_uring: fix poll races
  io_uring: fix fget/fput handling
  io_uring: add prepped flag
  io_uring: make io_read/write return an integer
  io_uring: use regular request ref counts
2019-03-23 10:25:12 -07:00
Linus Torvalds
2335cbe648 Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes/changes that should go into this series. This contains:

   - Kernel doc / comment updates (Bart, Shenghui)

   - Un-export of core-only used function (Bart)

   - Fix race on loop file access (Dongli)

   - pf/pcd queue cleanup fixes (me)

   - Use appropriate helper for RESTART bit set (Yufen)

   - Use named identifier for classic poll (Yufen)"

* tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
  sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
  blkcg: Fix kernel-doc warnings
  blk-iolatency: #include "blk.h"
  block: Unexport blk_mq_add_to_requeue_list()
  block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
  blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
  loop: access lo_backing_file only when the loop device is Lo_bound
  blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
  paride/pcd: cleanup queues when detection fails
  paride/pf: cleanup queues when detection fails
2019-03-23 10:14:42 -07:00
Linus Torvalds
9a1050ad83 Merge tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A follow up for the new alloc_size logic and a blacklisting fix,
  marked for stable"

* tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
  rbd: drop wait_for_latest_osdmap()
  libceph: wait for latest osdmap in ceph_monc_blacklist_add()
  rbd: set io_min, io_opt and discard_granularity to alloc_size
2019-03-23 10:04:47 -07:00
Kairui Song
ffc8599aa9 x86/gart: Exclude GART aperture from kcore
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.

vmcore used to have the same issue, until it was fixed with commit
2a3e83c6f9 ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.

Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.

Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
2019-03-23 12:11:49 +01:00
Shenghui Wang
1e4471e74c sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
"sbitmap_batch_clear" should be "sbitmap_deferred_clear"

Acked-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-22 11:01:02 -06:00
Davide Caratti
ee3bbfe806 net/sched: let actions use RCU to access 'goto_chain'
use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:

 # tc chain add dev dd0 chain 42 ingress protocol ip flower \
 > ip_proto udp action pass index 4
 # tc filter add dev dd0 ingress protocol ip flower \
 > ip_proto udp action csum udp goto chain 42 index 66
 # tc chain del dev dd0 chain 42 ingress
 (start UDP traffic towards dd0)
 # tc action replace action csum udp pass index 66

was run repeatedly for several hours.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
fe384e2fa3 net/sched: don't dereference a->goto_chain to read the chain index
callers of tcf_gact_goto_chain_index() can potentially read an old value
of the chain index, or even dereference a NULL 'goto_chain' pointer,
because 'goto_chain' and 'tcfa_action' are read in the traffic path
without caring of concurrent write in the control path. The most recent
value of chain index can be read also from a->tcfa_action (it's encoded
there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
dereference 'goto_chain': just read the chain id from the control action.

Fixes: e457d86ada ("net: sched: add couple of goto_chain helpers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
85d0966fa5 net/sched: prepare TC actions to properly validate the control action
- pass a pointer to struct tcf_proto in each actions's init() handler,
  to allow validating the control action, checking whether the chain
  exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
  to the action's init() handler, and replace it with a test that forbids
  addition of actions having 'goto_chain' and NULL goto_chain pointer at
  the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
  and eventually allocate the action 'goto_chain' within the init()
  handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
  swap the current 'goto_chain' pointer with the new given one.

This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.

By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:

 # tc chain add dev dd0 chain 42 ingress protocol ip flower \
 > ip_proto tcp action drop
 # tc chain add dev dd0 chain 43 ingress protocol ip flower \
 > ip_proto udp action drop
 # tc filter add dev dd0 ingress matchall \
 > action gact goto chain 42 index 66
 # tc filter replace dev dd0 ingress matchall \
 > action gact goto chain 43 index 66
 # echo scan >/sys/kernel/debug/kmemleak
 <...>
 unreferenced object 0xffff93c0ee09f000 (size 1024):
 comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
 hex dump (first 32 bytes):
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00  ................
 backtrace:
   [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
   [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
   [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
   [<000000006126a348>] netlink_unicast+0x1a0/0x250
   [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
   [<00000000a25a2171>] sock_sendmsg+0x36/0x40
   [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
   [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
   [<000000007a6c61f9>] do_syscall_64+0x5b/0x180
   [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
   [<0000000013eaa334>] 0xffffffffffffffff

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Thomas Gleixner
3ce8461f45 Merge tag 'irqchip-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip updates for 5.1 from Marc Zyngier:

 - irqsteer error handling fix
 - GICv3 range coalescing fix
 - stm32 coprocessor coexistence fixes
 - mbigen MSI teardown fix
 - non-DT secondary GIC infrastructure removed
 - various cleanups (brcmstb-l2, mmp)
 - new DT bindings (r8a774c0)
2019-03-21 12:30:54 +01:00
Peter Xu
551417af91 genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Dou Liyang <douliyangs@gmail.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Link: https://lkml.kernel.org/r/20190318065123.11862-1-peterx@redhat.com
2019-03-21 11:52:37 +01:00
Bart Van Assche
e6c987120e block: Unexport blk_mq_add_to_requeue_list()
This function is not used outside the block layer core. Hence unexport it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-20 14:19:36 -06:00
Yufen Yu
29ece8b435 block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
For q->poll_nsec == -1, means doing classic poll, not hybrid poll.
We introduce a new flag BLK_MQ_POLL_CLASSIC to replace -1, which
may make code much easier to read.

Additionally, since val is an int obtained with kstrtoint(), val can be
a negative value other than -1, so return -EINVAL for that case.

Thanks to Damien Le Moal for some good suggestion.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-20 14:02:07 -06:00
Ilya Dryomov
bb229bbb3b libceph: wait for latest osdmap in ceph_monc_blacklist_add()
Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed.  This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.

Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.

Cc: stable@vger.kernel.org
Fixes: 6305a3b415 ("libceph: support for blacklisting clients")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-20 16:27:40 +01:00
Dongli Zhang
9496c015ed blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
There is no usage of 'nr_expired'.

The 'nr_expired' was introduced by commit 1d9bd5161b ("blk-mq: replace
timeout synchronization with a RCU and generation based scheme"). Its usage
was removed since commit 12f5b93145 ("blk-mq: Remove generation
seqeunce").

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-19 09:04:06 -06:00
Xin Long
273160ffc6 sctp: get sctphdr by offset in sctp_compute_cksum
sctp_hdr(skb) only works when skb->transport_header is set properly.

But in Netfilter, skb->transport_header for ipv6 is not guaranteed
to be right value for sctphdr. It would cause to fail to check the
checksum for sctp packets.

So fix it by using offset, which is always right in all places.

v1->v2:
  - Fix the changelog.

Fixes: e6d8b64b34 ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-18 18:16:12 -07:00
Maxime Chevallier
a4dc6a4915 packets: Always register packet sk in the same order
When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.

The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.

However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.

This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.

In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.

This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.

Note that this changes the order in which sockets are listed in /proc and
with sock_diag.

Fixes: dc99f60069 ("packet: Add fanout support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-18 17:58:39 -07:00
Jens Axboe
399254aaf4 block: add BIO_NO_PAGE_REF flag
If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
with NO_REF, then we don't need to add a page reference for the pages
that we add.

Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
not to drop a reference to these pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-18 10:44:48 -06:00
Jens Axboe
875f1d0769 iov_iter: add ITER_BVEC_FLAG_NO_REF flag
For ITER_BVEC, if we're holding on to kernel pages, the caller
doesn't need to grab a reference to the bvec pages, and drop that
same reference on IO completion. This is essentially safe for any
ITER_BVEC, but some use cases end up reusing pages and uncondtionally
dropping a page reference on completion. And example of that is
sendfile(2), that ends up being a splice_in + splice_out on the
pipe pages.

Add a flag that tells us it's fine to not grab a page reference
to the bvec pages, since that caller knows not to drop a reference
when it's done with the pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-18 10:44:48 -06:00
Yishai Hadas
c5ae1954c4 IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling DRAIN DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

In that case the DRAIN DCT command will be called and only once that it
will be completed the DESTROY DCT command will be called.  Otherwise, the
DESTROY DCT may fail and a hardware leak may occur.

As of that change the DRAIN DCT command should not be exposed any more
from DEVX, it's managed internally by the driver to work as expected by
the device specification.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Linus Torvalds
28d747f266 Merge tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - add more Build-Depends to Debian source package

 - prefix header search paths with $(srctree)/

 - make modpost show verbose section mismatch warnings

 - avoid hard-coded CROSS_COMPILE for h8300

 - fix regression for Debian make-kpkg command

 - add semantic patch to detect missing put_device()

 - fix some warnings of 'make deb-pkg'

 - optimize NOSTDINC_FLAGS evaluation

 - add warnings about redundant generic-y

 - clean up Makefiles and scripts

* tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: remove stale lxdialog/.gitignore
  kbuild: force all architectures except um to include mandatory-y
  kbuild: warn redundant generic-y
  Revert "modsign: Abort modules_install when signing fails"
  kbuild: Make NOSTDINC_FLAGS a simply expanded variable
  kbuild: deb-pkg: avoid implicit effects
  coccinelle: semantic code search for missing put_device()
  kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG
  kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb
  kbuild: deb-pkg: add CONFIG_ prefix to kernel config options
  kbuild: add workaround for Debian make-kpkg
  kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG}
  unicore32: simplify linker script generation for decompressor
  h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
  kbuild: move archive command to scripts/Makefile.lib
  modpost: always show verbose warning for section mismatch
  ia64: prefix header search path with $(srctree)/
  libfdt: prefix header search paths with $(srctree)/
  deb-pkg: generate correct build dependencies
2019-03-17 13:25:26 -07:00
Andy Shevchenko
8e44fc8506 auxdisplay: charlcd: Introduce charlcd_free() helper
The charlcd_free() is a counterpart to charlcd_alloc()
and should be called symmetrically on tear down.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-03-17 08:48:16 +01:00
Masahiro Yamada
037fc3368b kbuild: force all architectures except um to include mandatory-y
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
the common Kbuild.asm file. Factor out the duplicated include directives
to scripts/Makefile.asm-generic so that no architecture would opt out
of the mandatory-y mechanism.

um is not forced to include mandatory-y since it is a very exceptional
case which does not support UAPI.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-17 12:56:32 +09:00
Linus Torvalds
a9dce6679d Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd system call from Christian Brauner:
 "This introduces the ability to use file descriptors from /proc/<pid>/
  as stable handles on struct pid. Even if a pid is recycled the handle
  will not change. For a start these fds can be used to send signals to
  the processes they refer to.

  With the ability to use /proc/<pid> fds as stable handles on struct
  pid we can fix a long-standing issue where after a process has exited
  its pid can be reused by another process. If a caller sends a signal
  to a reused pid it will end up signaling the wrong process.

  With this patchset we enable a variety of use cases. One obvious
  example is that we can now safely delegate an important part of
  process management - sending signals - to processes other than the
  parent of a given process by sending file descriptors around via scm
  rights and not fearing that the given process will have been recycled
  in the meantime. It also allows for easy testing whether a given
  process is still alive or not by sending signal 0 to a pidfd which is
  quite handy.

  There has been some interest in this feature e.g. from systems
  management (systemd, glibc) and container managers. I have requested
  and gotten comments from glibc to make sure that this syscall is
  suitable for their needs as well. In the future I expect it to take on
  most other pid-based signal syscalls. But such features are left for
  the future once they are needed.

  This has been sitting in linux-next for quite a while and has not
  caused any issues. It comes with selftests which verify basic
  functionality and also test that a recycled pid cannot be signaled via
  a pidfd.

  Jon has written about a prior version of this patchset. It should
  cover the basic functionality since not a lot has changed since then:

      https://lwn.net/Articles/773459/

  The commit message for the syscall itself is extensively documenting
  the syscall, including it's functionality and extensibility"

* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  selftests: add tests for pidfd_send_signal()
  signal: add pidfd_send_signal() syscall
2019-03-16 13:47:14 -07:00
Linus Torvalds
f67e3fb489 Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
 "New device-dax infrastructure to allow persistent memory and other
  "reserved" / performance differentiated memories, to be assigned to
  the core-mm as "System RAM".

  Some users want to use persistent memory as additional volatile
  memory. They are willing to cope with potential performance
  differences, for example between DRAM and 3D Xpoint, and want to use
  typical Linux memory management apis rather than a userspace memory
  allocator layered over an mmap() of a dax file. The administration
  model is to decide how much Persistent Memory (pmem) to use as System
  RAM, create a device-dax-mode namespace of that size, and then assign
  it to the core-mm. The rationale for device-dax is that it is a
  generic memory-mapping driver that can be layered over any "special
  purpose" memory, not just pmem. On subsequent boots udev rules can be
  used to restore the memory assignment.

  One implication of using pmem as RAM is that mlock() no longer keeps
  data off persistent media. For this reason it is recommended to enable
  NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
  at rest. We considered making this recommendation an actively enforced
  requirement, but in the end decided to leave it as a distribution /
  administrator policy to allow for emulation and test environments that
  lack security capable NVDIMMs.

  Summary:

   - Replace the /sys/class/dax device model with /sys/bus/dax, and
     include a compat driver so distributions can opt-in to the new ABI.

   - Allow for an alternative driver for the device-dax address-range

   - Introduce the 'kmem' driver to hotplug / assign a device-dax
     address-range to the core-mm.

   - Arrange for the device-dax target-node to be onlined so that the
     newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM.  The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
 "The exposure can be reduced in the volatile-RAM case by scanning for
  and clearing errors before it is onlined as RAM. The userspace tooling
  for that can be in place before v5.1-final. There's also runtime
  notifications of errors via acpi_nfit_uc_error_notify() from
  background scrubbers on the DIMM devices. With that mechanism the
  kernel could proactively clear newly discovered poison in the volatile
  case, but that would be additional development more suitable for v5.2.

  I understand the concern, and the need to highlight this issue by
  tapping the brakes on feature development, but I don't see PMEM as RAM
  making the situation worse when the exposure is also there via DAX in
  the PMEM case. Volatile-RAM is arguably a safer use case since it's
  possible to repair pages where the persistent case needs active
  application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: "Hotplug" persistent memory for use like normal RAM
  mm/resource: Let walk_system_ram_range() search child resources
  mm/memory-hotplug: Allow memory resources to be children
  mm/resource: Move HMM pr_debug() deeper into resource code
  mm/resource: Return real error codes from walk failures
  device-dax: Add a 'modalias' attribute to DAX 'bus' devices
  device-dax: Add a 'target_node' attribute
  device-dax: Auto-bind device after successful new_id
  acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
  device-dax: Add /sys/class/dax backwards compatibility
  device-dax: Add support for a dax override driver
  device-dax: Move resource pinning+mapping into the common driver
  device-dax: Introduce bus + driver model
  device-dax: Start defining a dax bus model
  device-dax: Remove multi-resource infrastructure
  device-dax: Kill dax_region base
  device-dax: Kill dax_region ida
2019-03-16 13:05:32 -07:00
Linus Torvalds
465c209db8 Merge tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Bugfixes:
   - Fix an Oops in SUNRPC back channel tracepoints
   - Fix a SUNRPC client regression when handling oversized replies
   - Fix the minimal size for SUNRPC reply buffer allocation
   - rpc_decode_header() must always return a non-zero value on error
   - Fix a typo in pnfs_update_layout()

  Cleanup:
   - Remove redundant check for the reply length in call_decode()"

* tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Remove redundant check for the reply length in call_decode()
  SUNRPC: Handle the SYSTEM_ERR rpc error
  SUNRPC: rpc_decode_header() must always return a non-zero value on error
  SUNRPC: Use the ENOTCONN error on socket disconnect
  SUNRPC: Fix the minimal size for reply buffer allocation
  SUNRPC: Fix a client regression when handling oversized replies
  pNFS: Fix a typo in pnfs_update_layout
  fix null pointer deref in tracepoints in back channel
2019-03-16 12:28:18 -07:00
David S. Miller
0aedadcf6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-03-16

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.

2) Fix BTF to properly resolve forward-declared enums into their corresponding
   full enum definition types during deduplication, from Andrii.

3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.

4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
   bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.

5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.

6) Various fixes in BPF helper function documentation in bpf.h UAPI header
   used to bpf-helpers(7) man page, from Quentin.

7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-16 12:20:08 -07:00
Björn Töpel
044175a067 xsk: fix umem memory leak on cleanup
When the umem is cleaned up, the task that created it might already be
gone. If the task was gone, the xdp_umem_release function did not free
the pages member of struct xdp_umem.

It turned out that the task lookup was not needed at all; The code was
a left-over when we moved from task accounting to user accounting [1].

This patch fixes the memory leak by removing the task lookup logic
completely.

[1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/

Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-16 01:27:51 +01:00
Pedro Tammela
8a3c245c03 net: add documentation to socket.c
Adds missing sphinx documentation to the
socket.c's functions. Also fixes some whitespaces.

I also changed the style of older documentation as an
effort to have an uniform documentation style.

Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-15 15:29:47 -07:00
Linus Torvalds
636deed6c0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:
   - some cleanups
   - direct physical timer assignment
   - cache sanitization for 32-bit guests

  s390:
   - interrupt cleanup
   - introduction of the Guest Information Block
   - preparation for processor subfunctions in cpu models

  PPC:
   - bug fixes and improvements, especially related to machine checks
     and protection keys

  x86:
   - many, many cleanups, including removing a bunch of MMU code for
     unnecessary optimizations
   - AVIC fixes

  Generic:
   - memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
  kvm: vmx: fix formatting of a comment
  KVM: doc: Document the life cycle of a VM and its resources
  MAINTAINERS: Add KVM selftests to existing KVM entry
  Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
  KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
  KVM: PPC: Fix compilation when KVM is not enabled
  KVM: Minor cleanups for kvm_main.c
  KVM: s390: add debug logging for cpu model subfunctions
  KVM: s390: implement subfunction processor calls
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  KVM: arm/arm64: Remove unused timer variable
  KVM: PPC: Book3S: Improve KVM reference counting
  KVM: PPC: Book3S HV: Fix build failure without IOMMU support
  Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
  x86: kvmguest: use TSC clocksource if invariant TSC is exposed
  KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
  KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
  KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
  KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
  KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
  ...
2019-03-15 15:00:28 -07:00
Linus Torvalds
aa2e3ac64a Merge tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes and cleanups from Steven Rostedt:
 "This contains a series of last minute clean ups, small fixes and error
  checks"

* tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/probe: Verify alloc_trace_*probe() result
  tracing/probe: Check event/group naming rule at parsing
  tracing/probe: Check the size of argument name and body
  tracing/probe: Check event name length correctly
  tracing/probe: Check maxactive error cases
  tracing: kdb: Fix ftdump to not sleep
  trace/probes: Remove kernel doc style from non kernel doc comment
  tracing/probes: Make reserved_field_names static
2019-03-15 14:47:02 -07:00
Linus Torvalds
0be2886307 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - An improvement from Ard Biesheuvel, who noted that the identity map
   setup was taking a long time due to flush_cache_louis().

 - Update a comment about dma_ops from Wolfram Sang.

 - Remove use of "-p" with ld, where this flag has been a no-op since
   2004.

 - Remove the printing of the virtual memory layout, which is no longer
   useful since we hide pointers.

 - Correct SCU help text.

 - Remove legacy TWD registration method.

 - Add pgprot_device() implementation for mapping PCI sysfs resource
   files.

 - Initialise PFN limits earlier for kmemleak.

 - Fix argument count to match macro definition (affects clang builds)

 - Use unified assembler language almost everywhere for clang, and other
   clang improvements (from Stefan Agner, Nathan Chancellor).

 - Support security extension for noMMU and other noMMU cleanups (from
   Vladimir Murzin).

 - Remove unnecessary SMP bringup code (which was incorrectly copy'n'
   pasted from the ARM platform implementations) and remove it from the
   arch code to discourge further copys of it appearing.

 - Add Cortex A9 erratum preventing kexec working on some SoCs.

 - AMBA bus identification updates from Mike Leach.

 - More use of raw spinlocks to avoid -RT kernel issues (from Yang Shi
   and Sebastian Andrzej Siewior).

 - MCPM hyp/svc mode mismatch fixes from Marek Szyprowski.

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (32 commits)
  ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
  ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
  ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
  ARM: 8845/1: use unified assembler in c files
  ARM: 8844/1: use unified assembler in assembly files
  ARM: 8843/1: use unified assembler in headers
  ARM: 8841/1: use unified assembler in macros
  ARM: 8840/1: use a raw_spinlock_t in unwind
  ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
  ARM: 8837/1: coresight: etmv4: Update ID register table to add UCI support
  ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values.
  ARM: 8838/1: drivers: amba: Updates to component identification for driver matching.
  ARM: 8833/1: Ensure that NEON code always compiles with Clang
  ARM: avoid Cortex-A9 livelock on tight dmb loops
  ARM: smp: remove arch-provided "pen_release"
  ARM: actions: remove boot_lock and pen_release
  ARM: oxnas: remove CPU hotplug implementation
  ARM: qcom: remove unnecessary boot_lock
  ARM: 8832/1: NOMMU: Limit visibility for CONFIG_FLASH_{MEM_BASE,SIZE}
  ARM: 8831/1: NOMMU: pmsa-v8: remove unneeded semicolon
  ...
2019-03-15 14:37:46 -07:00
Linus Torvalds
e8a71a3866 Merge tag 'ntb-5.1' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:

 - fixes for switchtec debugability and mapping table entries

 - NTB transport improvements

 - a reworking of the peer_db_addr for better abstraction

* tag 'ntb-5.1' of git://github.com/jonmason/ntb:
  NTB: add new parameter to peer_db_addr() db_bit and db_data
  NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA
  NTB: ntb_transport: Free MWs in ntb_transport_link_cleanup()
  ntb_hw_switchtec: Added support of >=4G memory windows
  ntb_hw_switchtec: NT req id mapping table register entry number should be 512
  ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers
2019-03-15 14:32:59 -07:00
Linus Torvalds
2dbb0e6c19 Merge tag 'sound-fix-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Some cleaning after the first batch; mostly about HD-audio quirks but
  also some NULL dereference fixes in corner cases and a random build
  error fix, too"

* tag 'sound-fix-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Add support headset mode for New DELL WYSE NB
  ALSA: hda/realtek - Add support headset mode for DELL WYSE AIO
  ALSA: hda/realtek: merge alc_fixup_headset_jack to alc295_fixup_chromebook
  ALSA: pcm: Fix function name in kernel-doc comment
  ALSA: hda: hdmi - add Icelake support
  ALSA: hda - add more quirks for HP Z2 G4 and HP Z240
  ALSA: hda/realtek - Fixed Headset Mic JD not stable
  ALSA: hda/realtek: Enable headset MIC of Acer TravelMate X514-51T with ALC255
  ALSA: hda/tegra: avoid build error without CONFIG_PM
  ALSA: usx2y: Fix potential NULL pointer dereference
  ALSA: hda: Avoid NULL pointer dereference at snd_hdac_stream_start()
2019-03-15 14:05:00 -07:00
Linus Torvalds
8264fd046a Merge tag 'drm-next-2019-03-15' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes and updates from Dave Airlie:
 "A few various fixes pulls and one late etnaviv pull but it was nearly
  all fixes anyways.

  etnaviv:
   - late next pull
   - mmu mapping fix
   - build non-ARM arches
   - misc fixes

  i915:
   - HDCP state handling fix
   - shrinker interaction fix
   - atomic state leak fix

  qxl:
   - kick out framebuffers early fix

  amdgpu:
   - Powerplay fixes
   - DC fixes
   - BACO turned off for now on vega20
   - Locking fix
   - KFD MQD fix
   - gfx9 golden register updates"

* tag 'drm-next-2019-03-15' of git://anongit.freedesktop.org/drm/drm: (43 commits)
  drm/amdgpu: Update gc golden setting for vega family
  drm/amd/powerplay: correct power reading on fiji
  drm/amd/powerplay: set max fan target temperature as 105C
  drm/i915: Relax mmap VMA check
  drm/i915: Fix atomic state leak when resetting HDMI link
  drm/i915: Acquire breadcrumb ref before cancelling
  drm/i915/selftests: Always free spinner on __sseu_prepare error
  drm/i915: Reacquire priolist cache after dropping the engine lock
  drm/i915: Protect i915_active iterators from the shrinker
  drm/i915: HDCP state handling in ddi_update_pipe
  drm/qxl: remove conflicting framebuffers earlier
  drm/fb-helper: call vga_remove_vgacon automatically.
  drm: move i915_kick_out_vgacon to vgaarb
  drm/amd/display: don't call dm_pp_ function from an fpu block
  drm: add __user attribute to ptr_to_compat()
  drm/amdgpu: clear PDs/PTs only after initializing them
  drm/amd/display: Pass app_tf by value rather than by reference
  Revert "drm/amdgpu: use BACO reset on vega20 if platform support"
  drm/amd/powerplay: show the right override pcie parameters
  drm/amd/powerplay: honor the OD settings
  ...
2019-03-15 13:58:35 -07:00
Linus Torvalds
5160bcce5c Merge tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "We've continued mainly to fix bugs in this round, as f2fs has been
  shipped in more devices. Especially, we've focused on stabilizing
  checkpoint=disable feature, and provided some interfaces for QA.

  Enhancements:
   - expose FS_NOCOW_FL for pin_file
   - run discard jobs at unmount time with timeout
   - tune discarding thread to avoid idling which consumes power
   - some checking codes to address vulnerabilities
   - give random value to i_generation
   - shutdown with more flags for QA

  Bug fixes:
   - clean up stale objects when mount is failed along with
     checkpoint=disable
   - fix system being stuck due to wrong count by atomic writes
   - handle some corrupted disk cases
   - fix a deadlock in f2fs_read_inline_dir

  We've also added some minor build error fixes and clean-up patches"

* tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (53 commits)
  f2fs: set pin_file under CAP_SYS_ADMIN
  f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
  f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
  f2fs: fix to do sanity check with inode.i_inline_xattr_size
  f2fs: give some messages for inline_xattr_size
  f2fs: don't trigger read IO for beyond EOF page
  f2fs: fix to add refcount once page is tagged PG_private
  f2fs: remove wrong comment in f2fs_invalidate_page()
  f2fs: fix to use kvfree instead of kzfree
  f2fs: print more parameters in trace_f2fs_map_blocks
  f2fs: trace f2fs_ioc_shutdown
  f2fs: fix to avoid deadlock of atomic file operations
  f2fs: fix to dirty inode for i_mode recovery
  f2fs: give random value to i_generation
  f2fs: no need to take page lock in readdir
  f2fs: fix to update iostat correctly in IPU path
  f2fs: fix encrypted page memory leak
  f2fs: make fault injection covering __submit_flush_wait()
  f2fs: fix to retry fill_super only if recovery failed
  f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
  ...
2019-03-15 13:42:53 -07:00
Linus Torvalds
f91f2ee54a Merge branch 'akpm' (rest of patches from Andrew)
Merge the left-over patches from Andrew Morton.

This merges the remaining two patches from Andrew's pile of "little bit
more MM".  I mulled it over, and we emailed back and forth with Josef,
and he pointed out where I was wrong.

Rule #51 of kernel maintenance: when somebody makes it clear that they
know the code better than you did, stop arguing and just apply the damn
patch.

Add a third patch by me to add a comment for the case that I had thought
was buggy and Josef corrected me on.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior
  filemap: drop the mmap_sem for all blocking operations
  filemap: kill page_cache_read usage in filemap_fault
2019-03-15 12:00:45 -07:00
YueHaibing
9804501fa1 appletalk: Fix potential NULL pointer dereference in unregister_snap_client
register_snap_client may return NULL, all the callers
check it, but only print a warning. This will result in
NULL pointer dereference in unregister_snap_client and other
places.

It has always been used like this since v2.6

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-15 11:25:48 -07:00
Josef Bacik
a75d4c3337 filemap: kill page_cache_read usage in filemap_fault
Patch series "drop the mmap_sem when doing IO in the fault path", v6.

Now that we have proper isolation in place with cgroups2 we have started
going through and fixing the various priority inversions.  Most are all
gone now, but this one is sort of weird since it's not necessarily a
priority inversion that happens within the kernel, but rather because of
something userspace does.

We have giant applications that we want to protect, and parts of these
giant applications do things like watch the system state to determine how
healthy the box is for load balancing and such.  This involves running
'ps' or other such utilities.  These utilities will often walk
/proc/<pid>/whatever, and these files can sometimes need to
down_read(&task->mmap_sem).  Not usually a big deal, but we noticed when
we are stress testing that sometimes our protected application has latency
spikes trying to get the mmap_sem for tasks that are in lower priority
cgroups.

This is because any down_write() on a semaphore essentially turns it into
a mutex, so even if we currently have it held for reading, any new readers
will not be allowed on to keep from starving the writer.  This is fine,
except a lower priority task could be stuck doing IO because it has been
throttled to the point that its IO is taking much longer than normal.  But
because a higher priority group depends on this completing it is now stuck
behind lower priority work.

In order to avoid this particular priority inversion we want to use the
existing retry mechanism to stop from holding the mmap_sem at all if we
are going to do IO.  This already exists in the read case sort of, but
needed to be extended for more than just grabbing the page lock.  With
io.latency we throttle at submit_bio() time, so the readahead stuff can
block and even page_cache_read can block, so all these paths need to have
the mmap_sem dropped.

The other big thing is ->page_mkwrite.  btrfs is particularly shitty here
because we have to reserve space for the dirty page, which can be a very
expensive operation.  We use the same retry method as the read path, and
simply cache the page and verify the page is still setup properly the next
pass through ->page_mkwrite().

I've tested these patches with xfstests and there are no regressions.

This patch (of 3):

If we do not have a page at filemap_fault time we'll do this weird forced
page_cache_read thing to populate the page, and then drop it again and
loop around and find it.  This makes for 2 ways we can read a page in
filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
flag so that pagecache_get_page() will return a unlocked page that's in
pagecache.  Then use the normal page locking and readpage logic already in
filemap_fault.  This simplifies the no page in page cache case
significantly.

[akpm@linux-foundation.org: fix comment text]
[josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
  Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-15 11:21:25 -07:00
Linus Torvalds
f261c4e529 Merge branch 'akpm' (patches from Andrew)
Merge misc patches from Andrew Morton:

- a little bit more MM

- a few fixups

[ The "little bit more MM" is actually just one of the three patches
  Andrew sent for mm/filemap.c, I'm still mulling over two more of them
  from Josef Bacik     - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
  tools/testing/selftests/proc/proc-pid-vm.c: test with vsyscall in mind
  zram: default to lzo-rle instead of lzo
  filemap: pass vm_fault to the mmap ra helpers
2019-03-14 15:10:10 -07:00
Pi-Hsun Shih
a4046c06be include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
Use offsetof() to calculate offset of a field to take advantage of
compiler built-in version when possible, and avoid UBSAN warning when
compiling with Clang:

  UBSAN: Undefined behaviour in mm/swapfile.c:3010:38
  member access within null pointer of type 'union swap_header'
  CPU: 6 PID: 1833 Comm: swapon Tainted: G S                4.19.23 #43
  Call trace:
   dump_backtrace+0x0/0x194
   show_stack+0x20/0x2c
   __dump_stack+0x20/0x28
   dump_stack+0x70/0x94
   ubsan_epilogue+0x14/0x44
   ubsan_type_mismatch_common+0xf4/0xfc
   __ubsan_handle_type_mismatch_v1+0x34/0x54
   __se_sys_swapon+0x654/0x1084
   __arm64_sys_swapon+0x1c/0x24
   el0_svc_common+0xa8/0x150
   el0_svc_compat_handler+0x2c/0x38
   el0_svc_compat+0x8/0x18

Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-14 14:36:20 -07:00
Quentin Monnet
0eb0978528 bpf: add documentation for helpers bpf_spin_lock(), bpf_spin_unlock()
Add documentation for the BPF spinlock-related helpers to the doc in
bpf.h. I added the constraints and restrictions coming with the use of
spinlocks for BPF: not all of it is directly related to the use of the
helper, but I thought it would be nice for users to find them in the man
page.

This list of restrictions is nearly a verbatim copy of the list in
Alexei's commit log for those helpers.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-14 14:03:21 -07:00
Quentin Monnet
62369db2df bpf: fix documentation for eBPF helpers
Another round of minor fixes for the documentation of the BPF helpers
located in the UAPI bpf.h header file. Changes include:

- Moving around description of some helpers, to keep the descriptions in
  the same order as helpers are declared (bpf_map_push_elem(), leftover
  from commit 90b1023f68 ("bpf: fix documentation for eBPF helpers"),
  bpf_rc_keydown(), and bpf_skb_ancestor_cgroup_id()).
- Fixing typos ("contex" -> "context").
- Harmonising return types ("void* " -> "void *", "uint64_t" -> "u64").
- Addition of the "bpf_" prefix to bpf_get_storage().
- Light additions of RST markup on some keywords.
- Empty line deletion between description and return value for
  bpf_tcp_sock().
- Edit for the description for bpf_skb_ecn_set_ce() (capital letters,
  acronym expansion, no effect if ECT not set, more details on return
  value).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-14 14:03:21 -07:00
Linus Torvalds
9352ca585b Merge tag 'pm-5.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These are mostly fixes and cleanups on top of the previously merged
  power management material for 5.1-rc1 with one cpupower utility update
  that wasn't pushed earlier due to unfortunate timing.

  Specifics:

   - Fix registration of new cpuidle governors partially broken during
     the 5.0 development cycle by mistake (Rafael Wysocki).

   - Avoid integer overflows in the menu cpuidle governor by making it
     discard the overflowing data points upfront (Rafael Wysocki).

   - Fix minor mistake in the recent update of the iowait boost
     computation in the intel_pstate driver (Rafael Wysocki).

   - Drop incorrect __init annotation from one function in the pxa2xx
     cpufreq driver (Arnd Bergmann).

   - Fix the operating performance points (OPP) framework initialization
     for devices in multiple power domains if only one of them is
     scalable (Rajendra Nayak).

   - Fix mistake in dev_pm_opp_set_rate() which causes it to skip
     updating the performance state if the new frequency is the same as
     the old one (Viresh Kumar).

   - Rework the cancellation of wakeup source timers to avoid potential
     issues with it and do some cleanups unlocked by that change (Viresh
     Kumar, Rafael Wysocki).

   - Clean up the code computing the active/suspended time of devices in
     the PM-runtime framework after recent changes (Ulf Hansson).

   - Make the power management infrastructure code use pr_fmt()
     consistently (Joe Perches).

   - Clean up the generic power domains (genpd) framework somewhat
     (Aisheng Dong).

   - Improve kerneldoc comments for two functions in the cpufreq core
     (Rafael Wysocki).

   - Fix typo in a PM QoS file description comment (Aisheng Dong).

   - Update the handling of CPU boost frequencies in the cpupower
     utility (Abhishek Goel)"

* tag 'pm-5.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: governor: Add new governors to cpuidle_governors again
  cpufreq: intel_pstate: Fix up iowait_boost computation
  PM / OPP: Update performance state when freq == old_freq
  PM / wakeup: Drop wakeup_source_drop()
  PM / wakeup: Rework wakeup source timer cancellation
  PM / domains: Remove one unnecessary blank line
  PM / Domains: Return early for all errors in _genpd_power_off()
  PM / Domains: Improve warn for multiple states but no governor
  OPP: Fix handling of multiple power domains
  PM / QoS: Fix typo in file description
  cpufreq: pxa2xx: remove incorrect __init annotation
  PM-runtime: Call pm_runtime_active|suspended_time() from sysfs
  PM-runtime: Consolidate code to get active/suspended time
  PM: Add and use pr_fmt()
  cpufreq: Improve kerneldoc comments for cpufreq_cpu_get/put()
  cpuidle: menu: Avoid overflows when computing variance
  tools/power/cpupower: Display boost frequency separately
2019-03-14 10:30:06 -07:00
Linus Torvalds
f3ca4c55a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "More fixes in the queue:

  1) Netfilter nat can erroneously register the device notifier twice,
     fix from Florian Westphal.

  2) Use after free in nf_tables, from Pablo Neira Ayuso.

  3) Parallel update of steering rule fix in mlx5 river, from Eli
     Britstein.

  4) RX processing panic in lan743x, fix from Bryan Whitehead.

  5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch.

  6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein.

  7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing
     modes, from Bryan Whitehead.

  8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  pptp: dst_release sk_dst_cache in pptp_sock_destruct
  MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list
  l2tp: fix infoleak in l2tp_ip6_recvmsg()
  net/tls: Inform user space about send buffer availability
  net_sched: return correct value for *notify* functions
  lan743x: Fix TX Stall Issue
  net/mlx4_core: Fix qp mtt size calculation
  net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling
  net/mlx4_core: Fix reset flow when in command polling mode
  mlxsw: minimal: Initialize base_mac
  mlxsw: core: Prevent duplication during QSFP module initialization
  net: dwmac-sun8i: fix a missing check of of_get_phy_mode
  net: sh_eth: fix a missing check of of_get_phy_mode
  net: 8390: fix potential NULL pointer dereferences
  net: fujitsu: fix a potential NULL pointer dereference
  net: qlogic: fix a potential NULL pointer dereference
  isdn: hfcpci: fix potential NULL pointer dereference
  Documentation: devicetree: add a new optional property for port mac address
  net: rocker: fix a potential NULL pointer dereference
  net: qlge: fix a potential NULL pointer dereference
  ...
2019-03-14 09:28:12 -07:00