Commit Graph

949999 Commits

Author SHA1 Message Date
Will McVicker
1cc5ef91d2 netfilter: ctnetlink: add a range check for l3/l4 protonum
The indexes to the nf_nat_l[34]protos arrays come from userspace. So
check the tuple's family, e.g. l3num, when creating the conntrack in
order to prevent an OOB memory access during setup.  Here is an example
kernel panic on 4.14.180 when userspace passes in an index greater than
NFPROTO_NUMPROTO.

Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:...
Process poc (pid: 5614, stack limit = 0x00000000a3933121)
CPU: 4 PID: 5614 Comm: poc Tainted: G S      W  O    4.14.180-g051355490483
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
task: 000000002a3dfffe task.stack: 00000000a3933121
pc : __cfi_check_fail+0x1c/0x24
lr : __cfi_check_fail+0x1c/0x24
...
Call trace:
__cfi_check_fail+0x1c/0x24
name_to_dev_t+0x0/0x468
nfnetlink_parse_nat_setup+0x234/0x258
ctnetlink_parse_nat_setup+0x4c/0x228
ctnetlink_new_conntrack+0x590/0xc40
nfnetlink_rcv_msg+0x31c/0x4d4
netlink_rcv_skb+0x100/0x184
nfnetlink_rcv+0xf4/0x180
netlink_unicast+0x360/0x770
netlink_sendmsg+0x5a0/0x6a4
___sys_sendmsg+0x314/0x46c
SyS_sendmsg+0xb4/0x108
el0_svc_naked+0x34/0x38

This crash is not happening since 5.4+, however, ctnetlink still
allows for creating entries with unsupported layer 3 protocol number.

Fixes: c1d10adb4a ("[NETFILTER]: Add ctnetlink port for nf_conntrack")
Signed-off-by: Will McVicker <willmcvicker@google.com>
[pablo@netfilter.org: rebased original patch on top of nf.git]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08 11:41:43 +02:00
Dexuan Cui
19162fd406 hv_netvsc: Fix hibernation for mlx5 VF driver
mlx5_suspend()/resume() keep the network interface, so during hibernation
netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
netvsc_resume() should call netvsc_vf_changed() to switch the data path
back to the VF after hibernation. Note: after we close and re-open the
vmbus channel of the netvsc NIC in netvsc_suspend() and netvsc_resume(),
the data path is implicitly switched to the netvsc NIC. Similarly,
netvsc_suspend() should not call netvsc_unregister_vf(), otherwise the VF
can no longer be used after hibernation.

For mlx4, since the VF network interafce is explicitly destroyed and
re-created during hibernation (see mlx4_suspend()/resume()), hv_netvsc
already explicitly switches the data path from and to the VF automatically
via netvsc_register_vf() and netvsc_unregister_vf(), so mlx4 doesn't need
this fix. Note: mlx4 can still work with the fix because in
netvsc_suspend()/resume() ndev_ctx->vf_netdev is NULL for mlx4.

Fixes: 0efeea5fb1 ("hv_netvsc: Add the support of hibernation")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 21:04:36 -07:00
Taehee Yoo
e1f469cd58 Revert "netns: don't disable BHs when locking "nsid_lock""
This reverts commit 8d7e5dee97.

To protect netns id, the nsid_lock is used when netns id is being
allocated and removed by peernet2id_alloc() and unhash_nsid().
The nsid_lock can be used in BH context but only spin_lock() is used
in this code.
Using spin_lock() instead of spin_lock_bh() can result in a deadlock in
the following scenario reported by the lockdep.
In order to avoid a deadlock, the spin_lock_bh() should be used instead
of spin_lock() to acquire nsid_lock.

Test commands:
    ip netns del nst
    ip netns add nst
    ip link add veth1 type veth peer name veth2
    ip link set veth1 netns nst
    ip netns exec nst ip link add name br1 type bridge vlan_filtering 1
    ip netns exec nst ip link set dev br1 up
    ip netns exec nst ip link set dev veth1 master br1
    ip netns exec nst ip link set dev veth1 up
    ip netns exec nst ip link add macvlan0 link br1 up type macvlan

Splat looks like:
[   33.615860][  T607] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[   33.617194][  T607] 5.9.0-rc1+ #665 Not tainted
[ ... ]
[   33.670615][  T607] Chain exists of:
[   33.670615][  T607]   &mc->mca_lock --> &bridge_netdev_addr_lock_key --> &net->nsid_lock
[   33.670615][  T607]
[   33.673118][  T607]  Possible interrupt unsafe locking scenario:
[   33.673118][  T607]
[   33.674599][  T607]        CPU0                    CPU1
[   33.675557][  T607]        ----                    ----
[   33.676516][  T607]   lock(&net->nsid_lock);
[   33.677306][  T607]                                local_irq_disable();
[   33.678517][  T607]                                lock(&mc->mca_lock);
[   33.679725][  T607]                                lock(&bridge_netdev_addr_lock_key);
[   33.681166][  T607]   <Interrupt>
[   33.681791][  T607]     lock(&mc->mca_lock);
[   33.682579][  T607]
[   33.682579][  T607]  *** DEADLOCK ***
[ ... ]
[   33.922046][  T607] stack backtrace:
[   33.922999][  T607] CPU: 3 PID: 607 Comm: ip Not tainted 5.9.0-rc1+ #665
[   33.924099][  T607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   33.925714][  T607] Call Trace:
[   33.926238][  T607]  dump_stack+0x78/0xab
[   33.926905][  T607]  check_irq_usage+0x70b/0x720
[   33.927708][  T607]  ? iterate_chain_key+0x60/0x60
[   33.928507][  T607]  ? check_path+0x22/0x40
[   33.929201][  T607]  ? check_noncircular+0xcf/0x180
[   33.930024][  T607]  ? __lock_acquire+0x1952/0x1f20
[   33.930860][  T607]  __lock_acquire+0x1952/0x1f20
[   33.931667][  T607]  lock_acquire+0xaf/0x3a0
[   33.932366][  T607]  ? peernet2id_alloc+0x3a/0x170
[   33.933147][  T607]  ? br_port_fill_attrs+0x54c/0x6b0 [bridge]
[   33.934140][  T607]  ? br_port_fill_attrs+0x5de/0x6b0 [bridge]
[   33.935113][  T607]  ? kvm_sched_clock_read+0x14/0x30
[   33.935974][  T607]  _raw_spin_lock+0x30/0x70
[   33.936728][  T607]  ? peernet2id_alloc+0x3a/0x170
[   33.937523][  T607]  peernet2id_alloc+0x3a/0x170
[   33.938313][  T607]  rtnl_fill_ifinfo+0xb5e/0x1400
[   33.939091][  T607]  rtmsg_ifinfo_build_skb+0x8a/0xf0
[   33.939953][  T607]  rtmsg_ifinfo_event.part.39+0x17/0x50
[   33.940863][  T607]  rtmsg_ifinfo+0x1f/0x30
[   33.941571][  T607]  __dev_notify_flags+0xa5/0xf0
[   33.942376][  T607]  ? __irq_work_queue_local+0x49/0x50
[   33.943249][  T607]  ? irq_work_queue+0x1d/0x30
[   33.943993][  T607]  ? __dev_set_promiscuity+0x7b/0x1a0
[   33.944878][  T607]  __dev_set_promiscuity+0x7b/0x1a0
[   33.945758][  T607]  dev_set_promiscuity+0x1e/0x50
[   33.946582][  T607]  br_port_set_promisc+0x1f/0x40 [bridge]
[   33.947487][  T607]  br_manage_promisc+0x8b/0xe0 [bridge]
[   33.948388][  T607]  __dev_set_promiscuity+0x123/0x1a0
[   33.949244][  T607]  __dev_set_rx_mode+0x68/0x90
[   33.950021][  T607]  dev_uc_add+0x50/0x60
[   33.950720][  T607]  macvlan_open+0x18e/0x1f0 [macvlan]
[   33.951601][  T607]  __dev_open+0xd6/0x170
[   33.952269][  T607]  __dev_change_flags+0x181/0x1d0
[   33.953056][  T607]  rtnl_configure_link+0x2f/0xa0
[   33.953884][  T607]  __rtnl_newlink+0x6b9/0x8e0
[   33.954665][  T607]  ? __lock_acquire+0x95d/0x1f20
[   33.955450][  T607]  ? lock_acquire+0xaf/0x3a0
[   33.956193][  T607]  ? is_bpf_text_address+0x5/0xe0
[   33.956999][  T607]  rtnl_newlink+0x47/0x70

Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 8d7e5dee97 ("netns: don't disable BHs when locking "nsid_lock"")
Reported-by: syzbot+3f960c64a104eaa2c813@syzkaller.appspotmail.com
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 14:32:39 -07:00
Jakub Kicinski
8ae4dff882 ibmvnic: add missing parenthesis in do_reset()
Indentation and logic clearly show that this code is missing
parenthesis.

Fixes: 9f13457377 ("ibmvnic fix NULL tx_pools and rx_tools issue at do_reset")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 14:03:11 -07:00
Randy Dunlap
ffa59b0b39 netdevice.h: fix xdp_state kernel-doc warning
Fix kernel-doc warning in <linux/netdevice.h>:

../include/linux/netdevice.h:2158: warning: Function parameter or member 'xdp_state' not described in 'net_device'

Fixes: 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 12:46:58 -07:00
Randy Dunlap
eb02d39ad3 netdevice.h: fix proto_down_reason kernel-doc warning
Fix kernel-doc warning in <linux/netdevice.h>:

../include/linux/netdevice.h:2158: warning: Function parameter or member 'proto_down_reason' not described in 'net_device'

Fixes: 829eb208e8 ("rtnetlink: add support for protodown reason")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 12:46:34 -07:00
Jakub Kicinski
72bbee2aea Merge branch 'bnxt_en-Two-bug-fixes'
Michael Chan says:

====================
bnxt_en: Two bug fixes.

The first patch fixes AER recovery by reducing the time from several
minutes to a more reasonable 20 - 30 seconds.  The second patch fixes
a possible NULL pointer crash during firmware reset.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 10:08:52 -07:00
Vasundhara Volam
b16939b59c bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()
bnxt_fw_reset_task() which runs from a workqueue can race with
bnxt_remove_one().  For example, if firmware reset and VF FLR are
happening at about the same time.

bnxt_remove_one() already cancels the workqueue and waits for it
to finish, but we need to do this earlier before the devlink
reporters are destroyed.  This will guarantee that
the devlink reporters will always be valid when bnxt_fw_reset_task()
is still running.

Fixes: b148bb238c ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 10:08:18 -07:00
Vasundhara Volam
b340dc680e bnxt_en: Avoid sending firmware messages when AER error is detected.
When the driver goes through PCIe AER reset in error state, all
firmware messages will timeout because the PCIe bus is no longer
accessible.  This can lead to AER reset taking many minutes to
complete as each firmware command takes time to timeout.

Define a new macro BNXT_NO_FW_ACCESS() to skip these firmware messages
when either firmware is in fatal error state or when
pci_channel_offline() is true.  It now takes a more reasonable 20 to
30 seconds to complete AER recovery.

Fixes: b4fff2079d ("bnxt_en: Do not send firmware messages if firmware is in error state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 10:08:18 -07:00
Linus Walleij
4ddcaf1ebb net: dsa: rtl8366: Properly clear member config
When removing a port from a VLAN we are just erasing the
member config for the VLAN, which is wrong: other ports
can be using it.

Just mask off the port and only zero out the rest of the
member config once ports using of the VLAN are removed
from it.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-06 12:32:07 -07:00
Parshuram Thombare
d7739b0b6d net: macb: fix for pause frame receive enable bit
PAE bit of NCFGR register, when set, pauses transmission
if a non-zero 802.3 classic pause frame is received.

Fixes: 7897b071ac ("net: macb: convert to phylink")
Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-06 11:58:59 -07:00
Ganji Aravind
94cc242a06 cxgb4: Fix offset when clearing filter byte counters
Pass the correct offset to clear the stale filter hit
bytes counter. Otherwise, the counter starts incrementing
from the stale information, instead of 0.

Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Ganji Aravind <ganji.aravind@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-05 15:48:42 -07:00
Jakub Kicinski
02146a93ba Merge branch 'hinic-BugFixes'
Luo bin says:

====================
hinic: BugFixes

The bugs fixed in this patchset have been present since the following
commits:
patch #1: Fixes: 00e57a6d4a ("net-next/hinic: Add Tx operation")
patch #2: Fixes: 5e126e7c4e ("hinic: add firmware update support")
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-05 15:24:55 -07:00
Luo bin
0c97ee5fbd hinic: bump up the timeout of UPDATE_FW cmd
Firmware erases the entire flash region which may take several
seconds before flashing, so we bump up the timeout to ensure this
cmd won't return failure.

Fixes: 5e126e7c4e ("hinic: add firmware update support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-05 15:24:39 -07:00
Luo bin
4e4269ebe7 hinic: bump up the timeout of SET_FUNC_STATE cmd
We free memory regardless of the return value of SET_FUNC_STATE
cmd in hinic_close function to avoid memory leak and this cmd may
timeout when fw is busy with handling other cmds, so we bump up the
timeout of this cmd to ensure it won't return failure.

Fixes: 00e57a6d4a ("net-next/hinic: Add Tx operation")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-05 15:24:39 -07:00
Cong Wang
cc8e58f832 act_ife: load meta modules before tcf_idr_check_alloc()
The following deadlock scenario is triggered by syzbot:

Thread A:				Thread B:
tcf_idr_check_alloc()
...
populate_metalist()
  rtnl_unlock()
					rtnl_lock()
					...
  request_module()			tcf_idr_check_alloc()
  rtnl_lock()

At this point, thread A is waiting for thread B to release RTNL
lock, while thread B is waiting for thread A to commit the IDR
change with tcf_idr_insert() later.

Break this deadlock situation by preloading ife modules earlier,
before tcf_idr_check_alloc(), this is fine because we only need
to load modules we need potentially.

Reported-and-tested-by: syzbot+80e32b5d1f9923f8ace6@syzkaller.appspotmail.com
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04 22:08:59 -07:00
Jing Xiangfeng
c2b947879c atm: eni: fix the missed pci_disable_device() for eni_init_one()
eni_init_one() misses to call pci_disable_device() in an error path.
Jump to err_disable to fix it.

Fixes: ede58ef28e ("atm: remove deprecated use of pci api")
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04 21:42:57 -07:00
Xie He
44a049c426 drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
PVC devices are virtual devices in this driver stacked on top of the
actual HDLC device. They are the devices normal users would use.
PVC devices have two types: normal PVC devices and Ethernet-emulating
PVC devices.

When transmitting data with PVC devices, the ndo_start_xmit function
will prepend a header of 4 or 10 bytes. Currently this driver requests
this headroom to be reserved for normal PVC devices by setting their
hard_header_len to 10. However, this does not work when these devices
are used with AF_PACKET/RAW sockets. Also, this driver does not request
this headroom for Ethernet-emulating PVC devices (but deals with this
problem by reallocating the skb when needed, which is not optimal).

This patch replaces hard_header_len with needed_headroom, and set
needed_headroom for Ethernet-emulating PVC devices, too. This makes
the driver to request headroom for all PVC devices in all cases.

Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04 21:37:04 -07:00
Linus Torvalds
c70672d8d3 Merge tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - Fix GENERIC_LOCKBREAK dependency on PREEMPTION in Kconfig broken
   because of a typo

 - Update defconfigs

* tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  s390: fix GENERIC_LOCKBREAK dependency typo in Kconfig
2020-09-04 13:46:33 -07:00
Linus Torvalds
09274aed90 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Fix the loading of modules built with binutils-2.35. This version
   produces writable and executable .text.ftrace_trampoline section
   which is rejected by the kernel.

 - Remove the exporting of cpu_logical_map() as the Tegra driver has now
   been fixed and no longer uses this function.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE
  arm64: Remove exporting cpu_logical_map symbol
2020-09-04 13:40:59 -07:00
Linus Torvalds
16bf121b2d Merge tag 'mips_fixes_5.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
 "A few MIPS fixes:

   - fallthrough fallout fix

   - BMIPS fixes

   - MSA fix to avoid leaking MSA register contents

   - Loongson perf and cpu feature fix

   - SNI interrupt fix"

* tag 'mips_fixes_5.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: SNI: Fix SCSI interrupt
  MIPS: add missing MSACSR and upper MSA initialization
  MIPS: perf: Fix wrong check condition of Loongson event IDs
  mips/oprofile: Fix fallthrough placement
  MIPS: Loongson64: Remove unnecessary inclusion of boot_param.h
  MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores
  MIPS: mm: BMIPS5000 has inclusive physical caches
  MIPS: Loongson64: Do not override watch and ejtag feature
2020-09-04 13:37:19 -07:00
Linus Torvalds
41bef91c8a Merge tag 'kbuild-fixes-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - fix documents

 - fix warning in 'make localmodconfig'

* tag 'kbuild-fixes-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: remove redundant assignment prompt = prompt
  kbuild: Documentation: clean up makefiles.rst
  kconfig: streamline_config.pl: check defined(ENV variable) before using it
  Documentation/llvm: Improve formatting of commands, variables, and arguments
2020-09-04 13:34:52 -07:00
Linus Torvalds
f162626a03 Merge tag 'pm-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix reference counting in the operating performance points (OPP)
  framework and address a few intel_pstate driver issues, mostly related
  to switching driver operation modes and similar with hardware-managed
  P-states (HWP) enabled.

  Specifics:

   - Fix reference counting of operating performance points (OPP) tables
     (Viresh Kumar).

   - Address intel_pstate driver interface issues, mostly related to
     switching operation modes and handling CPU offline and online and
     system-wide suspend/resume with hardware-managed P-states (HWP)
     enabled (Rafael Wysocki).

   - Fix the maximum frequency computation in the intel_pstate driver
     with turbo P-states disabled by the platform firmware and HWP
     enabled (Francisco Jerez)"

* tag 'pm-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled
  cpufreq: intel_pstate: Free memory only when turning off
  cpufreq: intel_pstate: Add ->offline and ->online callbacks
  cpufreq: intel_pstate: Tweak the EPP sysfs interface
  cpufreq: intel_pstate: Update cached EPP in the active mode
  cpufreq: intel_pstate: Refuse to turn off with HWP enabled
  opp: Don't drop reference for an OPP table that was never parsed
2020-09-04 13:27:24 -07:00
Linus Torvalds
d824e0809c Merge tag 'libata-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull libata fixes from Jens Axboe:

 - improve Sandisks ATA_HORKAGE on NCQ (Tejun)

 - link printk cleanup (Xu)

* tag 'libata-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
  libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
  ata: ahci: use ata_link_info() instead of ata_link_printk()
2020-09-04 13:19:19 -07:00
Linus Torvalds
8075fc3b11 Merge tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A bit larger than usual this week, mostly due to the NVMe fixes
  arriving late for -rc3 and hence didn't make last weeks pull request.

   - NVMe:
        - instance leak and io boundary fixes from Keith
        - fc locking fix from Christophe
        - various tcp/rdma reset during traffic fixes from Sagi
        - pci use-after-free fix from Tong
        - tcp target null deref fix from Ziye

   - Locking fix for partition removal (Christoph)

   - Ensure bdi->io_pages is always set (me)

   - Fixup for hd struct reference (Ming)

   - Fix for zero length bvecs (Ming)

   - Two small blk-iocost fixes (Tejun)"

* tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
  block: allow for_each_bvec to support zero len bvec
  blk-stat: make q->stats->lock irqsafe
  blk-iocost: ioc_pd_free() shouldn't assume irq disabled
  block: fix locking in bdev_del_partition
  block: release disk reference in hd_struct_free_work
  block: ensure bdi->io_pages is always initialized
  nvme-pci: cancel nvme device request before disabling
  nvme: only use power of two io boundaries
  nvme: fix controller instance leak
  nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
  nvme: Fix NULL dereference for pci nvme controllers
  nvme-rdma: fix reset hang if controller died in the middle of a reset
  nvme-rdma: fix timeout handler
  nvme-rdma: serialize controller teardown sequences
  nvme-tcp: fix reset hang if controller died in the middle of a reset
  nvme-tcp: fix timeout handler
  nvme-tcp: serialize controller teardown sequences
  nvme: have nvme_wait_freeze_timeout return if it timed out
  nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
  nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
2020-09-04 13:04:51 -07:00
Linus Torvalds
d849ca483d Merge tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - EAGAIN with O_NONBLOCK retry fix

 - Two small fixes for registered files (Jiufei)

* tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
  io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked file
  io_uring: set table->files[i] to NULL when io_sqe_file_register failed
  io_uring: fix removing the wrong file in __io_sqe_files_update()
2020-09-04 12:55:22 -07:00
Linus Torvalds
2fb547911c Merge tag 'thermal-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal fixes from Daniel Lezcano:

 - Fix bogus thermal shutdowns for omap4430 where bogus values resulting
   from an incorrect ADC conversion are too high and fire an emergency
   shutdown (Tony Lindgren)

 - Don't suppress negative temp for qcom spmi as they are valid and
   userspace needs them (Veera Vegivada)

 - Fix use-after-free in thermal_zone_device_unregister reported by
   Kasan (Dmitry Osipenko)

* tag 'thermal-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal: core: Fix use-after-free in thermal_zone_device_unregister()
  thermal: qcom-spmi-temp-alarm: Don't suppress negative temp
  thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
2020-09-04 12:49:03 -07:00
Linus Torvalds
e2dacf6cd1 Merge tag 'dmaengine-fix-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
 "A couple of core fixes and odd driver fixes for dmaengine subsystem:

  Core:
   - drop ACPI CSRT table reference after using it
   - fix of_dma_router_xlate() error handling

  Drivers fixes in idxd, at_hdmac, pl330, dw-edma and jz478"

* tag 'dmaengine-fix-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: ti: k3-udma: Update rchan_oes_offset for am654 SYSFW ABI 3.0
  drivers/dma/dma-jz4780: Fix race condition between probe and irq handler
  dmaengine: dw-edma: Fix scatter-gather address calculation
  dmaengine: ti: k3-udma: Fix the TR initialization for prep_slave_sg
  dmaengine: pl330: Fix burst length if burst size is smaller than bus width
  dmaengine: at_hdmac: add missing kfree() call in at_dma_xlate()
  dmaengine: at_hdmac: add missing put_device() call in at_dma_xlate()
  dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
  dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
  dmaengine: idxd: reset states after device disable or reset
  dmaengine: acpi: Put the CSRT table after using it
2020-09-04 12:12:39 -07:00
Linus Torvalds
86edf52e7c Merge tag 'sound-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small changes, nothing intrusive:

   - remaining tasklet API conversions, now all sound stuff have been
     converted

   - a few HD-audio and USB-audio quirks and minor fixes

   - FireWire Tascam and Digi00xx fixes

   - drop a kernel WARNING from PCM OSS for syzkaller"

* tag 'sound-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
  ALSA: hda/realtek - Improved routing for Thinkpad X1 7th/8th Gen
  ALSA: hda: use consistent HDAudio spelling in comments/docs
  ALSA: hda: add dev_dbg log when driver is not selected
  ALSA: hda: fix a runtime pm issue in SOF when integrated GPU is disabled
  ALSA: hda: hdmi - add Rocketlake support
  ALSA: ua101: convert tasklets to use new tasklet_setup() API
  ALSA: usb-audio: convert tasklets to use new tasklet_setup() API
  ASoC: txx9: convert tasklets to use new tasklet_setup() API
  ASoC: siu: convert tasklets to use new tasklet_setup() API
  ASoC: fsl_esai: convert tasklets to use new tasklet_setup() API
  ALSA: hdsp: convert tasklets to use new tasklet_setup() API
  ALSA: riptide: convert tasklets to use new tasklet_setup() API
  ALSA: pci/asihpi: convert tasklets to use new tasklet_setup() API
  ALSA: firewire: convert tasklets to use new tasklet_setup() API
  ALSA: core: convert tasklets to use new tasklet_setup() API
  ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check
  ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO
  ALSA: hda/hdmi: always check pin power status in i915 pin fixup
  ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion NT950XCJ-X716A
  ALSA: usb-audio: Add basic capture support for Pioneer DJ DJM-250MK2
  ...
2020-09-04 12:05:25 -07:00
Linus Torvalds
cf85f5de83 Merge tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Not much going on this week, nouveau has a display hw bug workaround,
  amdgpu has some PM fixes and CIK regression fixes, one single radeon
  PLL fix, and a couple of i915 display fixes.

  amdgpu:
   - Fix for 32bit systems
   - SW CTF fix
   - Update for Sienna Cichlid
   - CIK bug fixes

  radeon:
   - PLL fix

  i915:
   - Clang build warning fix
   - HDCP fixes

  nouveau:
   - display fixes"

* tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug
  drm/nouveau/kms/nv50-gp1xx: disable notifies again after core update
  drm/nouveau/kms/nv50-: add some whitespace before debug message
  drm/nouveau/kms/gv100-: Include correct push header in crcc37d.c
  drm/radeon: Prefer lower feedback dividers
  drm/amdgpu: Fix bug in reporting voltage for CIK
  drm/amdgpu: Specify get_argument function for ci_smu_funcs
  drm/amd/pm: enable MP0 DPM for sienna_cichlid
  drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting
  drm/amd/pm: fix is_dpm_running() run error on 32bit system
  drm/i915: Clear the repeater bit on HDCP disable
  drm/i915: Fix sha_text population code
  drm/i915/display: Ensure that ret is always initialized in icl_combo_phy_verify_state
2020-09-04 11:59:44 -07:00
Or Cohen
acf69c9462 net/packet: fix overflow in tpacket_rcv
Using tp_reserve to calculate netoff can overflow as
tp_reserve is unsigned int and netoff is unsigned short.

This may lead to macoff receving a smaller value then
sizeof(struct virtio_net_hdr), and if po->has_vnet_hdr
is set, an out-of-bounds write will occur when
calling virtio_net_hdr_from_skb.

The bug is fixed by converting netoff to unsigned int
and checking if it exceeds USHRT_MAX.

This addresses CVE-2020-14386

Fixes: 8913336a7e ("packet: add PACKET_RESERVE sockopt")
Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 11:56:02 -07:00
Linus Torvalds
b25d1dc947 Merge branch 'simplify-do_wp_page'
Merge emailed patches from Peter Xu:
 "This is a small series that I picked up from Linus's suggestion to
  simplify cow handling (and also make it more strict) by checking
  against page refcounts rather than mapcounts.

  This makes uffd-wp work again (verified by running upmapsort)"

Note: this is horrendously bad timing, and making this kind of
fundamental vm change after -rc3 is not at all how things should work.
The saving grace is that it really is a a nice simplification:

 8 files changed, 29 insertions(+), 120 deletions(-)

The reason for the bad timing is that it turns out that commit
17839856fd ("gup: document and work around 'COW can break either way'
issue" broke not just UFFD functionality (as Peter noticed), but Mikulas
Patocka also reports that it caused issues for strace when running in a
DAX environment with ext4 on a persistent memory setup.

And we can't just revert that commit without re-introducing the original
issue that is a potential security hole, so making COW stricter (and in
the process much simpler) is a step to then undoing the forced COW that
broke other uses.

Link: https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/

* emailed patches from Peter Xu <peterx@redhat.com>:
  mm: Add PGREUSE counter
  mm/gup: Remove enfornced COW mechanism
  mm/ksm: Remove reuse_ksm_page()
  mm: do_wp_page() simplification
2020-09-04 09:31:54 -07:00
Rafael J. Wysocki
f7ce2c3afc Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled
  cpufreq: intel_pstate: Free memory only when turning off
  cpufreq: intel_pstate: Add ->offline and ->online callbacks
  cpufreq: intel_pstate: Tweak the EPP sysfs interface
  cpufreq: intel_pstate: Update cached EPP in the active mode
  cpufreq: intel_pstate: Refuse to turn off with HWP enabled
2020-09-04 18:31:25 +02:00
Peter Xu
798a6b87ec mm: Add PGREUSE counter
This accounts for wp_page_reuse() case, where we reused a page for COW.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:25:20 -07:00
Peter Xu
a308c71bf1 mm/gup: Remove enfornced COW mechanism
With the more strict (but greatly simplified) page reuse logic in
do_wp_page(), we can safely go back to the world where cow is not
enforced with writes.

This essentially reverts commit 17839856fd ("gup: document and work
around 'COW can break either way' issue").  There are some context
differences due to some changes later on around it:

  2170ecfa76 ("drm/i915: convert get_user_pages() --> pin_user_pages()", 2020-06-03)
  376a34efa4 ("mm/gup: refactor and de-duplicate gup_fast() code", 2020-06-03)

Some lines moved back and forth with those, but this revert patch should
have striped out and covered all the enforced cow bits anyways.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:25:20 -07:00
Peter Xu
1a0cf26323 mm/ksm: Remove reuse_ksm_page()
Remove the function as the last reference has gone away with the do_wp_page()
changes.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:25:20 -07:00
Linus Torvalds
09854ba94c mm: do_wp_page() simplification
How about we just make sure we're the only possible valid user fo the
page before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same time.

[peterx: add subject prefix]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:25:20 -07:00
Leon Romanovsky
cfc905f158 gcov: Disable gcov build with GCC 10
GCOV built with GCC 10 doesn't initialize n_function variable.  This
produces different kernel panics as was seen by Colin in Ubuntu and me
in FC 32.

As a workaround, let's disable GCOV build for broken GCC 10 version.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891288
Link: https://lore.kernel.org/lkml/20200827133932.3338519-1-leon@kernel.org
Link: https://lore.kernel.org/lkml/CAHk-=whbijeSdSvx-Xcr0DPMj0BiwhJ+uiNnDSVZcr_h_kg7UA@mail.gmail.com/
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:19:49 -07:00
Barret Rhoden
7b81ce7cdc init: fix error check in clean_path()
init_stat() returns 0 on success, same as vfs_lstat().  When it replaced
vfs_lstat(), the '!' was dropped.

Fixes: 716308a533 ("init: add an init_stat helper")
Signed-off-by: Barret Rhoden <brho@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:16:58 -07:00
Dmitry Osipenko
a5f785ce60 thermal: core: Fix use-after-free in thermal_zone_device_unregister()
The user-after-free bug in thermal_zone_device_unregister() is reported by
KASAN. It happens because struct thermal_zone_device is released during of
device_unregister() invocation, and hence the "tz" variable shouldn't be
touched by thermal_notify_tz_delete(tz->id).

Fixes: 55cdf0a283 ("thermal: core: Add notifications call in the framework")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200817235854.26816-1-digetx@gmail.com
2020-09-04 11:52:54 +02:00
Veera Vegivada
0ffdab6f2d thermal: qcom-spmi-temp-alarm: Don't suppress negative temp
Currently driver is suppressing the negative temperature
readings from the vadc. Consumers of the thermal zones need
to read the negative temperature too. Don't suppress the
readings.

Fixes: c610afaa21 ("thermal: Add QPNP PMIC temperature alarm driver")
Signed-off-by: Veera Vegivada <vvegivad@codeaurora.org>
Signed-off-by: Guru Das Srinagesh <gurus@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/944856eb819081268fab783236a916257de120e4.1596040416.git.gurus@codeaurora.org
2020-09-04 11:52:54 +02:00
Tony Lindgren
30d24faba0 thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
We can sometimes get bogus thermal shutdowns on omap4430 at least with
droid4 running idle with a battery charger connected:

thermal thermal_zone0: critical temperature reached (143 C), shutting down

Dumping out the register values shows we can occasionally get a 0x7f value
that is outside the TRM listed values in the ADC conversion table. And then
we get a normal value when reading again after that. Reading the register
multiple times does not seem help avoiding the bogus values as they stay
until the next sample is ready.

Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
should have values from 13 to 107 listed with a total of 95 values. But
looking at the omap4430_adc_to_temp array, the values are off, and the
end values are missing. And it seems that the 4430 ADC table is similar
to omap3630 rather than omap4460.

Let's fix the issue by using values based on the omap3630 table and just
ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
the missing values added while the TRM table only shows every second
value.

Note that sometimes the ADC register values within the valid table can
also be way off for about 1 out of 10 values. But it seems that those
just show about 25 C too low values rather than too high values. So those
do not cause a bogus thermal shutdown.

Fixes: 1a31270e54 ("staging: omap-thermal: add OMAP4 data structures")
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200706183338.25622-1-tony@atomide.com
2020-09-04 11:52:46 +02:00
Linus Torvalds
59126901f2 Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools fixes from Arnaldo Carvalho de Melo:

 - Use uintptr_t when casting numbers to pointers

 - Keep output expected by 3rd parties: Turn off summary for interval
   mode by default.

 - BPF is in kernel space, make sure do_validate_kcore_modules() knows
   about that.

 - Explicitly call out event modifiers in the documentation.

 - Fix jevents() allocation of space for regular expressions.

 - Address libtraceevent build warnings on 32-bit arches.

 - Fix checking of functions returns using ERR_PTR() in 'perf bench'.

* tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Add bpf image check to __map__is_kmodule
  perf record/stat: Explicitly call out event modifiers in the documentation
  perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new())
  perf stat: Turn off summary for interval mode by default
  libtraceevent: Fix build warning on 32-bit arches
  perf jevents: Fix suspicious code in fixregex()
  perf parse-events: Use uintptr_t when casting numbers to pointers
2020-09-03 19:10:43 -07:00
Linus Torvalds
3e8d3bdc2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

 2) Fix loss of RTT samples in rxrpc, from David Howells.

 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.

 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

 5) We disable BH for too lokng in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

 8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

11) Memory leak in rxkad_verify_response, from Dinghao Liu.

12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.

15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
  net/smc: fix sock refcounting in case of termination
  net/smc: reset sndbuf_desc if freed
  net/smc: set rx_off for SMCR explicitly
  net/smc: fix toleration of fake add_link messages
  tg3: Fix soft lockup when tg3_reset_task() fails.
  doc: net: dsa: Fix typo in config code sample
  net: dp83867: Fix WoL SecureOn password
  nfp: flower: fix ABI mismatch between driver and firmware
  tipc: fix shutdown() of connectionless socket
  ipv6: Fix sysctl max for fib_multipath_hash_policy
  drivers/net/wan/hdlc: Change the default of hard_header_len to 0
  net: gemini: Fix another missing clk_disable_unprepare() in probe
  net: bcmgenet: fix mask check in bcmgenet_validate_flow()
  amd-xgbe: Add support for new port mode
  net: usb: dm9601: Add USB ID of Keenetic Plus DSL
  vhost: fix typo in error message
  net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
  pktgen: fix error message with wrong function name
  net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
  cxgb4: fix thermal zone device registration
  ...
2020-09-03 18:50:48 -07:00
Linus Torvalds
8381979dfa Merge branch 'gate-page-refcount' (patches from Dave Hansen)
Merge gate page refcount fix from Dave Hansen:
 "During the conversion over to pin_user_pages(), gate pages were missed.

  The fix is pretty simple, and is accompanied by a new test from Andy
  which probably would have caught this earlier"

* emailed patches from Dave Hansen <dave.hansen@linux.intel.com>:
  selftests/x86/test_vsyscall: Improve the process_vm_readv() test
  mm: fix pin vs. gup mismatch with gate pages
2020-09-03 18:43:06 -07:00
Andy Lutomirski
8891adc61d selftests/x86/test_vsyscall: Improve the process_vm_readv() test
The existing code accepted process_vm_readv() success or failure as long
as it didn't return garbage.  This is too weak: if the vsyscall page is
readable, then process_vm_readv() should succeed and, if the page is not
readable, then it should fail.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-03 18:36:55 -07:00
Dave Hansen
9fa2dd9467 mm: fix pin vs. gup mismatch with gate pages
Gate pages were missed when converting from get to pin_user_pages().
This can lead to refcount imbalances.  This is reliably and quickly
reproducible running the x86 selftests when vsyscall=emulate is enabled
(the default).  Fix by using try_grab_page() with appropriate flags
passed.

The long story:

Today, pin_user_pages() and get_user_pages() are similar interfaces for
manipulating page reference counts.  However, "pins" use a "bias" value
and manipulate the actual reference count by 1024 instead of 1 used by
plain "gets".

That means that pin_user_pages() must be matched with unpin_user_pages()
and can't be mixed with a plain put_user_pages() or put_page().

Enter gate pages, like the vsyscall page.  They are pages usually in the
kernel image, but which are mapped to userspace.  Userspace is allowed
access to them, including interfaces using get/pin_user_pages().  The
refcount of these kernel pages is manipulated just like a normal user
page on the get/pin side so that the put/unpin side can work the same
for normal user pages or gate pages.

get_gate_page() uses try_get_page() which only bumps the refcount by
1, not 1024, even if called in the pin_user_pages() path.  If someone
pins a gate page, this happens:

	pin_user_pages()
		get_gate_page()
			try_get_page() // bump refcount +1
	... some time later
	unpin_user_pages()
		page_ref_sub_and_test(page, 1024))

... and boom, we get a refcount off by 1023.  This is reliably and
quickly reproducible running the x86 selftests when booted with
vsyscall=emulate (the default).  The selftests use ptrace(), but I
suspect anything using pin_user_pages() on gate pages could hit this.

To fix it, simply use try_grab_page() instead of try_get_page(), and
pass 'gup_flags' in so that FOLL_PIN can be respected.

This bug traces back to the very beginning of the FOLL_PIN support in
commit 3faa52c03f ("mm/gup: track FOLL_PIN pages"), which showed up in
the 5.7 release.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 3faa52c03f ("mm/gup: track FOLL_PIN pages")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-03 18:36:55 -07:00
Dave Airlie
d37d569200 Merge branch 'linux-5.9' of git://github.com/skeggsb/linux into drm-fixes
A couple of minor fixes to the display changes that went in for 5.9.
The most important of which is a workaround for a HW bug that was
exposed by better push buffer space management, leading to
random(ish...) display engine hangs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv5QDxyMihrxbPk+-sORnaYtjR6_dbM68gEhb2wxht_G1w@mail.gmail.com
2020-09-04 11:14:49 +10:00
Dave Airlie
0f8aeef1a5 Merge tag 'drm-intel-fixes-2020-09-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.9-rc4:
- Clang build warning fix
- HDCP fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87sgbz2pnx.fsf@intel.com
2020-09-04 11:00:48 +10:00
Dave Airlie
b596649fd1 Merge tag 'amd-drm-fixes-5.9-2020-09-03' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-09-03:

amdgpu:
- Fix for 32bit systems
- SW CTF fix
- Update for Sienna Cichlid
- CIK bug fixes

radeon:
- PLL fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903050022.3960-1-alexander.deucher@amd.com
2020-09-04 10:51:28 +10:00