Commit Graph

128060 Commits

Author SHA1 Message Date
Linus Torvalds
002322402d Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
  selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
  gcov, and mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: update Andrey Konovalov's email address
  mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm: memblock: fix section mismatch warning again
  kfence: make compatible with kmemleak
  gcov: fix clang-11+ support
  ia64: fix format strings for err_inject
  ia64: mca: allocate early mca with GFP_ATOMIC
  squashfs: fix xattr id and id lookup sanity checks
  squashfs: fix inode lookup sanity checks
  z3fold: prevent reclaim/free race for headless pages
  selftests/vm: fix out-of-tree build
  mm/mmu_notifiers: ensure range_end() is paired with range_start()
  kasan: fix per-page tags for non-page_alloc pages
  hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25 11:43:43 -07:00
Mike Rapoport
a024b7c285 mm: memblock: fix section mismatch warning again
Commit 34dc2efb39 ("memblock: fix section mismatch warning") marked
memblock_bottom_up() and memblock_set_bottom_up() as __init, but they
could be referenced from non-init functions like
memblock_find_in_range_node() on architectures that enable
CONFIG_ARCH_KEEP_MEMBLOCK.

For such builds kernel test robot reports:

   WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up()
   The function memblock_find_in_range_node() references the function __init memblock_bottom_up().
   This is often because memblock_find_in_range_node lacks a __init  annotation or the annotation of memblock_bottom_up is wrong.

Replace __init annotations with __init_memblock annotations so that the
appropriate section will be selected depending on
CONFIG_ARCH_KEEP_MEMBLOCK.

Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com
Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org
Fixes: 34dc2efb39 ("memblock: fix section mismatch warning")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Sean Christopherson
c2655835fd mm/mmu_notifiers: ensure range_end() is paired with range_start()
If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers.  If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to
be paired, e.g.  KVM's mmu_notifier_count would become imbalanced.
Disallow notifiers that can fail _start() from implementing _end() so
that it's unnecessary to either track which notifiers rejected _start(),
or had already succeeded prior to a failed _start().

Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintented "feature".
Make it canon now that the behavior is depended on for correctness.

As of today, the bug is likely benign:

  1. The only caller of the non-blocking notifier is OOM kill.
  2. The only notifiers that can fail _start() are the i915 and Nouveau
     drivers.
  3. The only notifiers that utilize _end() are the SGI UV GRU driver
     and KVM.
  4. The GRU driver will never coincide with the i195/Nouveau drivers.
  5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
     _guest_, and the guest is already doomed due to being an OOM victim.

Fix the bug now to play nice with future usage, e.g.  KVM has a
potential use case for blocking memslot updates in KVM while an
invalidation is in-progress, and failure to unblock would result in said
updates being blocked indefinitely and hanging.

Found by inspection.  Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.

Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Andrey Konovalov
cf10bd4c4a kasan: fix per-page tags for non-page_alloc pages
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.

Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff...  address for pages that
were not allocated via page_alloc.

This might cause problems.  A particular case we encountered is a
conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags).  This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.

This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff.  This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.

With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.

This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.

Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c029 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Miaohe Lin
d85aecf284 hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
The current implementation of hugetlb_cgroup for shared mappings could
have different behavior.  Consider the following two scenarios:

 1.Assume initial css reference count of hugetlb_cgroup is 1:
  1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
      count is 2 associated with 1 file_region.
  1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
      count is 3 associated with 2 file_region.
  1.3 coalesce_file_region will coalesce these two file_regions into
      one. So css reference count is 3 associated with 1 file_region
      now.

 2.Assume initial css reference count of hugetlb_cgroup is 1 again:
  2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
      count is 2 associated with 1 file_region.

Therefore, we might have one file_region while holding one or more css
reference counts. This inconsistency could lead to imbalanced css_get()
and css_put() pair. If we do css_put one by one (i.g. hole punch case),
scenario 2 would put one more css reference. If we do css_put all
together (i.g. truncate case), scenario 1 will leak one css reference.

The imbalanced css_get() and css_put() pair would result in a non-zero
reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup
directory is removed __but__ associated resource is not freed. This
might result in OOM or can not create a new hugetlb cgroup in a busy
workload ultimately.

In order to fix this, we have to make sure that one file_region must
hold exactly one css reference. So in coalesce_file_region case, we
should release one css reference before coalescence. Also only put css
reference when the entire file_region is removed.

The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region.
But this area might be scattered when there are already some
file_regions reside in it. As a result, many file_regions may share only
one h_cg->css reference. In order to ensure that one file_region must
hold exactly one css reference, we should do css_get() for each
file_region and release the reference held by caller when they are done.

[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
  Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com

Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Linus Torvalds
e138138003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Various fixes, all over:

   1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.

   2) Always store the rx queue mapping in veth, from Maciej
      Fijalkowski.

   3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.

   4) Fix memory leak in octeontx2-af from Colin Ian King.

   5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
      Yonghong Song.

   6) Fix tx ptp stats in mlx5, from Aya Levin.

   7) Check correct ip version in tun decap, fropm Roi Dayan.

   8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.

   9) Work item memork leak in mlx5, from Shay Drory.

  10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.

  11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.

  12) Fix data race in pxa168_eth, from Pavel Andrianov.

  13) Range validate stab in red_check_params(), from Eric Dumazet.

  14) Inherit vlan filtering setting properly in b53 driver, from
      Florian Fainelli.

  15) Fix rtnl locking in igc driver, from Sasha Neftin.

  16) Pause handling fixes in igc driver, from Muhammad Husaini
      Zulkifli.

  17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.

  18) Use after free in qlcnic, from Lv Yunlong.

  19) fix crash in fritzpci mISDN, from Tong Zhang.

  20) Premature rx buffer reuse in igb, from Li RongQing.

  21) Missing termination of ip[a driver message handler arrays, from
      Alex Elder.

  22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
      driver, from Xie He.

  23) Use after free in c_can_pci_remove(), from Tong Zhang.

  24) Uninitialized variable use in nl80211, from Jarod Wilson.

  25) Off by one size calc in bpf verifier, from Piotr Krysiuk.

  26) Use delayed work instead of deferrable for flowtable GC, from
      Yinjun Zhang.

  27) Fix infinite loop in NPC unmap of octeontx2 driver, from
      Hariprasad Kelam.

  28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
      of fifo sizes, from Corentin Labbe.

  29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.

  30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.

  31) Fix psample UAPI breakage, from Ido Schimmel"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
  psample: Fix user API breakage
  math: Export mul_u64_u64_div_u64
  ch_ktls: fix enum-conversion warning
  octeontx2-af: Fix memory leak of object buf
  ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
  net: bridge: don't notify switchdev for local FDB addresses
  net/sched: act_ct: clear post_ct if doing ct_clear
  net: dsa: don't assign an error value to tag_ops
  isdn: capi: fix mismatched prototypes
  net/mlx5: SF, do not use ecpu bit for vhca state processing
  net/mlx5e: Fix division by 0 in mlx5e_select_queue
  net/mlx5e: Fix error path for ethtool set-priv-flag
  net/mlx5e: Offload tuple rewrite for non-CT flows
  net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
  net/mlx5: Add back multicast stats for uplink representor
  net: ipconfig: ic_dev can be NULL in ic_close_devs
  MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
  docs: networking: Fix a typo
  r8169: fix DMA being used after buffer free if WoL is enabled
  net: ipa: fix init header command validation
  ...
2021-03-24 18:16:04 -07:00
Ido Schimmel
e43accba9b psample: Fix user API breakage
Cited commit added a new attribute before the existing group reference
count attribute, thereby changing its value and breaking existing
applications on new kernels.

Before:

 # psample -l
 libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported

After:

 # psample -l
 Group Num       Refcount        Group Seq
 1               1               0

Fix by restoring the value of the old attribute and remove the
misleading comments from the enumerator to avoid future bugs.

Cc: stable@vger.kernel.org
Fixes: d8bed686ab ("net: psample: Add tunnel support")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Adiel Bidani <adielb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:44:31 -07:00
Matthew Wilcox (Oracle)
e5dbd33218 mm/writeback: Add wait_on_page_writeback_killable
This is the killable version of wait_on_page_writeback.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-3-willy@infradead.org
2021-03-23 20:54:29 +00:00
Matthew Wilcox (Oracle)
39f985c8f6 fs/cachefiles: Remove wait_bit_key layout dependency
Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile.  Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility

A comment on the need to maintain structure layout equivalence was added by
Linus[1] and that is no longer applicable.

Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-cachefs@redhat.com
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]
2021-03-23 20:54:29 +00:00
Linus Torvalds
3001c3554f Merge tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and Thunderbolt driver fixes from Greg KH:
 "Here are some small Thunderbolt and USB driver fixes for some reported
  issues:

   - thunderbolt fixes for minor problems

   - typec fixes for power issues

   - usb-storage quirk addition

   - usbip bugfix

   - dwc3 bugfix when stopping transfers

   - cdnsp bugfix for isoc transfers

   - gadget use-after-free fix

  All have been in linux-next this week with no reported issues"

* tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy
  usb: dwc3: gadget: Prevent EP queuing while stopping transfers
  usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
  usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
  usb-storage: Add quirk to defeat Kindle's automatic unload
  usb: gadget: configfs: Fix KASAN use-after-free
  usbip: Fix incorrect double assignment to udc->ud.tcp_rx
  usb: cdnsp: Fixes incorrect value in ISOC TRB
  thunderbolt: Increase runtime PM reference count on DP tunnel discovery
  thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
2021-03-21 11:49:16 -07:00
Linus Torvalds
5ba33b488a Merge tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:

 - Get static calls & modules right. Hopefully.

 - WW mutex fixes

* tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call: Fix static_call_update() sanity check
  static_call: Align static_call_is_init() patching condition
  static_call: Fix static_call_set_init()
  locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
  locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
2021-03-21 11:19:29 -07:00
Linus Torvalds
92ed88cb4d Merge tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:

 - another missing RT_PROP table related fix, to ensure that the
   efivarfs pseudo filesystem fails gracefully if variable services
   are unsupported

 - use the correct alignment for literal EFI GUIDs

 - fix a use after unmap issue in the memreserve code

* tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: use 32-bit alignment for efi_guid_t literals
  firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
  efivars: respect EFI_UNSUPPORTED return from firmware
2021-03-21 11:11:22 -07:00
Linus Torvalds
5e3ddf96e7 Merge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 "The freshest pile of shiny x86 fixes for 5.12:

   - Add the arch-specific mapping between physical and logical CPUs to
     fix devicetree-node lookups

   - Restore the IRQ2 ignore logic

   - Fix get_nr_restart_syscall() to return the correct restart syscall
     number. Split in a 4-patches set to avoid kABI breakage when
     backporting to dead kernels"

* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/of: Fix CPU devicetree-node lookups
  x86/ioapic: Ignore IRQ2 again
  x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
  x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
  x86: Move TS_COMPAT back to asm/thread_info.h
  kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21 11:04:20 -07:00
David S. Miller
e56c53d194 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-03-20

The following pull-request contains BPF updates for your *net* tree.

We've added 5 non-merge commits during the last 3 day(s) which contain
a total of 8 files changed, 155 insertions(+), 12 deletions(-).

The main changes are:

1) Use correct nops in fexit trampoline, from Stanislav.

2) Fix BTF dump, from Jean-Philippe.

3) Fix umd memory leak, from Zqiang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-20 15:18:06 -07:00
Linus Torvalds
0ada2dad8b Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Quieter week this time, which was both expected and desired. About
  half of the below is fixes for this release, the other half are just
  fixes in general. In detail:

   - Fix the freezing of IO threads, by making the freezer not send them
     fake signals. Make them freezable by default.

   - Like we did for personalities, move the buffer IDR to xarray. Kills
     some code and avoids a use-after-free on teardown.

   - SQPOLL cleanups and fixes (Pavel)

   - Fix linked timeout race (Pavel)

   - Fix potential completion post use-after-free (Pavel)

   - Cleanup and move internal structures outside of general kernel view
     (Stefan)

   - Use MSG_SIGNAL for send/recv from io_uring (Stefan)"

* tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
  io_uring: don't leak creds on SQO attach error
  io_uring: use typesafe pointers in io_uring_task
  io_uring: remove structures from include/linux/io_uring.h
  io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
  io_uring: fix sqpoll cancellation via task_work
  io_uring: add generic callback_head helpers
  io_uring: fix concurrent parking
  io_uring: halt SQO submission on ctx exit
  io_uring: replace sqd rw_semaphore with mutex
  io_uring: fix complete_post use ctx after free
  io_uring: fix ->flags races by linked timeouts
  io_uring: convert io_buffer_idr to XArray
  io_uring: allow IO worker threads to be frozen
  kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
2021-03-19 17:01:09 -07:00
Zqiang
f60a85cad6 bpf: Fix umd memory leak in copy_process()
The syzbot reported a memleak as follows:

BUG: memory leak
unreferenced object 0xffff888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
    [<ffffffff8125dc56>] alloc_pid+0x66/0x560
    [<ffffffff81226405>] copy_process+0x1465/0x25e0
    [<ffffffff81227943>] kernel_clone+0xf3/0x670
    [<ffffffff812281a1>] kernel_thread+0x61/0x80
    [<ffffffff81253464>] call_usermodehelper_exec_work
    [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120
    [<ffffffff812591c9>] process_one_work+0x2c9/0x600
    [<ffffffff81259ab9>] worker_thread+0x59/0x5d0
    [<ffffffff812611c8>] kthread+0x178/0x1b0
    [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

unreferenced object 0xffff888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
    [<ffffffff8154a0cf>] kmem_cache_zalloc
    [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0
    [<ffffffff8154a809>] alloc_empty_file+0x69/0x120
    [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0
    [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140
    [<ffffffff81559218>] create_pipe_files+0x138/0x2e0
    [<ffffffff8126c793>] umd_setup+0x33/0x220
    [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0
    [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh and
tgid need to be released.

Fixes: d71fa5c976 ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
2021-03-19 22:23:19 +01:00
Bhaskar Chowdhury
8a2dc6af67 sch_red: Fix a typo
s/recalcultion/recalculation/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19 11:51:03 -07:00
Linus Torvalds
278924cb99 Merge tag 'trace-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull workqueue tracing fix from Steven Rostedt:
 "Fix workqueue trace event unsafe string reference

  After adding a verifier to test all strings printed in trace events to
  make sure they either point to a string on the ring buffer, or to read
  only core kernel memory, it triggered on a workqueue trace event. The
  trace event workqueue_queue_work references the allocated name of the
  workqueue in the output. If the workqueue is freed before the trace is
  read, then the trace will dereference freed memory.

  Update the trace event to use the __string(), __assign_str(), and
  __get_str() helpers to handle such cases"

* tag 'trace-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  workqueue/tracing: Copy workqueue name to buffer in trace event
2021-03-19 10:06:30 -07:00
Ingo Molnar
429257a430 Merge tag 'efi-urgent-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent
Pull EFI fixes from Ard Biesheuvel:

 "- another missing RT_PROP table related fix, to ensure that the efivarfs
    pseudo filesystem fails gracefully if variable services are unsupported
  - use the correct alignment for literal EFI GUIDs
  - fix a use after unmap issue in the memreserve code"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-19 14:23:46 +01:00
Ard Biesheuvel
fb98cc0b3a efi: use 32-bit alignment for efi_guid_t literals
Commit 494c704f9a ("efi: Use 32-bit alignment for efi_guid_t") updated
the type definition of efi_guid_t to ensure that it always appears
sufficiently aligned (the UEFI spec is ambiguous about this, but given
the fact that its EFI_GUID type is defined in terms of a struct carrying
a uint32_t, the natural alignment is definitely >= 32 bits).

However, we missed the EFI_GUID() macro which is used to instantiate
efi_guid_t literals: that macro is still based on the guid_t type,
which does not have a minimum alignment at all. This results in warnings
such as

  In file included from drivers/firmware/efi/mokvar-table.c:35:
  include/linux/efi.h:1093:34: warning: passing 1-byte aligned argument to
      4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
      access [-Walign-mismatch]
          status = get_var(L"SecureBoot", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size,
                                          ^
  include/linux/efi.h:1101:24: warning: passing 1-byte aligned argument to
      4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
      access [-Walign-mismatch]
          get_var(L"SetupMode", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size, &setupmode);

The distinction only matters on CPUs that do not support misaligned loads
fully, but 32-bit ARM's load-multiple instructions fall into that category,
and these are likely to be emitted by the compiler that built the firmware
for loading word-aligned 128-bit GUIDs from memory

So re-implement the initializer in terms of our own efi_guid_t type, so that
the alignment becomes a property of the literal's type.

Fixes: 494c704f9a ("efi: Use 32-bit alignment for efi_guid_t")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1327
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-03-19 07:44:28 +01:00
David S. Miller
84f4aced67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Several patches to testore use of memory barriers instead of RCU to
   ensure consistent access to ruleset, from Mark Tomlinson.

2) Fix dump of expectation via ctnetlink, from Florian Westphal.

3) GRE helper works for IPv6, from Ludovic Senecaux.

4) Set error on unsupported flowtable flags.

5) Use delayed instead of deferrable workqueue in the flowtable,
   from Yinjun Zhang.

6) Fix spurious EEXIST in case of add-after-delete flowtable in
   the same batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:19:06 -07:00
Linus Torvalds
8b12a62a4e Merge tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular fixes pull, pretty small set of fixes, a couple of i915 and
  amdgpu, one ttm, one nouveau and one omap. Probably smaller than usual
  for this time, so we'll see if something pops up next week or if this
  will continue to stay small.

  Summary:

  ttm:
   - Make ttm_bo_unpin() not wraparound on too many unpins

  omap:
   - Fix coccicheck warning in omap

  amdgpu:
   - DCN 3.0 gamma fixes
   - DCN 2.1 corrupt screen fix

  i915:
   - Workaround async flip + VT-d frame corruption on HSW/BDW
   - Fix NMI watchdog crash due to uninitialized OA buffer use on gen12+

  nouveau:
   - workaround oops with bo syncing"

* tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm/drm:
  nouveau: Skip unvailable ttm page entries
  drm/amd/display: Remove MPC gamut remap logic for DCN30
  drm/amd/display: Correct algorithm for reversed gamma
  drm/omap: dsi: fix unsigned expression compared with zero
  i915/perf: Start hrtimer only if sampling the OA buffer
  drm/i915: Workaround async flip + VT-d corruption on HSW/BDW
  drm/amd/display: Copy over soc values before bounding box creation
  drm/ttm: make ttm_bo_unpin more defensive
2021-03-18 17:57:34 -07:00
Dave Airlie
db3f0d8ee9 Merge tag 'drm-misc-fixes-2021-03-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.12-rc4:
- Make ttm_bo_unpin() not wraparound on too many unpins.
- Fix coccicheck warning in omap.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a0e13bbb-6ba6-ff24-4db8-0e02e605de18@linux.intel.com
2021-03-19 10:21:48 +10:00
Linus Torvalds
dc0337999d Merge tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:

 - Fix 32-bit issue with new unmap-all flag (Steve Sistare)

 - Various Kconfig changes for better coverage (Jason Gunthorpe)

 - Fix to batch pinning support (Daniel Jordan)

* tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
  vfio: Depend on MMU
  ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
  vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM
  vfio: IOMMU_API should be selected
  vfio/type1: fix unmap all on ILP32
2021-03-18 12:37:05 -07:00
Linus Torvalds
bf152b0b41 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Some fixes and cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails
  vhost-vdpa: fix use-after-free of v->config_ctx
  vhost: Fix vhost_vq_reset()
  vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation
  vdpa_sim: Skip typecasting from void*
  virtio: remove export for virtio_config_{enable, disable}
  virtio-mmio: Use to_virtio_mmio_device() to simply code
  vdpa: set the virtqueue num during register
2021-03-18 11:20:35 -07:00
Steven Rostedt (VMware)
83b62687a0 workqueue/tracing: Copy workqueue name to buffer in trace event
The trace event "workqueue_queue_work" references an unsafe string in
dereferencing the name of the workqueue. As the name is allocated, it
could later be freed, and the pointer to that string could stay on the
tracing buffer. If the trace buffer is read after the string is freed, it
will reference an unsafe pointer.

I added a new verifier to make sure that all strings referenced in the
output of the trace buffer is safe to read and this triggered on the
workqueue_queue_work trace event:

workqueue_queue_work: work struct=00000000b2b235c7 function=gc_worker workqueue=(0xffff888100051160:events_power_efficient)[UNSAFE-MEMORY] req_cpu=256 cpu=1
workqueue_queue_work: work struct=00000000c344caec function=flush_to_ldisc workqueue=(0xffff888100054d60:events_unbound)[UNSAFE-MEMORY] req_cpu=256 cpu=4294967295
workqueue_queue_work: work struct=00000000b2b235c7 function=gc_worker workqueue=(0xffff888100051160:events_power_efficient)[UNSAFE-MEMORY] req_cpu=256 cpu=1
workqueue_queue_work: work struct=000000000b238b3f function=vmstat_update workqueue=(0xffff8881000c3760:mm_percpu_wq)[UNSAFE-MEMORY] req_cpu=1 cpu=1

Also, if this event is read via a user space application like perf or
trace-cmd, the name would only be an address and useless information:

workqueue_queue_work: work struct=0xffff953f80b4b918 function=disk_events_workfn workqueue=ffff953f8005d378 req_cpu=8192 cpu=5

Cc: Zqiang <qiang.zhang@windriver.com>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 7bf9c4a88e ("workqueue: tracing the name of the workqueue instead of it's address")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18 12:57:37 -04:00
Stefan Metzmacher
53e043b2b4 io_uring: remove structures from include/linux/io_uring.h
Link: https://lore.kernel.org/r/8c1d14f3748105f4caeda01716d47af2fa41d11c.1615809009.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-18 09:44:35 -06:00
David S. Miller
e65eaded4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-03-18

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 336 insertions(+), 94 deletions(-).

The main changes are:

1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace
   splat in modify_ftrace_direct() on address change, from Alexei Starovoitov.

2) Fix two oob speculation possibilities that allows unprivileged to leak mem
   via side-channel, from Piotr Krysiuk and Daniel Borkmann.

3) Fix libbpf's netlink handling wrt SOCK_CLOEXEC, from Kumar Kartikeya Dwivedi.

4) Fix libbpf's error handling on failure in getting section names, from Namhyung Kim.

5) Fix tunnel collect_md BPF selftest wrt Geneve option handling, from Hangbin Liu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 18:36:34 -07:00
Pablo Neira Ayuso
7b35582cd0 netfilter: nftables: allow to update flowtable flags
Honor flowtable flags from the control update path. Disallow disabling
to toggle hardware offload support though.

Fixes: 8bb69f3b29 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:21 +01:00
Alexei Starovoitov
e21aa34178 bpf: Fix fexit trampoline.
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.

Fixes: fec56f5890 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-18 00:22:51 +01:00
Wei Wang
cb03835793 net: fix race between napi kthread mode and busy poll
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.

Fixes: 29863d41bb ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 14:31:17 -07:00
Alan Stern
546aa0e4ea usb-storage: Add quirk to defeat Kindle's automatic unload
Matthias reports that the Amazon Kindle automatically removes its
emulated media if it doesn't receive another SCSI command within about
one second after a SYNCHRONIZE CACHE.  It does so even when the host
has sent a PREVENT MEDIUM REMOVAL command.  The reason for this
behavior isn't clear, although it's not hard to make some guesses.

At any rate, the results can be unexpected for anyone who tries to
access the Kindle in an unusual fashion, and in theory they can lead
to data loss (for example, if one file is closed and synchronized
while other files are still in the middle of being written).

To avoid such problems, this patch creates a new usb-storage quirks
flag telling the driver always to issue a REQUEST SENSE following a
SYNCHRONIZE CACHE command, and adds an unusual_devs entry for the
Kindle with the flag set.  This is sufficient to prevent the Kindle
from doing its automatic unload, without interfering with proper
operation.

Another possible way to deal with this would be to increase the
frequency of TEST UNIT READY polling that the kernel normally carries
out for removable-media storage devices.  However that would increase
the overall load on the system and it is not as reliable, because the
user can override the polling interval.  Changing the driver's
behavior is safer and has minimal overhead.

CC: <stable@vger.kernel.org>
Reported-and-tested-by: Matthias Schwarzott <zzam@gentoo.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20210317190654.GA497856@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17 21:30:15 +01:00
Leon Romanovsky
6417f03132 module: remove never implemented MODULE_SUPPORTED_DEVICE
MODULE_SUPPORTED_DEVICE was added in pre-git era and never was
implemented. We can safely remove it, because the kernel has grown
to have many more reliable mechanisms to determine if device is
supported or not.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-17 13:16:18 -07:00
Waiman Long
bee645788e locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
In ww_acquire_init(), mutex_acquire() is gated by CONFIG_DEBUG_LOCK_ALLOC.
The dep_map in the ww_acquire_ctx structure is also gated by the
same config. However mutex_release() in ww_acquire_fini() is gated by
CONFIG_DEBUG_MUTEXES. It is possible to set CONFIG_DEBUG_MUTEXES without
setting CONFIG_DEBUG_LOCK_ALLOC though it is an unlikely configuration.
That may cause a compilation error as dep_map isn't defined in this
case. Fix this potential problem by enclosing mutex_release() inside
CONFIG_DEBUG_LOCK_ALLOC.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210316153119.13802-3-longman@redhat.com
2021-03-17 09:56:45 +01:00
wenxu
d29334c15d net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.

Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:22:18 -07:00
Oleg Nesterov
b2e9df850c x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
Save the current_thread_info()->status of X86 in the new
restart_block->arch_data field so TS_COMPAT_RESTART can be removed again.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210201174716.GA17898@redhat.com
2021-03-16 22:13:11 +01:00
Oleg Nesterov
5abbe51a52 kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.

Add a new helper which sets restart_block->fn and calls a dummy
arch_set_restart_data() helper.

Fixes: 609c19a385 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com
2021-03-16 22:13:10 +01:00
Linus Torvalds
1df27313f5 Merge tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "Fix a deadlock and a couple of other bugs"

* tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: 32-bit user space ioctl compat for fuse device
  virtiofs: Fail dax mount if device does not support it
  fuse: fix live lock in fuse_iget()
2021-03-16 10:29:45 -07:00
Linus Torvalds
4108e10197 Merge tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "Miscellaneous NFSD fixes for v5.12-rc"

* tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  svcrdma: Revert "svcrdma: Reduce Receive doorbell rate"
  NFSD: fix error handling in NFSv4.0 callbacks
  NFSD: fix dest to src mount in inter-server COPY
  Revert "nfsd4: a client's own opens needn't prevent delegations"
  Revert "nfsd4: remove check_conflicting_opens warning"
  rpc: fix NULL dereference on kmalloc failure
  sunrpc: fix refcount leak for rpc auth modules
  NFSD: Repair misuse of sv_lock in 5.10.16-rt30.
  nfsd: don't abort copies early
  fs: nfsd: fix kconfig dependency warning for NFSD_V4
  svcrdma: disable timeouts on rdma backchannel
  nfsd: Don't keep looking up unhashed files in the nfsd file cache
2021-03-16 10:22:50 -07:00
Jason Gunthorpe
3b49dfb08c ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
CONFIG_VFIO_AMBA has a light use of AMBA, adding some inline fallbacks
when AMBA is disabled will allow it to be compiled under COMPILE_TEST and
make VFIO easier to maintain.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <3-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:28 -06:00
Alessio Balsini
f8425c9396 fuse: 32-bit user space ioctl compat for fuse device
With a 64-bit kernel build the FUSE device cannot handle ioctl requests
coming from 32-bit user space.  This is due to the ioctl command
translation that generates different command identifiers that thus cannot
be used for direct comparisons without proper manipulation.

Explicitly extract type and number from the ioctl command to enable 32-bit
user space compatibility on 64-bit kernel builds.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-03-16 15:20:16 +01:00
Martin Willi
3a5ca85707 can: dev: Move device back to init netns on owning netns delete
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5fc8d ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-16 08:40:04 +01:00
Alexander Ovechkin
7233da8669 tcp: relookup sock for RST+ACK packets handled by obsolete req sock
Currently tcp_check_req can be called with obsolete req socket for which big
socket have been already created (because of CPU race or early demux
assigning req socket to multiple packets in gro batch).

Commit e0f9759f53 ("tcp: try to keep packet if SYN_RCV race
is lost") added retry in case when tcp_check_req is called for PSH|ACK packet.
But if client sends RST+ACK immediatly after connection being
established (it is performing healthcheck, for example) retry does not
occur. In that case tcp_check_req tries to close req socket,
leaving big socket active.

Fixes: e0f9759f53 ("tcp: try to keep packet if SYN_RCV race is lost")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Reported-by: Oleg Senin <olegsenin@yandex-team.ru>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 14:34:29 -07:00
Mark Tomlinson
175e476b8c netfilter: x_tables: Use correct memory barriers.
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.

Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in cc00bcaa58 (which is still present), while still
maintaining the same speed of replacing tables.

The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.

Fixes: 7f5c6d4f66 ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15 18:42:01 +01:00
Mark Tomlinson
d3d40f2374 Revert "netfilter: x_tables: Switch synchronization to RCU"
This reverts commit cc00bcaa58.

This (and the preceding) patch basically re-implemented the RCU
mechanisms of patch 784544739a. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.

Prior to using RCU a script calling "iptables" approx. 200 times was
taking 1.16s. With RCU this increased to 11.59s.

Revert these patches and fix the issue in a different way.

Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15 18:42:01 +01:00
Christian König
6c5403173a drm/ttm: make ttm_bo_unpin more defensive
We seem to have some more driver bugs than thought.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: deb0814b43 ("drm/ttm: add ttm_bo_pin()/ttm_bo_unpin() v2")
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210312093810.2202-1-christian.koenig@amd.com
2021-03-15 16:02:30 +01:00
Linus Torvalds
70404fe303 Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of irqchip updates:

   - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct

   - Add a missing DT compatible string for the Ingenic driver

   - Remove the pointless debugfs_file pointer from struct irqdomain"

* tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ingenic: Add support for the JZ4760
  dt-bindings/irq: Add compatible string for the JZ4760B
  irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
  ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
  irqdomain: Remove debugfs_file from struct irq_domain
2021-03-14 13:33:33 -07:00
Linus Torvalds
fa509ff879 Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - A fix for the static_call mechanism so it handles unaligned
     addresses correctly.

   - Make u64_stats_init() a macro so every instance gets a seperate
     lockdep key.

   - Make seqcount_latch_init() a macro as well to preserve the static
     variable which is used for the lockdep key"

* tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  seqlock,lockdep: Fix seqcount_latch_init()
  u64_stats,lockdep: Fix u64_stats_init() vs lockdep
  static_call: Fix the module key fixup
2021-03-14 13:03:21 -07:00
Linus Torvalds
75013c6c52 Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Make sure PMU internal buffers are flushed for per-CPU events too and
   properly handle PID/TID for large PEBS.

 - Handle the case properly when there's no PMU and therefore return an
   empty list of perf MSRs for VMX to switch instead of reading random
   garbage from the stack.

* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
  perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
  perf/core: Flush PMU internal buffers for per-CPU events
2021-03-14 12:57:17 -07:00
Linus Torvalds
9d0c8e793f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
2021-03-14 12:35:02 -07:00