Commit Graph

135904 Commits

Author SHA1 Message Date
Christoph Hellwig
27674ef6c7 mm: remove the extra ZONE_DEVICE struct page refcount
ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE pages.

Note that this excludes the special idle page wakeup for fsdax pages,
which still happens at refcount 1.  This is a separate issue and will
be sorted out later.  Given that only fsdax pages require the
notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
symbol can go away and be replaced with a FS_DAX check for this hook
in the put_page fastpath.

Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>.

Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
dc90f0846d mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.

Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
895749455f mm: simplify freeing of devmap managed pages
Make put_devmap_managed_page return if it took charge of the page
or not and remove the separate page_is_devmap_managed helper.

Link: https://lkml.kernel.org/r/20220210072828.2930359-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
75e55d8a10 mm: move free_devmap_managed_page to memremap.c
free_devmap_managed_page has nothing to do with the code in swap.c,
move it to live with the rest of the code for devmap handling.

Link: https://lkml.kernel.org/r/20220210072828.2930359-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
730ff52194 mm: remove pointless includes from <linux/hmm.h>
hmm.h pulls in the world for no good reason at all.  Remove the
includes and push a few ones into the users instead.

Link: https://lkml.kernel.org/r/20220210072828.2930359-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
5c3f1f9cc4 mm: remove the __KERNEL__ guard from <linux/mm.h>
__KERNEL__ ifdefs don't make sense outside of include/uapi/.

Link: https://lkml.kernel.org/r/20220210072828.2930359-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Hugh Dickins
07ca760673 mm/munlock: maintain page->mlock_count while unevictable
Previous patches have been preparatory: now implement page->mlock_count.
The ordering of the "Unevictable LRU" is of no significance, and there is
no point holding unevictable pages on a list: place page->mlock_count to
overlay page->lru.prev (since page->lru.next is overlaid by compound_head,
which needs to be even so as not to satisfy PageTail - though 2 could be
added instead of 1 for each mlock, if that's ever an improvement).

But it's only safe to rely on or modify page->mlock_count while lruvec
lock is held and page is on unevictable "LRU" - we can save lots of edits
by continuing to pretend that there's an imaginary LRU here (there is an
unevictable count which still needs to be maintained, but not a list).

The mlock_count technique suffers from an unreliability much like with
page_mlock(): while someone else has the page off LRU, not much can
be done.  As before, err on the safe side (behave as if mlock_count 0),
and let try_to_unlock_one() move the page to unevictable if reclaim finds
out later on - a few misplaced pages don't matter, what we want to avoid
is imbalancing reclaim by flooding evictable lists with unevictable pages.

I am not a fan of "if (!isolate_lru_page(page)) putback_lru_page(page);":
if we have taken lruvec lock to get the page off its present list, then
we save everyone trouble (and however many extra atomic ops) by putting
it on its destination list immediately.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-17 11:56:57 -05:00
Hugh Dickins
cea86fe246 mm/munlock: rmap call mlock_vma_page() munlock_vma_page()
Add vma argument to mlock_vma_page() and munlock_vma_page(), make them
inline functions which check (vma->vm_flags & VM_LOCKED) before calling
mlock_page() and munlock_page() in mm/mlock.c.

Add bool compound to mlock_vma_page() and munlock_vma_page(): this is
because we have understandable difficulty in accounting pte maps of THPs,
and if passed a PageHead page, mlock_page() and munlock_page() cannot
tell whether it's a pmd map to be counted or a pte map to be ignored.

Add vma arg to page_add_file_rmap() and page_remove_rmap(), like the
others, and use that to call mlock_vma_page() at the end of the page
adds, and munlock_vma_page() at the end of page_remove_rmap() (end or
beginning? unimportant, but end was easier for assertions in testing).

No page lock is required (although almost all adds happen to hold it):
delete the "Serialize with page migration" BUG_ON(!PageLocked(page))s.
Certainly page lock did serialize with page migration, but I'm having
difficulty explaining why that was ever important.

Mlock accounting on THPs has been hard to define, differed between anon
and file, involved PageDoubleMap in some places and not others, required
clear_page_mlock() at some points.  Keep it simple now: just count the
pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks.

page_add_new_anon_rmap() callers unchanged: they have long been calling
lru_cache_add_inactive_or_unevictable(), which does its own VM_LOCKED
handling (it also checks for not VM_SPECIAL: I think that's overcautious,
and inconsistent with other checks, that mmap_region() already prevents
VM_LOCKED on VM_SPECIAL; but haven't quite convinced myself to change it).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-17 11:56:48 -05:00
Hugh Dickins
b67bf49ce7 mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE
If counting page mlocks, we must not double-count: follow_page_pte() can
tell if a page has already been Mlocked or not, but cannot tell if a pte
has already been counted or not: that will have to be done when the pte
is mapped in (which lru_cache_add_inactive_or_unevictable() already tracks
for new anon pages, but there's no such tracking yet for others).

Delete all the FOLL_MLOCK code - faulting in the missing pages will do
all that is necessary, without special mlock_vma_page() calls from here.

But then FOLL_POPULATE turns out to serve no purpose - it was there so
that its absence would tell faultin_page() not to faultin page when
setting up VM_LOCKONFAULT areas; but if there's no special work needed
here for mlock, then there's no work at all here for VM_LOCKONFAULT.

Have I got that right?  I've not looked into the history, but see that
FOLL_POPULATE goes back before VM_LOCKONFAULT: did it serve a different
purpose before?  Ah, yes, it was used to skip the old stack guard page.

And is it intentional that COW is not broken on existing pages when
setting up a VM_LOCKONFAULT area?  I can see that being argued either
way, and have no reason to disagree with current behaviour.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-17 11:56:36 -05:00
Hugh Dickins
ebcbc6ea7d mm/munlock: delete page_mlock() and all its works
We have recommended some applications to mlock their userspace, but that
turns out to be counter-productive: when many processes mlock the same
file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
is needed for write, to remove any vma mapping that file from rmap's tree;
but hogged for read by those with mlocks calling page_mlock() (formerly
known as try_to_munlock()) on *each* page mapped from the file (the
purpose being to find out whether another process has the page mlocked,
so therefore it should not be unmlocked yet).

Several optimizations have been made in the past: one is to skip
page_mlock() when mapcount tells that nothing else has this page
mapped; but that doesn't help at all when others do have it mapped.
This time around, I initially intended to add a preliminary search
of the rmap tree for overlapping VM_LOCKED ranges; but that gets
messy with locking order, when in doubt whether a page is actually
present; and risks adding even more contention on the i_mmap_rwsem.

A solution would be much easier, if only there were space in struct page
for an mlock_count... but actually, most of the time, there is space for
it - an mlocked page spends most of its life on an unevictable LRU, but
since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
been redundant.  Let's try to reuse its page->lru.

But leave that until a later patch: in this patch, clear the ground by
removing page_mlock(), and all the infrastructure that has gathered
around it - which mostly hinders understanding, and will make reviewing
new additions harder.  Don't mind those old comments about THPs, they
date from before 4.5's refcounting rework: splitting is not a risk here.

Just keep a minimal version of munlock_vma_page(), as reminder of what it
should attend to (in particular, the odd way PGSTRANDED is counted out of
PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range().  Move
unchanged __mlock_posix_error_return() out of the way, down to above its
caller: this series then makes no further change after mlock_fixup().

After this and each following commit, the kernel builds, boots and runs;
but with deficiencies which may show up in testing of mlock and munlock.
The system calls succeed or fail as before, and mlock remains effective
in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts
may be shown too low after mlock, grow, then stay too high after munlock:
with previously mlocked pages remaining unevictable for too long, until
finally unmapped and freed and counts corrected. Normal service will be
resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking".

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-17 11:56:13 -05:00
Linus Torvalds
c24449b321 Merge tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:

 - Rework use of DMA_BIT_MASK in vmbus to work around a clang bug
   (Michael Kelley)

 - Fix NUMA topology (Long Li)

 - Fix a memory leak in vmbus (Miaoqian Lin)

 - One minor clean-up patch (Cai Huoqing)

* tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: utils: Make use of the helper macro LIST_HEAD()
  Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
  Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
  PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
2022-02-15 09:05:01 -08:00
Linus Torvalds
42964a18f8 Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
 "Fix a case where objtool would mistakenly warn about instructions
  being unreachable"

* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-13 09:43:34 -08:00
Linus Torvalds
9917ff5f31 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "5 patches.

  Subsystems affected by this patch series: binfmt, procfs, and mm
  (vmscan, memcg, and kfence)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kfence: make test case compatible with run time set sample interval
  mm: memcg: synchronize objcg lists with a dedicated spinlock
  mm: vmscan: remove deadlock due to throttling failing to make progress
  fs/proc: task_mmu.c: don't read mapcount for migration entry
  fs/binfmt_elf: fix PT_LOAD p_align values for loaders
2022-02-12 08:57:37 -08:00
Peng Liu
8913c61001 kfence: make test case compatible with run time set sample interval
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization.  However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired.  Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.

Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian Knig <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Roman Gushchin
0764db9b49 mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:

  LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
          WARNING: possible circular locking dependency detected
          5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
          ------------------------------------------------------
          mmap1/202299 is trying to acquire lock:
          00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
          but task is already holding lock:
          00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (&sighand->siglock){-.-.}-{2:2}:
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 __lock_task_sighand+0x90/0x190
                 cgroup_freeze_task+0x2e/0x90
                 cgroup_migrate_execute+0x11c/0x608
                 cgroup_update_dfl_csses+0x246/0x270
                 cgroup_subtree_control_write+0x238/0x518
                 kernfs_fop_write_iter+0x13e/0x1e0
                 new_sync_write+0x100/0x190
                 vfs_write+0x22c/0x2d8
                 ksys_write+0x6c/0xf8
                 __do_syscall+0x1da/0x208
                 system_call+0x82/0xb0
          -> #0 (css_set_lock){..-.}-{2:2}:
                 check_prev_add+0xe0/0xed8
                 validate_chain+0x736/0xb20
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 obj_cgroup_release+0x4a/0xe0
                 percpu_ref_put_many.constprop.0+0x150/0x168
                 drain_obj_stock+0x94/0xe8
                 refill_obj_stock+0x94/0x278
                 obj_cgroup_charge+0x164/0x1d8
                 kmem_cache_alloc+0xac/0x528
                 __sigqueue_alloc+0x150/0x308
                 __send_signal+0x260/0x550
                 send_signal+0x7e/0x348
                 force_sig_info_to_task+0x104/0x180
                 force_sig_fault+0x48/0x58
                 __do_pgm_check+0x120/0x1f0
                 pgm_check_handler+0x11e/0x180
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(&sighand->siglock);
                                         lock(css_set_lock);
                                         lock(&sighand->siglock);
            lock(css_set_lock);
           *** DEADLOCK ***
          2 locks held by mmap1/202299:
           #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
           #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
          stack backtrace:
          CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
          Hardware name: IBM 3906 M04 704 (LPAR)
          Call Trace:
            dump_stack_lvl+0x76/0x98
            check_noncircular+0x136/0x158
            check_prev_add+0xe0/0xed8
            validate_chain+0x736/0xb20
            __lock_acquire+0x604/0xbd8
            lock_acquire.part.0+0xe2/0x238
            lock_acquire+0xb0/0x200
            _raw_spin_lock_irqsave+0x6a/0xd8
            obj_cgroup_release+0x4a/0xe0
            percpu_ref_put_many.constprop.0+0x150/0x168
            drain_obj_stock+0x94/0xe8
            refill_obj_stock+0x94/0x278
            obj_cgroup_charge+0x164/0x1d8
            kmem_cache_alloc+0xac/0x528
            __sigqueue_alloc+0x150/0x308
            __send_signal+0x260/0x550
            send_signal+0x7e/0x348
            force_sig_info_to_task+0x104/0x180
            force_sig_fault+0x48/0x58
            __do_pgm_check+0x120/0x1f0
            pgm_check_handler+0x11e/0x180
          INFO: lockdep is turned off.

In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg.  Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.

This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).

In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.

To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.

Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Linus Torvalds
83e3966411 Merge tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "This is a fairly large set of bugfixes, most of which had been sent a
  while ago but only now made it into the soc tree:

  Maintainer file updates:

   - Claudiu Beznea now co-maintains the at91 soc family, replacing
     Ludovic Desroches.

   - Michael Walle maintains the sl28cpld drivers

   - Alain Volmat and Raphael Gallais-Pou take over some drivers for ST
     platforms

   - Alim Akhtar is an additional reviewer for Samsung platforms

  Code fixes:

   - Op-tee had a problem with object lifetime that needs a slightly
     complex fix, as well as another bug with error handling.

   - Several minor issues for the OMAP platform, including a regression
     with the timer

   - A Kconfig change to fix a build-time issue on Intel SoCFPGA

  Device tree fixes:

   - The Amlogic Meson platform fixes a boot regression on am1-odroid, a
     spurious interrupt, and a problem with reserved memory regions

   - In the i.MX platform, several bug fixes are needed to make devices
     work correctly: SD card detection, alarmtimer, and sound card on
     some board. One patch for the GPU got in there by accident and gets
     reverted again.

   - TI K3 needs a fix for J721S2 serial port numbers

   - ux500 needs a fix to mount the SD card as root on the Skomer phone"

* tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (46 commits)
  Revert "arm64: dts: imx8mn-venice-gw7902: disable gpu"
  arm64: Remove ARCH_VULCAN
  MAINTAINERS: add myself as a maintainer for the sl28cpld
  MAINTAINERS: add IRC to ARM sub-architectures and Devicetree
  MAINTAINERS: arm: samsung: add Git tree and IRC
  ARM: dts: Fix boot regression on Skomer
  ARM: dts: spear320: Drop unused and undocumented 'irq-over-gpio' property
  soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
  docs/ABI: testing: aspeed-uart-routing: Escape asterisk
  MAINTAINERS: update drm/stm drm/sti and cec/sti maintainers
  MAINTAINERS: Update Benjamin Gaignard maintainer status
  ARM: socfpga: fix missing RESET_CONTROLLER
  arm64: dts: meson-sm1-odroid: fix boot loop after reboot
  arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
  arm64: dts: meson-g12: add ATF BL32 reserved-memory region
  arm64: dts: meson-gx: add ATF BL32 reserved-memory region
  arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO domain for GPIOE_2
  arm64: dts: meson-sm1-odroid: use correct enable-gpio pin for tf-io regulator
  arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
  optee: use driver internal tee_context for some rpc
  ...
2022-02-11 13:40:03 -08:00
Linus Torvalds
883fd0aba1 Merge tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These revert two commits that turned out to be problematic and fix two
  issues related to wakeup from suspend-to-idle on x86.

  Specifics:

   - Revert a recent change that attempted to avoid issues with
     conflicting address ranges during PCI initialization, because it
     turned out to introduce a regression (Hans de Goede).

   - Revert a change that limited EC GPE wakeups from suspend-to-idle to
     systems based on Intel hardware, because it turned out that systems
     based on hardware from other vendors depended on that functionality
     too (Mario Limonciello).

   - Fix two issues related to the handling of wakeup interrupts and
     wakeup events signaled through the EC GPE during suspend-to-idle on
     x86 (Rafael Wysocki)"

* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
  PM: s2idle: ACPI: Fix wakeup interrupts handling
  ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
  ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
2022-02-11 11:48:13 -08:00
Linus Torvalds
f1baf68e13 Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

Current release - new code bugs:

   - sparx5: fix get_stat64 out-of-bound access and crash

   - smc: fix netdev ref tracker misuse

  Previous releases - regressions:

   - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
     overflows

   - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
     over IP

   - bonding: fix rare link activation misses in 802.3ad mode

  Previous releases - always broken:

   - tcp: fix tcp sock mem accounting in zero-copy corner cases

   - remove the cached dst when uncloning an skb dst and its metadata,
     since we only have one ref it'd lead to an UaF

   - netfilter:
      - conntrack: don't refresh sctp entries in closed state
      - conntrack: re-init state for retransmitted syn-ack, avoid
        connection establishment getting stuck with strange stacks
      - ctnetlink: disable helper autoassign, avoid it getting lost
      - nft_payload: don't allow transport header access for fragments

   - dsa: fix use of devres for mdio throughout drivers

   - eth: amd-xgbe: disable interrupts during pci removal

   - eth: dpaa2-eth: unregister netdev before disconnecting the PHY

   - eth: ice: fix IPIP and SIT TSO offload"

* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
  net: mscc: ocelot: fix mutex lock error during ethtool stats read
  ice: Avoid RTNL lock when re-creating auxiliary device
  ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
  ice: fix IPIP and SIT TSO offload
  ice: fix an error code in ice_cfg_phy_fec()
  net: mpls: Fix GCC 12 warning
  dpaa2-eth: unregister the netdev before disconnecting from the PHY
  skbuff: cleanup double word in comment
  net: macb: Align the dma and coherent dma masks
  mptcp: netlink: process IPv6 addrs in creating listening sockets
  selftests: mptcp: add missing join check
  net: usb: qmi_wwan: Add support for Dell DW5829e
  vlan: move dev_put into vlan_dev_uninit
  vlan: introduce vlan_dev_free_egress_priority
  ax25: fix UAF bugs of net_device caused by rebinding operation
  net: dsa: fix panic when DSA master device unbinds on shutdown
  net: amd-xgbe: disable interrupts during pci removal
  tipc: rate limit warning for received illegal binding update
  net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
  ...
2022-02-10 16:01:22 -08:00
Linus Torvalds
f4bc5bbb5f Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd fixes from Chuck Lever:
 "Ensure that NFS clients cannot send file size or offset values that
  can cause the NFS server to crash or to return incorrect or surprising
  results.

  In particular, fix how the NFS server handles values larger than
  OFFSET_MAX"

* tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Deprecate NFS_OFFSET_MAX
  NFSD: Fix offset type in I/O trace points
  NFSD: COMMIT operations must not return NFS?ERR_INVAL
  NFSD: Clamp WRITE offsets
  NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
  NFSD: Fix ia_size underflow
  NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09 09:56:57 -08:00
Chuck Lever
c306d73769 NFSD: Deprecate NFS_OFFSET_MAX
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:24:40 -05:00
Antoine Tenart
9eeabdf17f net: fix a memleak when uncloning an skb dst and its metadata
When uncloning an skb dst and its associated metadata, a new
dst+metadata is allocated and later replaces the old one in the skb.
This is helpful to have a non-shared dst+metadata attached to a specific
skb.

The issue is the uncloned dst+metadata is initialized with a refcount of
1, which is increased to 2 before attaching it to the skb. When
tun_dst_unclone returns, the dst+metadata is only referenced from a
single place (the skb) while its refcount is 2. Its refcount will never
drop to 0 (when the skb is consumed), leading to a memory leak.

Fix this by removing the call to dst_hold in tun_dst_unclone, as the
dst+metadata refcount is already 1.

Fixes: fc4099f172 ("openvswitch: Fix egress tunnel info.")
Cc: Pravin B Shelar <pshelar@ovn.org>
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 11:41:47 +00:00
Antoine Tenart
cfc56f85e7 net: do not keep the dst cache when uncloning an skb dst and its metadata
When uncloning an skb dst and its associated metadata a new dst+metadata
is allocated and the tunnel information from the old metadata is copied
over there.

The issue is the tunnel metadata has references to cached dst, which are
copied along the way. When a dst+metadata refcount drops to 0 the
metadata is freed including the cached dst entries. As they are also
referenced in the initial dst+metadata, this ends up in UaFs.

In practice the above did not happen because of another issue, the
dst+metadata was never freed because its refcount never dropped to 0
(this will be fixed in a subsequent patch).

Fix this by initializing the dst cache after copying the tunnel
information from the old metadata to also unshare the dst cache.

Fixes: d71785ffc7 ("net: add dst_cache to ovs vxlan lwtunnel")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 11:41:47 +00:00
Linus Torvalds
e6251ab455 Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
 "Stable Fixes:

   - Fix initialization of nfs_client cl_flags

  Other Fixes:

   - Fix performance issues with uncached readdir calls

   - Fix potential pointer dereferences in rpcrdma_ep_create

   - Fix nfs4_proc_get_locations() kernel-doc comment

   - Fix locking during sunrpc sysfs reads

   - Update my email address in the MAINTAINERS file to my new
     kernel.org email"

* tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: lock against ->sock changing during sysfs read
  MAINTAINERS: Update my email address
  NFS: Fix nfs4_proc_get_locations() kernel-doc comment
  xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
  NFS: Fix initialisation of nfs_client cl_flags field
  NFS: Avoid duplicate uncached readdir calls on eof
  NFS: Don't skip directory entries when doing uncached readdir
  NFS: Don't overfill uncached readdir pages
2022-02-08 12:03:07 -08:00
Rafael J. Wysocki
cb1f65c1e1 PM: s2idle: ACPI: Fix wakeup interrupts handling
After commit e3728b50cd ("ACPI: PM: s2idle: Avoid possible race
related to the EC GPE") wakeup interrupts occurring immediately after
the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
the SCI triggers again immediately after the rearming in
acpi_s2idle_wake(), that wakeup may be missed too.

The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
when pm_wakeup_irq is 0, but that's not the case any more after the
interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
there may be wakeup interrupts occurring in that time frame and if
that happens, they will be missed.

To address that issue first move the clearing of pm_wakeup_irq to
the point at which it is known that the interrupt causing
acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
for wakeup.  Moreover, because that only reduces the size of the
time window in which the issue may manifest itself, allow
pm_system_irq_wakeup() to register two second wakeup interrupts in
a row and, when discarding the first one, replace it with the second
one.  [Of course, this assumes that only one wakeup interrupt can be
discarded in one go, but currently that is the case and I am not
aware of any plans to change that.]

Fixes: e3728b50cd ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-07 21:02:31 +01:00
Michael Kelley
6bf625a414 Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
Using DMA_BIT_MASK(64) as an initializer for a global variable
causes problems with Clang 12.0.1. The compiler doesn't understand
that value 64 is excluded from the shift at compile time, resulting
in a build error.

While this is a compiler problem, avoid the issue by setting up
the dma_mask memory as part of struct hv_device, and initialize
it using dma_set_mask().

Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Vitaly Chikunov <vt@altlinux.org>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 743b237c3a ("scsi: storvsc: Add Isolation VM support for storvsc driver")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-02-07 17:55:30 +00:00
Arnd Bergmann
486343d372 Merge tag 'omap-for-v5.17/fixes-for-merge-window-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omaps

A series of fixes for omap variants for minor issues, and a fix for a timer
regression for some omap3 beagleboard versions.

The timer fix needs to patch both the dts and the timer code because
otherwise the timer quirk handling for old dtbs will prevent the dts fix
from working.

The other changes are for issues found by automated analysis, a macasp
typo fix, and two cosmetic fixes for clocks.

* tag 'omap-for-v5.17/fixes-for-merge-window-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Don't use legacy clock defines for dra7 clkctrl
  clk: ti: Move dra7 clock devices out of the legacy section
  ARM: dts: Fix timer regression for beagleboard revision c
  ARM: dts: am335x-wega: Fix typo in mcasp property rx-num-evt
  ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
  ARM: OMAP2+: hwmod: Add of_node_put() before break

Link: https://lore.kernel.org/r/pull-1641801310-149268@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-07 17:42:44 +01:00
Damien Le Moal
fda17afc61 ata: libata-core: Fix ata_dev_config_cpr()
The concurrent positioning ranges log page 47h is a general purpose log
page and not a subpage of the indentify device log. Using
ata_identify_page_supported() to test for concurrent positioning ranges
support is thus wrong. ata_log_supported() must be used.

Furthermore, unlike other advanced ATA features (e.g. NCQ priority),
accesses to the concurrent positioning ranges log page are not gated by
a feature bit from the device IDENTIFY data. Since many older drives
react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued
to read device log pages, avoid problems with older drives by limiting
the concurrent positioning ranges support detection to drives
implementing at least the ACS-4 ATA standard (major version 11). This
additional condition effectively turns ata_dev_config_cpr() into a nop
for older drives, avoiding problems in the field.

Fixes: fe22e1c2f7 ("libata: support concurrent positioning ranges log")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Abderraouf Adjal <adjal.arf@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-07 22:38:02 +09:00
Linus Torvalds
d8ad2ce873 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "Various bug fixes for ext4 fast commit and inline data handling.

  Also fix regression introduced as part of moving to the new mount API"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs/ext4: fix comments mentioning i_mutex
  ext4: fix incorrect type issue during replay_del_range
  jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}()
  ext4: fix potential NULL pointer dereference in ext4_fill_super()
  jbd2: refactor wait logic for transaction updates into a common function
  jbd2: cleanup unused functions declarations from jbd2.h
  ext4: fix error handling in ext4_fc_record_modified_inode()
  ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
  ext4: fix error handling in ext4_restore_inline_data()
  ext4: fast commit may miss file actions
  ext4: fast commit may not fallback for ineligible commit
  ext4: modify the logic of ext4_mb_new_blocks_simple
  ext4: prevent used blocks from being allocated during fast commit replay
2022-02-06 10:34:45 -08:00
Linus Torvalds
c3bf8a1440 Merge tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Intel/PT: filters could crash the kernel

 - Intel: default disable the PMU for SMM, some new-ish EFI firmware has
   started using CPL3 and the PMU CPL filters don't discriminate against
   SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
   cycles.

 - Fixup for perf_event_attr::sig_data

* tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix crash with stop filters in single-range mode
  perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
  selftests/perf_events: Test modification of perf_event_attr::sig_data
  perf: Copy perf_event_attr::sig_data on modification
  x86/perf: Default set FREEZE_ON_SMI for all
2022-02-06 10:11:14 -08:00
Linus Torvalds
90c9e950c0 Merge tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - documentation fixes related to Xen

 - enable x2apic mode when available when running as hardware
   virtualized guest under Xen

 - cleanup and fix a corner case of vcpu enumeration when running a
   paravirtualized Xen guest

* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/Xen: streamline (and fix) PV CPU enumeration
  xen: update missing ioctl magic numers documentation
  Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
  xen: xenbus_dev.h: delete incorrect file name
  xen/x2apic: enable x2apic mode when supported for HVM
2022-02-05 10:40:17 -08:00
Linus Torvalds
5fdb26213f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - A couple of fixes when handling an exception while a SError has
     been delivered

   - Workaround for Cortex-A510's single-step erratum

  RISC-V:

   - Make CY, TM, and IR counters accessible in VU mode

   - Fix SBI implementation version

  x86:

   - Report deprecation of x87 features in supported CPUID

   - Preparation for fixing an interrupt delivery race on AMD hardware

   - Sparse fix

  All except POWER and s390:

   - Rework guest entry code to correctly mark noinstr areas and fix
     vtime' accounting (for x86, this was already mostly correct but not
     entirely; for ARM, MIPS and RISC-V it wasn't)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
  KVM: x86: Report deprecated x87 features in supported CPUID
  KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
  KVM: arm64: Avoid consuming a stale esr value when SError occur
  RISC-V: KVM: Fix SBI implementation version
  RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
  kvm/riscv: rework guest entry logic
  kvm/arm64: rework guest entry logic
  kvm/x86: rework guest entry logic
  kvm/mips: rework guest entry logic
  kvm: add guest_state_{enter,exit}_irqoff()
  KVM: x86: Move delivery of non-APICv interrupt into vendor code
  kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
2022-02-05 09:55:59 -08:00
Linus Torvalds
524446e217 Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
 "A single bugfix for iomap.

  The fix should eliminate occasional complaints about stall warnings
  when a lot of writeback IO completes all at once and we have to then
  go clearing status on a large number of folios.

  Summary:

   - Limit the length of ioend chains in writeback so that we don't trip
     the softlockup watchdog and to limit long tail latency on clearing
     PageWriteback"

* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs, iomap: limit individual ioend chain lengths in writeback
2022-02-05 09:04:43 -08:00
Paolo Bonzini
7e6a6b400d Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #2

- A couple of fixes when handling an exception while a SError has been
  delivered

- Workaround for Cortex-A510's single-step[ erratum
2022-02-05 00:58:25 -05:00
Linus Torvalds
494a2c2b27 Merge tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:

 - Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver
   and PATA drivers. Update the MAINTAINERS file accordingly.

 - Regression fix: add a horkage flag to prevent accessing the log
   directory log page with SATADOM-ML 3ME SATA devices as they react
   badly to reading that log page (from Anton).

* tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
  MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
  MAINTAINERS: add myself as PATA drivers reviewer
2022-02-04 11:52:37 -08:00
Linus Torvalds
ba6ef8af0f Merge tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
 "For this week, we have:

   - A fix to make more frequent use of hwgenerator randomness, from
     Dominik.

   - More cleanups to the boot initialization sequence, from Dominik.

   - A fix for an old shortcoming with the ZAP ioctl, from me.

   - A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
     from me. On one hand, it's a bummer to commit workarounds for
     experimental compiler features that have bugs. But on the other, I
     think this actually improves the code somewhat, independent of the
     bug. So a win-win"

* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: only call crng_finalize_init() for primary_crng
  random: access primary_pool directly rather than through pointer
  random: wake up /dev/random writers after zap
  random: continually use hwgenerator randomness
  lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
2022-02-04 11:38:01 -08:00
Linus Torvalds
0a566d43c8 Merge tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small fixes.

  The major changes are ASoC core fixes, addressing the DPCM locking
  issue after the recent code changes and the potentially invalid
  register accesses via control API. Also, HD-audio got a core fix for
  Oops at dynamic unbinding.

  The rest are device-specific small fixes, including the usual stuff
  like HD-audio and USB-audio quirks"

* tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
  ALSA: hda: Skip codec shutdown in case the codec is not registered
  ALSA: usb-audio: Correct quirk for VF0770
  ALSA: Replace acpi_bus_get_device()
  Input: wm97xx: Simplify resource management
  ALSA: hda/realtek: Add quirk for ASUS GU603
  ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
  ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
  ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
  ALSA: hda: realtek: Fix race at concurrent COEF updates
  ASoC: ops: Check for negative values before reading them
  ASoC: rt5682: Fix deadlock on resume
  ASoC: hdmi-codec: Fix OOB memory accesses
  ASoC: soc-pcm: Move debugfs removal out of spinlock
  ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
  ASoC: fsl: Add missing error handling in pcm030_fabric_probe
  ALSA: hda: Fix signedness of sscanf() arguments
  ALSA: usb-audio: initialize variables that could ignore errors
  ALSA: hda: Fix UAF of leds class devs at unbinding
  ASoC: qdsp6: q6apm-dai: only stop graphs that are started
  ASoC: codecs: wcd938x: fix return value of mixer put function
  ...
2022-02-04 11:24:28 -08:00
Linus Torvalds
31462d9e47 Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular fixes for the week. Daniel has agreed to bring back the fbcon
  hw acceleration under a CONFIG option for the non-drm fbdev users, we
  don't advise turning this on unless you are in the niche that is old
  fbdev drivers, Since it's essentially a revert and shouldn't be high
  impact seemed like a good time to do it now.

  Otherwise, i915 and amdgpu fixes are most of it, along with some minor
  fixes elsewhere.

  fbdev:
   - readd fbcon acceleration

  i915:
   - fix DP monitor via type-c dock
   - fix for engine busyness and read timeout with GuC
   - use ALLOW_FAIL for error capture buffer allocs
   - don't use interruptible lock on error paths
   - smatch fix to reject zero sized overlays.

  amdgpu:
   - mGPU fan boost fix for beige goby
   - S0ix fixes
   - Cyan skillfish hang fix
   - DCN fixes for DCN 3.1
   - DCN fixes for DCN 3.01
   - Apple retina panel fix
   - ttm logic inversion fix

  dma-buf:
   - heaps: fix potential spectre v1 gadget

  kmb:
   - fix potential oob access

  mxsfb:
   - fix NULL ptr deref

  nouveau:
   - fix potential oob access during BIOS decode"

* tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits)
  drm: mxsfb: Fix NULL pointer dereference
  drm/amdgpu: fix logic inversion in check
  drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
  drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
  drm/amd/display: revert "Reset fifo after enable otg"
  drm/amd/display: watermark latencies is not enough on DCN31
  drm/amd/display: Update watermark values for DCN301
  drm/amdgpu: fix a potential GPU hang on cyan skillfish
  drm/amd: Only run s3 or s0ix if system is configured properly
  drm/amd: add support to check whether the system is set to s3
  fbcon: Add option to enable legacy hardware acceleration
  Revert "fbcon: Disable accelerated scrolling"
  Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)"
  drm/i915/pmu: Fix KMD and GuC race on accessing busyness
  dma-buf: heaps: Fix potential spectre v1 gadget
  drm/amd: Warn users about potential s0ix problems
  drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
  drm/nouveau: fix off by one in BIOS boundary checking
  drm/i915/adlp: Fix TypeC PHY-ready status readout
  drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
  ...
2022-02-04 11:13:54 -08:00
Linus Torvalds
f9aaa5b05e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 patches.

  Subsystems affected by this patch series: ipc, MAINTAINERS, and mm
  (vmscan, debug, pagemap, kmemleak, and selftests)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
  MAINTAINERS: update rppt's email
  mm/kmemleak: avoid scanning potential huge holes
  ipc/sem: do not sleep with a spin lock held
  mm/pgtable: define pte_index so that preprocessor could recognize it
  mm/page_table_check: check entries at pmd levels
  mm/khugepaged: unify collapse pmd clear, flush and free
  mm/page_table_check: use unsigned long for page counters and cleanup
  mm/debug_vm_pgtable: remove pte entry from the page table
  Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
2022-02-04 10:34:19 -08:00
Jason A. Donenfeld
d2a02e3c8b lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".

[    0.000000][    T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[    0.000000][    T0] Hardware name: MT6873 (DT)
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  dump_backtrace+0xfc/0x1dc
[    0.000000][    T0]  dump_stack_lvl+0xa8/0x11c
[    0.000000][    T0]  panic+0x194/0x464
[    0.000000][    T0]  __cfi_check_fail+0x54/0x58
[    0.000000][    T0]  __cfi_slowpath_diag+0x354/0x4b0
[    0.000000][    T0]  blake2s_update+0x14c/0x178
[    0.000000][    T0]  _extract_entropy+0xf4/0x29c
[    0.000000][    T0]  crng_initialize_primary+0x24/0x94
[    0.000000][    T0]  rand_initialize+0x2c/0x6c
[    0.000000][    T0]  start_kernel+0x2f8/0x65c
[    0.000000][    T0]  __primary_switched+0xc4/0x7be4
[    0.000000][    T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.

Fixes: 6048fdcc5f ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-04 19:22:32 +01:00
Linus Torvalds
cff7f2237c Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A patch to make it possible to disable zero copy path in the messenger
  to avoid checksum or authentication tag mismatches and ensuing session
  resets in case the destination buffer isn't guaranteed to be stable"

* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
  libceph: optionally use bounce buffer on recv path in crc mode
  libceph: make recv path in secure mode work the same as send path
2022-02-04 09:54:02 -08:00
Linus Torvalds
633a8e8986 Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "SMB3 client fixes including:

   - multiple fscache related fixes, reenabling ability to read/write to
     cached files for cifs.ko (that was temporarily disabled for cifs.ko
     a few weeks ago due to the recent fscache changes)

   - also includes a new fscache helper function ("query_occupancy")
     used by above

   - fix for multiuser mounts and NTLMSSP auth (workstation name) for
     stable

   - fix locking ordering problem in multichannel code

   - trivial malformed comment fix"

* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix workstation_name for multiuser mounts
  Invalidate fscache cookie only when inode attributes are changed.
  cifs: Fix the readahead conversion to manage the batch when reading from cache
  cifs: Implement cache I/O by accessing the cache directly
  netfs, cachefiles: Add a method to query presence of data in the cache
  cifs: Transition from ->readpages() to ->readahead()
  cifs: unlock chan_lock before calling cifs_put_tcp_session
  Fix a warning about a malformed kernel doc comment in cifs
2022-02-04 09:34:37 -08:00
Mike Rapoport
314c459a6f mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit 974b9b2c68 ("mm: consolidate pte_index() and
pte_offset_*() definitions") pte_index is a static inline and there is
no define for it that can be recognized by the preprocessor.  As a
result, vm_insert_pages() uses slower loop over vm_insert_page() instead
of insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.

Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes: 974b9b2c68 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Christian Dietrich <stettberger@dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-04 09:25:05 -08:00
Pasha Tatashin
80110bbfbb mm/page_table_check: check entries at pmd levels
syzbot detected a case where the page table counters were not properly
updated.

  syzkaller login:  ------------[ cut here ]------------
  kernel BUG at mm/page_table_check.c:162!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
  RIP: 0010:__page_table_check_zero+0x159/0x1a0
  Call Trace:
   free_pcp_prepare+0x3be/0xaa0
   free_unref_page+0x1c/0x650
   free_compound_page+0xec/0x130
   free_transhuge_page+0x1be/0x260
   __put_compound_page+0x90/0xd0
   release_pages+0x54c/0x1060
   __pagevec_release+0x7c/0x110
   shmem_undo_range+0x85e/0x1250
  ...

The repro involved having a huge page that is split due to uprobe event
temporarily replacing one of the pages in the huge page.  Later the huge
page was combined again, but the counters were off, as the PTE level was
not properly updated.

Make sure that when PMD is cleared and prior to freeing the level the
PTEs are updated.

Link: https://lkml.kernel.org/r/20220131203249.2832273-5-pasha.tatashin@soleen.com
Fixes: df4e817b71 ("mm: page table check")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-04 09:25:04 -08:00
Anton Lundin
ac9f0c8106 ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
06f6c4c6c3 ("ata: libata: add missing ata_identify_page_supported() calls")
introduced additional calls to ata_identify_page_supported(), thus also
adding indirectly accesses to the device log directory log page through
ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices
to lock up.

Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to
the log directory in ata_log_supported() and add a blacklist entry
with this flag for "SATADOM-ML 3ME" devices.

Fixes: 636f6e2af4 ("libata: add horkage for missing Identify Device log")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Anton Lundin <glance@acc.umu.se>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-04 16:44:23 +09:00
Dave Airlie
8ea2c5187d Merge tag 'drm-misc-fixes-2022-02-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* dma-buf/heaps: Fix potential spectre v1 gadget
 * drm/kmb: Fix potential out-of-bounds access
 * drm/mxsfb: Fix NULL-pointer dereference
 * drm/nouveau: Fix potential out-of-bounds access in BIOS decoding
 * fbdev: Re-add support for fbcon hardware acceleration

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Yfu8mTZQUNt1RwZd@linux-uq9g
2022-02-04 14:43:35 +10:00
Florian Westphal
d1ca60efc5 netfilter: ctnetlink: disable helper autoassign
When userspace, e.g. conntrackd, inserts an entry with a specified helper,
its possible that the helper is lost immediately after its added:

ctnetlink_create_conntrack
  -> nf_ct_helper_ext_add + assign helper
    -> ctnetlink_setup_nat
      -> ctnetlink_parse_nat_setup
         -> parse_nat_setup -> nfnetlink_parse_nat_setup
	                       -> nf_nat_setup_info
                                 -> nf_conntrack_alter_reply
                                   -> __nf_ct_try_assign_helper

... and __nf_ct_try_assign_helper will zero the helper again.

Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like
when helper is assigned via ruleset.

Dropped old 'not strictly necessary' comment, it referred to use of
rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().

NB: Fixes tag intentionally incorrect, this extends the referenced commit,
but this change won't build without IPS_HELPER introduced there.

Fixes: 6714cf5465 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 05:39:57 +01:00
Linus Torvalds
eb2eb5161c Merge tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter, and ieee802154.

  Current release - regressions:

   - Partially revert "net/smc: Add netlink net namespace support", fix
     uABI breakage

   - netfilter:
      - nft_ct: fix use after free when attaching zone template
      - nft_byteorder: track register operations

  Previous releases - regressions:

   - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback

   - phy: qca8081: fix speeds lower than 2.5Gb/s

   - sched: fix use-after-free in tc_new_tfilter()

  Previous releases - always broken:

   - tcp: fix mem under-charging with zerocopy sendmsg()

   - tcp: add missing tcp_skb_can_collapse() test in
     tcp_shift_skb_data()

   - neigh: do not trigger immediate probes on NUD_FAILED from
     neigh_managed_work, avoid a deadlock

   - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
     false-positives

   - netfilter: nft_reject_bridge: fix for missing reply from prerouting

   - smc: forward wakeup to smc socket waitqueue after fallback

   - ieee802154:
      - return meaningful error codes from the netlink helpers
      - mcr20a: fix lifs/sifs periods
      - at86rf230, ca8210: stop leaking skbs on error paths

   - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent

   - ax25: add refcount in ax25_dev to avoid UAF bugs

   - eth: mlx5e:
      - fix SFP module EEPROM query
      - fix broken SKB allocation in HW-GRO
      - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows

   - eth: amd-xgbe:
      - fix skb data length underflow
      - ensure reset of the tx_timer_active flag, avoid Tx timeouts

   - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()

   - eth: e1000e: handshake with CSME starts from Alder Lake platforms"

* tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  ax25: fix reference count leaks of ax25_dev
  net: stmmac: ensure PTP time register reads are consistent
  net: ipa: request IPA register values be retained
  dt-bindings: net: qcom,ipa: add optional qcom,qmp property
  tools/resolve_btfids: Do not print any commands when building silently
  bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
  net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
  tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
  net: sparx5: do not refer to skb after passing it on
  Partially revert "net/smc: Add netlink net namespace support"
  net/mlx5e: Avoid field-overflowing memcpy()
  net/mlx5e: Use struct_group() for memcpy() region
  net/mlx5e: Avoid implicit modify hdr for decap drop rule
  net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
  net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
  net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
  net/mlx5: E-Switch, Fix uninitialized variable modact
  net/mlx5e: Fix handling of wrong devices during bond netevent
  net/mlx5e: Fix broken SKB allocation in HW-GRO
  net/mlx5e: Fix wrong calculation of header index in HW_GRO
  ...
2022-02-03 16:54:18 -08:00
Duoming Zhou
87563a043c ax25: fix reference count leaks of ax25_dev
The previous commit d01ffb9eee ("ax25: add refcount in ax25_dev
to avoid UAF bugs") introduces refcount into ax25_dev, but there
are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().

This patch uses ax25_dev_put() and adjusts the position of
ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.

Fixes: d01ffb9eee ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 14:20:36 -08:00
Igor Pylypiv
67d6212afd Revert "module, async: async_synchronize_full() on module init iff async is used"
This reverts commit 774a1221e8.

We need to finish all async code before the module init sequence is
done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule().  Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked.  This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread.  As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();

Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.

Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 11:20:34 -08:00
Ritesh Harjani
4f98186848 jbd2: refactor wait logic for transaction updates into a common function
No functionality change as such in this patch. This only refactors the
common piece of code which waits for t_updates to finish into a common
function named as jbd2_journal_wait_updates(journal_t *)

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8c564f70f4b2591171677a2a74fccb22a7b6c3a4.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-02-03 10:57:44 -05:00