Previously variable 'tmp' was initialized, but was not read later before
reassigning. So the initialization can be removed.
[akpm@linux-foundation.org: remove `tmp' altogether]
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200904132422.17387-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In has_unmovable_pages(), the page parameter would not always be the first
page within a pageblock (see how the page pointer is passed in from
start_isolate_page_range() after call __first_valid_page()), so that would
cause checking unmovable pages span two pageblocks.
After this patch, the checking is enforced within one pageblock no matter
the page is first one or not, and obey the semantics of this function.
This issue is found by code inspection.
Michal said "this might lead to false negatives when an unrelated block
would cause an isolation failure".
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Link: https://lkml.kernel.org/r/20200824065811.383266-1-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.
Now that the semantics are much clearer (and will be documented in a
follow-up patch including the new virtio-mem behavior), let's allow to
online partially plugged memory blocks to ZONE_MOVABLE and also consider
memory blocks that were onlined to ZONE_MOVABLE when unplugging memory.
While unplugged memory pages are, in general, unmovable, they can be
skipped when offlining memory.
virtio-mem only unplugs fairly big chunks (in the megabyte range) and
rather tries to shrink the memory region than randomly choosing memory.
In theory, if all other pages in the movable zone would be movable,
virtio-mem would only shrink that zone and not create any kind of
fragmentation.
In the future, we might want to remember the zone again and use the
information when (un)plugging memory. For now, let's keep it simple.
Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not get
unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE for a
longer time).
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200816125333.7434-6-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movablecore". Also, besides
checking if the first page in the pageblock is reserved, we don't perform
any further checks in case of ZONE_MOVABLE.
In case of memory offlining, we set REPORT_FAILURE, properly dump_page()
the page and handle the error gracefully. alloc_contig_pages() users
currently never allocate from ZONE_MOVABLE. E.g., hugetlb uses
alloc_contig_pages() for the allocation of gigantic pages only, which will
never end up on the MOVABLE zone (see htlb_alloc_mask()).
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200816125333.7434-4-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5.
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.
Now that the semantics are much clearer (and are documented in patch #6),
let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be online to ZONE_MOVABLE and also
unplugging from such memory blocks. This avoids surprises when onlining
of memory blocks suddenly fails, just because they are not completely
populated by virtio-mem (yet).
This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.
Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kind of
unmovable pages (memory offlining vs. alloc_contig_range()).
This patch (of 6):
Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200816125333.7434-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200816125333.7434-2-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of converting adjust_next between bytes and pages number, let's
just store the virtual address into adjust_next.
Also, this patch fixes one typo in the comment of vma_adjust_trans_huge().
[vbabka@suse.cz: changelog tweak]
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Link: http://lkml.kernel.org/r/20200828081031.11306-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Fix PageDoubleMap".
This is a purely theoretical problem for now as none of the filesystems
which use PG_private_2 (ie PG_fscache) are being converted at this time,
but it's confusing to leave it like this.
This patch (of 2):
PG_private_2 is defined as being PF_ANY (applicable to tail pages as well
as regular & head pages). That means that the first tail page of a
double-map page will appear to have Private2 set. Use the Workingset bit
instead which is defined as PF_HEAD so any attempt to access the
Workingset bit on a tail page will redirect to the head page's Workingset
bit.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Link: https://lkml.kernel.org/r/20200629151933.15671-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200629151933.15671-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4.
Recently, we have observed some janky issues caused by unpleasantly long
contention on mmap_lock which is held by smaps_rollup when probing large
processes. To address the problem, we let smaps_rollup detect if anyone
wants to acquire mmap_lock for write attempts. If yes, just release the
lock temporarily to ease the contention.
smaps_rollup is a procfs interface which allows users to summarize the
process's memory usage without the overhead of seq_* calls. Android uses
it to sample the memory usage of various processes to balance its memory
pool sizes. If no one wants to take the lock for write requests,
smaps_rollup with this patch will behave like the original one.
Although there are on-going mmap_lock optimizations like range-based
locks, the lock applied to smaps_rollup would be the coarse one, which is
hard to avoid the occurrence of aforementioned issues. So the detection
and temporary release for write attempts on mmap_lock in smaps_rollup is
still necessary.
This patch (of 3):
Add new API to query if someone wants to acquire mmap_lock for write
attempts.
Using this instead of rwsem_is_contended makes it more tolerant of future
changes to the lock type.
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jimmy Assarsson <jimmyassarsson@gmail.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/1597715898-3854-1-git-send-email-chinwen.chang@mediatek.com
Link: http://lkml.kernel.org/r/1597715898-3854-2-git-send-email-chinwen.chang@mediatek.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid accidental wrong builds, due to built-in rules working just a little
bit too well--but not quite as well as required for our situation here.
In other words, "make userfaultfd" (for example) is supposed to fail to
build at all, because this Makefile only supports either "make" (all), or
"make /full/path". However, the built-in rules, if not suppressed, will
pick up CFLAGS and the initial LDLIBS (but not the target-specific LDLIBS,
because those are only set for the full path target!). This causes it to
get pretty far into building things despite using incorrect values such as
an *occasionally* incomplete LDLIBS value.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lkml.kernel.org/r/20200915012901.1655280-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "selftests/vm: fix some minor aggravating factors in the Makefile".
This fixes a couple of minor aggravating factors that I ran across while
trying to do some changes in selftests/vm. These are simple things, but
like most things with GNU Make, it's rarely obvious what's wrong until you
understand *the entire Makefile and all of its includes*.
So while there is, of course, joy in learning those details, I thought I'd
fix these little things, so as to allow others to skip out on the Joy if
they so choose. :)
First of all, if you have an item (let's choose userfaultfd for an
example) that fails to build, you might do this:
$ make -j32
# ...you observe a failed item in the threaded output
# OK, let's get a closer look
$ make
# ...but now the build quietly "succeeds".
That's what Patch 0001 fixes.
Second, if you instead attempt this approach for your closer look (a casual
mistake, as it's not supported):
$ make userfaultfd
# ...userfaultfd fails to link, due to incomplete LDLIBS
That's what Patch 0002 fixes.
This patch (of 2):
If one or more of these selftest fail to build, then after the first
failure, subsequent invocations of "make" will make it appear that there
are no build failures, after all.
That's because the failed build products remain, with up-to-date
timestamps, thus tricking Make (and you!) into believing that there's
nothing else to build.
Fix this by telling Make to delete targets that didn't completely
succeed.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lkml.kernel.org/r/20200915012901.1655280-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20200915012901.1655280-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
NULL before checking is_device_private_entry() so device private pages are
never handled. Fix this by checking for non_swap_entry() after handling
device private swap PTEs.
I assume the memory cgroup accounting would be off somehow when moving
a process to another memory cgroup. Currently, the device private page
is charged like a normal anonymous page when allocated and is uncharged
when the page is freed so I think that path is OK.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Link: https://lkml.kernel.org/r/20201009215952.2726-1-rcampbell@nvidia.com
xFixes: c733a82874 ("mm/memcontrol: support MEMORY_DEVICE_PRIVATE")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>