linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-04 08:04:24 -04:00


Oscar Salvador c8e28b47af mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range

Patch series "Make alloc_contig_range handle Hugetlb pages", v10.

alloc_contig_range lacks the ability to handle HugeTLB pages.  This can
be problematic for some users, e.g. CMA and virtio-mem, which will fail
the call if alloc_contig_range ever sees a HugeTLB page, even when those
pages lie in ZONE_MOVABLE and are free.  That problem can be easily
solved by replacing the page in the free hugepage pool.
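
The idea of replacing a free huge page can be illustrated with a toy user-space model. This is a minimal sketch, not the kernel implementation. Everything in it is hypothetical: the pool structure, the allocator hooks, the fixed 512-pfn huge page size (which assumes 2 MiB huge pages on 4 KiB base pages), and the choice to check only the head pfn. For each free huge page whose head pfn lies in the range being cleared, the sketch allocates a replacement outside the range and then drops the old page. If that allocation fails, it reports -ENOMEM and leaves the pool size unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX_SKETCH   8
#define HPAGE_PFNS_SKETCH 512  /* assumption: 2 MiB huge pages, 4 KiB base pages */

/* Hypothetical toy free-hugepage pool: each entry is a head pfn. */
struct hpool_sketch {
	unsigned long free_pfn[POOL_MAX_SKETCH];
	size_t nr_free;
};

/* Hypothetical allocator hook: on success, stores a head pfn lying
 * outside [start, end) in *new_pfn and returns true. */
typedef bool (*hpage_alloc_fn)(unsigned long start, unsigned long end,
			       unsigned long *new_pfn);

/* Toy allocator handing out huge pages from hypothetical free memory
 * starting at pfn 0x200000. */
static bool alloc_high_sketch(unsigned long start, unsigned long end,
			      unsigned long *new_pfn)
{
	static unsigned long next = 0x200000;

	if (next >= start && next < end)
		return false;  /* never hand back a page inside the range */
	*new_pfn = next;
	next += HPAGE_PFNS_SKETCH;
	return true;
}

/* Toy allocator modelling memory pressure: every allocation fails. */
static bool alloc_fail_sketch(unsigned long start, unsigned long end,
			      unsigned long *new_pfn)
{
	(void)start;
	(void)end;
	(void)new_pfn;
	return false;
}

/*
 * Move every free huge page out of [start, end): allocate a replacement
 * outside the range, then drop ("dissolve") the old page by reusing its
 * slot. On allocation failure, stop with -ENOMEM and keep the old page,
 * so the number of pooled huge pages never changes.
 */
static int replace_free_hugepages_sketch(struct hpool_sketch *pool,
					 unsigned long start, unsigned long end,
					 hpage_alloc_fn alloc)
{
	for (size_t i = 0; i < pool->nr_free; i++) {
		unsigned long pfn = pool->free_pfn[i];
		unsigned long new_pfn;

		if (pfn < start || pfn >= end)
			continue;             /* already outside the range */
		if (!alloc(start, end, &new_pfn))
			return -ENOMEM;       /* old page stays; count unchanged */
		pool->free_pfn[i] = new_pfn;  /* old page dropped, new one pooled */
	}
	return 0;
}
```

The failure path mirrors the behaviour reported further down: when the replacement cannot be allocated, the range stays busy, but the pool keeps the size the admin configured.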

In-use HugeTLB pages are no exception, though, as they can be isolated
and migrated like any other LRU or movable page.

This series aims to improve alloc_contig_range->isolate_migratepages_block
so that HugeTLB pages can be recognized and handled.

Since we also need to start reporting errors down the chain (e.g.
-ENOMEM due to not being able to allocate a new hugetlb page), the
isolate_migratepages_{range,block} interfaces need to change to report
error codes instead of the pfn == 0 vs pfn != 0 scheme they use right
now.  From now on, isolate_migratepages_block will no longer return the
next pfn to be scanned, but -EINTR, -ENOMEM or 0, and the next pfn to be
scanned will be recorded in the cc->migrate_pfn field (as is already
done in isolate_migratepages_range()).
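
The new return-value contract can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel code. `struct cc_sketch` is a hypothetical, heavily reduced stand-in for struct compact_control. The `stop_pfn` and `stop_err` parameters are hypothetical knobs that inject an error at a given pfn. The function returns only a status (0, -EINTR or -ENOMEM) and reports the resume point through `cc->migrate_pfn`. Under the old scheme, a single 0 return could not distinguish those two errors.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, heavily reduced stand-in for struct compact_control. */
struct cc_sketch {
	unsigned long migrate_pfn;  /* where the next scan should resume */
};

/*
 * Sketch of the new isolate_migratepages_block() contract: the return
 * value is a status (0, -EINTR or -ENOMEM) and the next pfn to scan is
 * recorded in cc->migrate_pfn. stop_pfn/stop_err are hypothetical knobs
 * that inject an error, e.g. -ENOMEM when no replacement hugetlb page
 * can be allocated, or -EINTR on a pending fatal signal.
 */
static int isolate_block_sketch(struct cc_sketch *cc,
				unsigned long low_pfn, unsigned long end_pfn,
				unsigned long stop_pfn, int stop_err)
{
	unsigned long pfn;

	assert(low_pfn <= end_pfn);
	for (pfn = low_pfn; pfn < end_pfn; pfn++) {
		if (pfn == stop_pfn && stop_err) {
			cc->migrate_pfn = pfn;  /* resume point for the caller */
			return stop_err;
		}
		/* ... isolate the page at pfn (omitted in this sketch) ... */
	}
	cc->migrate_pfn = end_pfn;  /* whole block scanned */
	return 0;
}
```

With this contract, a caller can act on the specific error and still know how far the scan progressed.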

Below is an insight from David (thanks) that clearly shows the problem:

 "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
  ZONE_MOVABLE. Allocate 512 huge pages.

  [root@localhost ~]# cat /proc/meminfo
  MemTotal:        5061512 kB
  MemFree:         3319396 kB
  MemAvailable:    3457144 kB
  ...
  HugePages_Total:     512
  HugePages_Free:      512
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB

  The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
  1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:

  [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"
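
The busy ranges in the log are half-open pfn intervals. The small helper below converts them into sizes. It assumes 4 KiB base pages, which the log does not state. The huge page size of 2048 kB comes from the /proc/meminfo output above. Under that assumption, [1b8000, 1c0000) spans 0x8000 pfns, which is 128 MiB or 64 huge pages. [1bfc00, 1c0000) spans 0x400 pfns, which is 4 MiB or two huge pages.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages (PAGE_SHIFT == 12), as on x86-64. */
#define PAGE_SHIFT_SKETCH 12
/* Hugepagesize reported in /proc/meminfo above, in kB. */
#define HPAGE_KB_SKETCH 2048UL

/* Size, in KiB, of the half-open pfn range [start, end). */
static unsigned long pfn_range_kib(unsigned long start, unsigned long end)
{
	assert(start <= end);
	return (end - start) << (PAGE_SHIFT_SKETCH - 10);
}

/* Number of HPAGE_KB_SKETCH-sized huge pages that fit in [start, end). */
static unsigned long pfn_range_hpages(unsigned long start, unsigned long end)
{
	return pfn_range_kib(start, end) / HPAGE_KB_SKETCH;
}
```

The same arithmetic shows why 512 huge pages of 2048 kB add up to 1048576 kB, i.e. the 1G being unplugged.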

And then with this patchset applied:

 "Same experiment with ZONE_MOVABLE:

  a) Free huge pages: all memory can get unplugged again.

  b) Allocated/populated but idle huge pages: all memory can get unplugged
     again.

  c) Allocated/populated but all 512 huge pages are read/written in a
     loop: all memory can get unplugged again, but I get a single

     [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy

     Most probably because it happened to try migrating a huge page
     while it was busy.  As virtio-mem retries on ZONE_MOVABLE a couple of
     times, it can deal with this temporary failure.

  Last but not least, I did something extreme:

  # cat /proc/meminfo
  MemTotal:        5061568 kB
  MemFree:          186560 kB
  MemAvailable:     354524 kB
  ...
  HugePages_Total:    2048
  HugePages_Free:     2048
  HugePages_Rsvd:        0
  HugePages_Surp:        0

  Triggering unplug would require a dissolve+alloc, which now fails
  when trying to allocate an additional ~512 huge pages (1G).

  As expected, I can properly see memory unplug not fully succeeding, and
  I get a fairly continuous stream of

  [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
  ...

  But more importantly, the hugepage count remains stable, as configured
  by the admin (me):

  HugePages_Total:    2048
  HugePages_Free:     2048
  HugePages_Rsvd:        0
  HugePages_Surp:        0"

This patch (of 7):

Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or
-EBUSY, and report them down the chain.  The problem is that when
migrate_pages() reports -ENOMEM, we keep going until we exhaust all the
retry attempts (5 at the moment) instead of bailing out.

migrate_pages() bails out right away on -ENOMEM because it is considered a
fatal error.  Do the same here instead of continuing to retry.  Note that
this does not fix a real issue; it is mostly a cosmetic change, although
backing off earlier can save some cycles.
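
The retry policy can be modelled in a few lines of user-space C. This is a minimal sketch loosely patterned on the loop described above, not the kernel function itself. `migrate_pages_stub()` and `struct migrate_script` are hypothetical stand-ins that replay scripted migrate_pages() results: 0 means everything migrated, a positive value means pages were left behind, and a negative value is an error. Transient failures are retried up to 5 times and then turned into -EBUSY. -ENOMEM ends the loop on the first occurrence.

```c
#include <assert.h>
#include <errno.h>

#define MAX_TRIES_SKETCH 5  /* "5 at the moment", per the text above */

/* Hypothetical scripted replacement for migrate_pages() results:
 * 0 = all migrated, >0 = pages left behind, <0 = error. */
struct migrate_script {
	const int *results;
	int n;
	int calls;
};

/* Return the next scripted result, repeating the last one when exhausted. */
static int migrate_pages_stub(struct migrate_script *s)
{
	int r = s->results[s->calls < s->n ? s->calls : s->n - 1];

	s->calls++;
	return r;
}

/*
 * Retry loop loosely modelled on __alloc_contig_migrate_range() after
 * this patch: retry transient failures up to MAX_TRIES_SKETCH times,
 * but stop immediately on -ENOMEM, which migrate_pages() treats as fatal.
 */
static int contig_migrate_sketch(struct migrate_script *s)
{
	int tries = 0;

	for (;;) {
		int ret = migrate_pages_stub(s);

		if (ret == 0)
			return 0;        /* everything migrated */
		if (ret == -ENOMEM)
			return -ENOMEM;  /* fatal: bail out right away */
		if (++tries == MAX_TRIES_SKETCH)
			return ret < 0 ? ret : -EBUSY;
	}
}
```

In this model, a script that starts with -ENOMEM costs one call instead of five, which is the saving the patch is after.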

Link: https://lkml.kernel.org/r/20210419075413.1064-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20210419075413.1064-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-05-05 11:27:22 -07:00

arch

mm: generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-05-05 11:27:20 -07:00

block

cgroup: rstat: punt root-level optimization to individual controllers

2021-04-30 11:20:37 -07:00

certs

certs: add 'x509_revocation_list' to gitignore

2021-04-26 10:48:07 -07:00

crypto

Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block

2021-04-28 14:39:37 -07:00

Documentation

mm/mmzone.h: fix existing kernel-doc comments and link them to core-api

2021-04-30 11:20:43 -07:00

drivers

mm/vmalloc: remove unmap_kernel_range

2021-04-30 11:20:40 -07:00

mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()

2021-05-05 11:27:21 -07:00

include

hugetlb: add per-hstate mutex to synchronize user adjustments

2021-05-05 11:27:22 -07:00

init

mm: move mem_init_print_info() into mm_init()

2021-04-30 11:20:42 -07:00

ipc

fs: make helpers idmap mount aware

2021-01-24 14:27:20 +01:00

kernel

irq_work: record irq_work_queue() call stack

2021-04-30 11:20:42 -07:00

lib

kasan: detect false-positives in tests

2021-04-30 11:20:42 -07:00

LICENSES

LICENSES: Add the CC-BY-4.0 license

2020-12-08 10:33:27 -07:00

mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range

2021-05-05 11:27:22 -07:00

net

net: page_pool: use alloc_pages_bulk in refill code path

2021-04-30 11:20:43 -07:00

samples

samples/vfio-mdev/mdpy: use remap_vmalloc_range

2021-04-30 11:20:39 -07:00

scripts

scripts: a new script for checking duplicate struct declaration

2021-04-30 11:20:35 -07:00

security

Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

2021-04-29 11:57:23 -07:00

sound

Merge tag 'mfd-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

2021-04-28 15:59:13 -07:00

tools

mm: huge_memory: debugfs for file-backed THP split

2021-05-05 11:27:21 -07:00

usr

Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2021-02-25 10:17:31 -08:00

virt

KVM: x86/mmu: Consider the hva in mmu_notifier retry

2021-02-22 13:16:53 -05:00

.clang-format

Merge tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

2021-02-24 09:38:36 -08:00

.cocciconfig

scripts: add Linux .cocciconfig for coccinelle

2016-07-22 12:13:39 +02:00

.get_maintainer.ignore

Opt out of scripts/get_maintainer.pl

2019-05-16 10:53:40 -07:00

.gitattributes

.gitattributes: use 'dts' diff driver for dts files

2019-12-04 19:44:11 -08:00

.gitignore

kbuild: generate Module.symvers only when vmlinux exists

2021-04-25 05:17:02 +09:00

.mailmap

Merge tag 'docs-5.13' of git://git.lwn.net/linux

2021-04-26 13:22:43 -07:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

Merge tag 'mfd-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

2021-04-28 15:59:13 -07:00

Kbuild

kbuild: rename hostprogs-y/always to hostprogs/always-y

2020-02-04 01:53:07 +09:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

MAINTAINERS: assign pagewalk.h to MEMORY MANAGEMENT

2021-04-30 11:20:41 -07:00

Makefile

Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2021-04-29 14:32:00 -07:00

README

Drop all 00-INDEX files from Documentation/

2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.

Languages

C 97%

Assembly 1%

Shell 0.6%

Rust 0.5%

Python 0.4%

Other 0.3%