linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-01 06:04:48 -04:00


Adrian Huang 409faf8c97 mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area

When running the vmalloc stress test on a 448-core system, the eBPF/bcc
'funclatency.py' tool [1] shows that the average latency of
purge_vmap_node() is about 2 seconds.

  # /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1

     usecs             : count    distribution
        0 -> 1         : 0       |                                        |
        2 -> 3         : 29      |                                        |
        4 -> 7         : 19      |                                        |
        8 -> 15        : 56      |                                        |
       16 -> 31        : 483     |****                                    |
       32 -> 63        : 1548    |************                            |
       64 -> 127       : 2634    |*********************                   |
      128 -> 255       : 2535    |*********************                   |
      256 -> 511       : 1776    |**************                          |
      512 -> 1023      : 1015    |********                                |
     1024 -> 2047      : 573     |****                                    |
     2048 -> 4095      : 488     |****                                    |
     4096 -> 8191      : 1091    |*********                               |
     8192 -> 16383     : 3078    |*************************               |
    16384 -> 32767     : 4821    |****************************************|
    32768 -> 65535     : 3318    |***************************             |
    65536 -> 131071    : 1718    |**************                          |
   131072 -> 262143    : 2220    |******************                      |
   262144 -> 524287    : 1147    |*********                               |
   524288 -> 1048575   : 1179    |*********                               |
  1048576 -> 2097151   : 822     |******                                  |
  2097152 -> 4194303   : 906     |*******                                 |
  4194304 -> 8388607   : 2148    |*****************                       |
  8388608 -> 16777215  : 4497    |*************************************   |
 16777216 -> 33554431  : 289     |**                                      |

  avg = 2041714 usecs, total: 78381401772 usecs, count: 38390

  The worst case falls in the 16-33 second range, which triggers a soft
  lockup [2].
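To help read the histogram, here is a minimal user-space C sketch. It is not bcc's implementation: log2_bucket() and avg_usecs() are hypothetical helpers that reproduce the power-of-two bucket ranges printed above and the reported average, which is simply total / count:

```c
#include <stdint.h>

/* Hypothetical helper (not bcc code): compute the bounds of the
 * power-of-two bucket that a latency falls into, matching the ranges
 * printed above: [0,1], [2,3], [4,7], [8,15], ... */
static void log2_bucket(uint64_t usecs, uint64_t *lo, uint64_t *hi)
{
	int k = 0;

	if (usecs < 2) {
		*lo = 0;
		*hi = 1;
		return;
	}
	/* k = floor(log2(usecs)) */
	while (k < 63 && (usecs >> (k + 1)) != 0)
		k++;
	*lo = UINT64_C(1) << k;
	*hi = *lo + (*lo - 1);	/* 2^(k+1) - 1, computed without overflow */
}

/* Average latency as reported: total / count (integer division). */
static uint64_t avg_usecs(uint64_t total, uint64_t count)
{
	return count ? total / count : 0;
}
```

With these helpers, a 20-second latency (20000000 usecs) lands in the 16777216 -> 33554431 bucket (about 16.8-33.6 seconds), and 78381401772 / 38390 gives the reported avg of 2041714 usecs (about 2.04 seconds).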

[Root Cause]
1) Each purge_list is long. The following shows the number of
   vmap_areas purged per vmap_node:

   crash> p vmap_nodes
   vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
   crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
     nr_purged = 663070
     ...
     nr_purged = 821670
     nr_purged = 692214
     nr_purged = 726808
     ...

2) atomic_long_sub() uses the 'lock' prefix to guarantee atomicity, and
   it is executed once for each purged vmap_area. With over 600000
   vmap_areas per node (see 'nr_purged' above), that is over 600000
   locked instructions per purge.

   Here is the objdump output:

     $ objdump -D vmlinux
     ffffffff813e8c80 <purge_vmap_node>:
     ...
     ffffffff813e8d70:  f0 48 29 2d 68 0c bb  lock sub %rbp,0x2bb0c68(%rip)
     ...

   Quoting Agner Fog's "Instruction tables" [3]:
     Instructions with a LOCK prefix have a long latency that depends on
     cache organization and possibly RAM speed. If there are multiple
     processors or cores or direct memory access (DMA) devices, then all
     locked instructions will lock a cache line for exclusive access,
     which may involve RAM access. A LOCK prefix typically costs more
     than a hundred clock cycles, even on single-processor systems.

   This is why the latency of purge_vmap_node() increases dramatically
   on a many-core system: one core is busy purging each vmap_area of the
   *long* purge_list and executing atomic_long_sub() for each vmap_area,
   while other cores free vmalloc allocations and execute
   atomic_long_add_return() in free_vmap_area_noflush() on the same
   vmap_lazy_nr counter.
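The contention pattern can be modelled in user space with C11 atomics and pthreads. This is an illustrative sketch under stated assumptions, not kernel code: lazy_nr stands in for vmap_lazy_nr, each purged area counts as one page, and freer(), purger_per_item() and run_contention() are hypothetical names. On x86-64, compilers typically emit lock-prefixed instructions for atomic_fetch_add() and atomic_fetch_sub(), so every iteration of every thread competes for the same cache line:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: one global counter shared by all threads, standing in
 * for vmap_lazy_nr. */
static atomic_long lazy_nr;

struct freer_arg {
	long nr_frees;	/* number of simulated frees by this thread */
};

/* Models the other cores: each simulated free adds one page to the
 * shared counter, like atomic_long_add_return() in
 * free_vmap_area_noflush(). */
static void *freer(void *p)
{
	const struct freer_arg *a = p;

	for (long i = 0; i < a->nr_frees; i++)
		atomic_fetch_add(&lazy_nr, 1);
	return NULL;
}

/* Models the purging core before the patch: one atomic subtraction per
 * purged area, so every iteration contends for the counter. */
static void *purger_per_item(void *p)
{
	long nr_purged = *(const long *)p;

	for (long i = 0; i < nr_purged; i++)
		atomic_fetch_sub(&lazy_nr, 1);
	return NULL;
}

/* Runs one purger and nr_freers freers concurrently and returns the
 * final counter value, or -1 on bad input or thread creation failure. */
static long run_contention(int nr_freers, long frees_each, long nr_purged)
{
	pthread_t tids[64];
	struct freer_arg arg = { .nr_frees = frees_each };
	int created = 0;

	if (nr_freers < 0 || nr_freers > 63)
		return -1;
	atomic_store(&lazy_nr, 0);
	if (pthread_create(&tids[created], NULL, purger_per_item, &nr_purged))
		return -1;
	created++;
	for (int i = 0; i < nr_freers; i++) {
		if (pthread_create(&tids[created], NULL, freer, &arg))
			break;
		created++;
	}
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return created == nr_freers + 1 ? atomic_load(&lazy_nr) : -1;
}
```

For example, run_contention(4, 100000, 50000) returns 350000. The final value is deterministic; only the time taken grows as more threads hammer the same counter.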

[Solution]
Accumulate the total number of purged pages in a local variable and
call atomic_long_sub() once, after the traversal of the purge_list is
done. The experiment results show a latency improvement of about 99%.
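The change can be sketched in user-space C11 as a before/after pair. This is not the kernel patch itself: struct fake_vmap_area and both functions are hypothetical, and atomic_long stands in for the kernel's atomic_long_t. Each function returns how many atomic operations it issued on the shared counter:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmap_area: only the page count
 * matters for this sketch. */
struct fake_vmap_area {
	long nr_pages;
};

/* Before: one atomic read-modify-write per purged area. */
static size_t purge_per_item(atomic_long *lazy_nr,
			     const struct fake_vmap_area *areas, size_t n)
{
	size_t atomics = 0;

	for (size_t i = 0; i < n; i++) {
		atomic_fetch_sub(lazy_nr, areas[i].nr_pages);
		atomics++;
	}
	return atomics;
}

/* After: accumulate in a plain local variable, then issue a single
 * atomic subtraction once the whole list has been traversed. */
static size_t purge_batched(atomic_long *lazy_nr,
			    const struct fake_vmap_area *areas, size_t n)
{
	long nr_purged_pages = 0;	/* local: no lock prefix needed */

	for (size_t i = 0; i < n; i++)
		nr_purged_pages += areas[i].nr_pages;

	atomic_fetch_sub(lazy_nr, nr_purged_pages);
	return 1;
}
```

Both variants leave the counter at the same final value, but the batched one issues one locked operation instead of one per vmap_area. The trade-off is that other cores observe the not-yet-decremented counter until the single subtraction at the end of the traversal.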

[Experiment Result]
1) System Configuration: Three servers (with HT enabled) were tested.
     * 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
     * 192-core server: 5th Gen Intel Xeon Scalable Processor*2
     * 448-core server: AMD Zen 4 Processor*2

2) Kernel Config
     * CONFIG_KASAN is disabled

3) The data in the "w/o patch" and "w/ patch" columns
     * Unit: microseconds (us)
     * Each value is the average of 3 measurements

         System        w/o patch (us)   w/ patch (us)    Improvement (%)
     ---------------   --------------   -------------    -------------
     72-core server          2194              14            99.36%
     192-core server       143799            1139            99.21%
     448-core server      1992122            6883            99.65%
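The Improvement column is (w/o - w/) / w/o x 100. A small C sketch (the helper and array names are hypothetical) reproduces it from the averages in the table and also computes the speedup factor w/o / w/, which the near-identical percentages hide:

```c
/* Averages from the table above, in microseconds. */
struct result {
	const char *system;
	double without_us;
	double with_us;
};

static const struct result results[] = {
	{ "72-core server",     2194.0,   14.0 },
	{ "192-core server",  143799.0, 1139.0 },
	{ "448-core server", 1992122.0, 6883.0 },
};

/* Percentage reduction in latency: (w/o - w/) / w/o * 100. */
static double improvement_pct(double without_us, double with_us)
{
	return (without_us - with_us) / without_us * 100.0;
}

/* Speedup factor: how many times faster the patched kernel is. */
static double speedup(double without_us, double with_us)
{
	return without_us / with_us;
}
```

For the three servers this gives about 99.36%, 99.21% and 99.65%, which correspond to speedups of roughly 157x, 126x and 289x.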

[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf

Link: https://lkml.kernel.org/r/20240829130633.2184-1-ahuang12@lenovo.com
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2024-09-01 17:59:02 -07:00

arch

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

2024-09-01 09:18:48 +12:00

block

block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes

2024-08-28 08:49:25 -06:00

certs

kbuild: use $(src) instead of $(srctree)/$(src) for source directory

2024-05-10 04:34:52 +09:00

crypto

crypto: testmgr - generate power-of-2 lengths more often

2024-07-13 11:50:28 +12:00

Documentation

mm/memcontrol: respect zswap.writeback setting from parent cg too

2024-09-01 17:59:02 -07:00

drivers

Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

2024-09-01 09:07:44 +12:00

nilfs2: fix state management in error path of log writing function

2024-09-01 17:59:00 -07:00

include

Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

2024-09-01 15:49:26 +12:00

init

Merge tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux

2024-08-16 11:24:06 -07:00

io_uring

io_uring/kbuf: return correct iovec count from classic buffer peek

2024-08-30 10:45:54 -06:00

ipc

sysctl: treewide: constify the ctl_table argument of proc_handlers

2024-07-24 20:59:29 +02:00

kernel

kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y

2024-09-01 17:59:01 -07:00

lib

maple_tree: remove rcu_read_lock() from mt_validate()

2024-09-01 17:59:01 -07:00

LICENSES

LICENSES: Add the copyleft-next-0.3.1 license

2022-11-08 15:44:01 +01:00

mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area

2024-09-01 17:59:02 -07:00

net

Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2024-08-30 06:14:39 +12:00

rust

Merge tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux

2024-08-16 11:24:06 -07:00

samples

treewide: remove unnecessary <linux/version.h> inclusion

2024-08-12 18:36:44 +09:00

scripts

scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum

2024-09-01 17:59:01 -07:00

security

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

2024-09-01 09:18:48 +12:00

sound

Merge tag 'sound-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

2024-08-28 06:24:22 +12:00

tools

selftests: mm: fix build errors on armhf

2024-09-01 17:58:59 -07:00

usr

initramfs: shorten cmd_initfs in usr/Makefile

2024-07-16 01:07:52 +09:00

virt

KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)

2024-08-14 12:28:24 -04:00

.clang-format

Docs: Move clang-format from process/ to dev-tools/

2024-06-26 16:36:00 -06:00

.cocciconfig

…

.editorconfig

.editorconfig: remove trim_trailing_whitespace option

2024-06-13 16:47:52 +02:00

.get_maintainer.ignore

Add Jeff Kirsher to .get_maintainer.ignore

2024-03-08 11:36:54 +00:00

.gitattributes

.gitattributes: set diff driver for Rust source code files

2023-05-31 17:48:25 +02:00

.gitignore

kbuild: add script and target to generate pacman package

2024-07-22 01:24:22 +09:00

.mailmap

mailmap: update entry for Jan Kuliga

2024-09-01 17:59:02 -07:00

.rustfmt.toml

rust: add .rustfmt.toml

2022-09-28 09:02:20 +02:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

Merge tag 'trace-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

2024-07-18 14:08:42 -07:00

Kbuild

Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2022-10-10 12:00:45 -07:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

2024-09-01 07:06:28 +12:00

Makefile

Linux 6.11-rc6

2024-09-01 19:46:02 +12:00

README

README: Fix spelling

2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.

Languages

C 97%

Assembly 1%

Shell 0.6%

Rust 0.5%

Python 0.4%

Other 0.3%