It builds a test program and use it to verify pipe behavior with perf
record, inject and report.
$ perf test pipe -v
80: perf pipe recording and injection test :
--- start ---
test child forked, pid 1109301
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
1109315 1109315 -1 |test.file.MGNff
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
99.99% test.file.MGNff test.file.MGNffM [.] noploop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
99.99% test.file.MGNff test.file.MGNffM [.] noploop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.153 MB /tmp/perf.data.dmsnlx (3995 samples) ]
99.99% test.file.MGNff test.file.MGNffM [.] noploop
test child finished with 0
---- end ----
perf pipe recording and injection test: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210719223153.1618812-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the input is a regular file but the output is a pipe, it should
write a pipe header. But just repiping would write a portion of the
existing header which is different in 'size' value. So we need to
prevent it and write a new pipe header along with other information
like event attributes and features.
This can handle something like this:
# perf record -a -B sleep 1
# perf inject -b -i perf.data | perf report -i -
Factor out perf_event__synthesize_for_pipe() to be shared between perf
record and inject.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210719223153.1618812-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it needs to save the perf inject data to a file for debugging.
But normally it assumes the same format for input and output, so the end
result cannot be used due to a broken format.
# perf record -a -o - sleep 1 | perf inject -b -o my.data
# perf report -i my.data --stdio
0x208 [0]: failed to process type: 0 [Invalid argument]
Error:
failed to process sample
# To display the perf.data header info, please use --header/--header-only options.
#
In this case, it thought the data has a regular file header since the
output is not a pipe. But actually it doesn't have one and has a pipe
file header. At the end of the session, it tries to rewrite the regular
file header with updated features and it overwrites the data just
follows the pipe header.
Fix it by checking either the input and the output is a pipe.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210719223153.1618812-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
https://github.com/beaker-project/restraint/issues/215 describes a file
descriptor leak which revealed the test failure described here.
The 'DSO data reopen' perf test assumes that RLIMIT_NOFILE limits the
number of open file descriptors, but it actually limits newly opened
file descriptors. When the file descriptor limit is reduced, file
descriptors already open remain open regardless of the new limit. This
test failure does not occur if open file descriptors are contiguous,
beginning at zero.
The following command triggers this perf test failure.
perf test 'DSO data reopen' 3>/dev/null 8>/dev/null
This patch determines the file descriptor limit by opening four files
and then closing them. The limit is set to the fourth file descriptor,
leaving only the first three available because any newly opened file
descriptor must be less than the limit.
Signed-off-by: Eirik Fuller <efuller@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
LPU-Reference: 20210626023825.1398547-1-efuller@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On my aarch64 big endian machine, the perf annotate does not work.
# perf annotate
Percent | Source code & Disassembly of [kernel.kallsyms] for cycles (253 samples, percent: local period)
--------------------------------------------------------------------------------------------------------------
Percent | Source code & Disassembly of [kernel.kallsyms] for cycles (1 samples, percent: local period)
------------------------------------------------------------------------------------------------------------
Percent | Source code & Disassembly of [kernel.kallsyms] for cycles (47 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------
...
This is because the arch_find() function uses the normalized architecture
name provided by normalize_arch(), and my machine's architecture name
aarch64_be is not normalized to arm64. Like other architectures such as
arm and powerpc, we can fuzzy match the architecture names associated with
aarch64.* and normalize them.
It seems that there is also arm64_be architecture name, which we also
normalize to arm64.
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Zhang Jinhao <zhangjinhao2@huawei.com>
Link: http //lore.kernel.org/lkml/20210726123854.13463-1-lihuafei1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously the code would see if, for example,
tools/perf/arch/arm/include/uapi/asm/errno.h exists and if not generate
a "generic" switch statement using the asm-generic/errno.h.
This creates multiple identical "generic" switch statements before the
default generic switch statement for an unknown architecture.
By simplifying the archlist to be only for architectures that are not
"generic" the amount of generated code can be reduced from 14 down to 6
functions.
Remove the special case of x86, instead reverse the architecture names
so that it comes first.
Committer testing:
$ tools/perf/trace/beauty/arch_errno_names.sh gcc tools > before
Apply this patch and:
$ tools/perf/trace/beauty/arch_errno_names.sh gcc tools > after
14 arches down to 6, that are the ones with an explicit errno.h file:
$ ls -1 tools/arch/*/include/uapi/asm/errno.h
tools/arch/alpha/include/uapi/asm/errno.h
tools/arch/mips/include/uapi/asm/errno.h
tools/arch/parisc/include/uapi/asm/errno.h
tools/arch/powerpc/include/uapi/asm/errno.h
tools/arch/sparc/include/uapi/asm/errno.h
tools/arch/x86/include/uapi/asm/errno.h
$
$ diff -u4 before after
@@ -2099,32 +987,16 @@
const char *arch_syscalls__strerrno(const char *arch, int err)
{
if (!strcmp(arch, "x86"))
return errno_to_name__x86(err);
- if (!strcmp(arch, "alpha"))
- return errno_to_name__alpha(err);
- if (!strcmp(arch, "arc"))
- return errno_to_name__arc(err);
- if (!strcmp(arch, "arm"))
- return errno_to_name__arm(err);
- if (!strcmp(arch, "arm64"))
- return errno_to_name__arm64(err);
- if (!strcmp(arch, "csky"))
- return errno_to_name__csky(err);
- if (!strcmp(arch, "mips"))
- return errno_to_name__mips(err);
- if (!strcmp(arch, "parisc"))
- return errno_to_name__parisc(err);
- if (!strcmp(arch, "powerpc"))
- return errno_to_name__powerpc(err);
- if (!strcmp(arch, "riscv"))
- return errno_to_name__riscv(err);
- if (!strcmp(arch, "s390"))
- return errno_to_name__s390(err);
- if (!strcmp(arch, "sh"))
- return errno_to_name__sh(err);
if (!strcmp(arch, "sparc"))
return errno_to_name__sparc(err);
- if (!strcmp(arch, "xtensa"))
- return errno_to_name__xtensa(err);
+ if (!strcmp(arch, "powerpc"))
+ return errno_to_name__powerpc(err);
+ if (!strcmp(arch, "parisc"))
+ return errno_to_name__parisc(err);
+ if (!strcmp(arch, "mips"))
+ return errno_to_name__mips(err);
+ if (!strcmp(arch, "alpha"))
+ return errno_to_name__alpha(err);
return errno_to_name__generic(err);
}
The rest of the patch is the removal of the errno_to_name__generic()
unneeded clones.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210513060441.408507-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top'
abort, uncovering a design flaw on how namespace information is kept.
The fix for that is more than we can do right now, leave it for the
next merge window.
- Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up
the decoding of some records.
- Fix PMU alias matching.
Thanks to James Clark and John Garry for these fixes.
* tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
Revert "perf map: Fix dso->nsinfo refcounting"
perf pmu: Fix alias matching
perf cs-etm: Split --dump-raw-trace by AUX records
Pull powerpc fixes from Michael Ellerman:
- Don't use r30 in VDSO code, to avoid breaking existing Go lang
programs.
- Change an export symbol to allow non-GPL modules to use spinlocks
again.
Thanks to Paul Menzel, and Srikar Dronamraju.
* tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vdso: Don't use r30 to avoid breaking Go lang
powerpc/pseries: Fix regression while building external modules
Pull xfs fixes from Darrick Wong:
"This contains a bunch of bug fixes in XFS.
Dave and I have been busy the last couple of weeks to find and fix as
many log recovery bugs as we can find; here are the results so far. Go
fstests -g recoveryloop! ;)
- Fix a number of coordination bugs relating to cache flushes for
metadata writeback, cache flushes for multi-buffer log writes, and
FUA writes for single-buffer log writes
- Fix a bug with incorrect replay of attr3 blocks
- Fix unnecessary stalls when flushing logs to disk
- Fix spoofing problems when recovering realtime bitmap blocks"
* tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: prevent spoofing of rtbitmap blocks when recovering buffers
xfs: limit iclog tail updates
xfs: need to see iclog flags in tracing
xfs: Enforce attr3 buffer recovery order
xfs: logging the on disk inode LSN can make it go backwards
xfs: avoid unnecessary waits in xfs_log_force_lsn()
xfs: log forces imply data device cache flushes
xfs: factor out forced iclog flushes
xfs: fix ordering violation between cache flushes and tail updates
xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog
xfs: external logs need to flush data device
xfs: flush data dev on external log write
Pull cifs fixes from Steve French:
"Three cifs/smb3 fixes, including two for stable, and a fix for an
fallocate problem noticed by Clang"
* tag '5.14-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: add missing parsing of backupuid
smb3: rc uninitialized in one fallocate path
SMB3: fix readpage for large swap cache
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
(mac80211) and netfilter trees.
Current release - regressions:
- mac80211: fix starting aggregation sessions on mesh interfaces
Current release - new code bugs:
- sctp: send pmtu probe only if packet loss in Search Complete state
- bnxt_en: add missing periodic PHC overflow check
- devlink: fix phys_port_name of virtual port and merge error
- hns3: change the method of obtaining default ptp cycle
- can: mcba_usb_start(): add missing urb->transfer_dma initialization
Previous releases - regressions:
- set true network header for ECN decapsulation
- mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
LRO
- phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
PHY
- sctp: fix return value check in __sctp_rcv_asconf_lookup
Previous releases - always broken:
- bpf:
- more spectre corner case fixes, introduce a BPF nospec
instruction for mitigating Spectre v4
- fix OOB read when printing XDP link fdinfo
- sockmap: fix cleanup related races
- mac80211: fix enabling 4-address mode on a sta vif after assoc
- can:
- raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
- j1939: j1939_session_deactivate(): clarify lifetime of session
object, avoid UAF
- fix number of identical memory leaks in USB drivers
- tipc:
- do not blindly write skb_shinfo frags when doing decryption
- fix sleeping in tipc accept routine"
* tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
gve: Update MAINTAINERS list
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
bpf: Fix leakage due to insufficient speculative store bypass mitigation
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
sis900: Fix missing pci_disable_device() in probe and remove
net: let flow have same hash in two directions
nfc: nfcsim: fix use after free during module unload
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
sctp: fix return value check in __sctp_rcv_asconf_lookup
nfc: s3fwrn5: fix undefined parameter values in dev_err()
net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
net/mlx5: Unload device upon firmware fatal error
net/mlx5e: Fix page allocation failure for ptp-RQ over SF
net/mlx5e: Fix page allocation failure for trap-RQ over SF
...
Pull ACPI fixes from Rafael Wysocki:
"These revert a recent IRQ resources handling modification that turned
out to be problematic, fix suspend-to-idle handling on AMD platforms
to take upcoming systems into account properly and fix the retrieval
of the DPTF attributes of the PCH FIVR.
Specifics:
- Revert recent change of the ACPI IRQ resources handling that
attempted to improve the ACPI IRQ override selection logic, but
introduced serious regressions on some systems (Hui Wang).
- Fix up quirks for AMD platforms in the suspend-to-idle support code
so as to take upcoming systems using uPEP HID AMDI007 into account
as appropriate (Mario Limonciello).
- Fix the code retrieving DPTF attributes of the PCH FIVR so that it
agrees on the return data type with the ACPI control method
evaluated for this purpose (Srinivas Pandruvada)"
* tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: DPTF: Fix reading of attributes
Revert "ACPI: resources: Add checks for ACPI IRQ override"
ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
Since commit 1b6b26ae70 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.
In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already. Doing
extraneous wakeups will only cause potential thundering herd problems.
However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it". Even if there was no edge in sight.
Quoting Sandeep Patil:
"The commit 1b6b26ae70 ('pipe: fix and clarify pipe write wakeup
logic') changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that
relied upon the older behavior for notification scheme similar to
what's described in [1]
One such library 'realm-core'[2] is used by numerous Android
applications. The library uses a similar notification mechanism as GNU
Make but it never drains the pipe until it is full. When Android moved
to v5.10 kernel, all applications using this library stopped working.
The library has since been fixed[3] but it will be a while before all
applications incorporate the updated library"
Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong. Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.
So add the extraneous wakeup, to approximate the old behavior.
[ I say "approximate", because the exact old behavior was to do a wakeup
not for each write(), but for each pipe buffer chunk that was filled
in. The behavior introduced by this change is not that - this is just
"every write will cause a wakeup, whether necessary or not", which
seems to be sufficient for the broken library use. ]
It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.
See commit f467a6a664 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".
Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
Reported-by: Sandeep Patil <sspatil@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
- gendisk freeing fix (Christoph)
- blk-iocost wake ordering fix (Tejun)
- tag allocation error handling fix (John)
- loop locking fix. While this isn't the prettiest fix in the world,
nobody has any good alternatives for 5.14. Something to likely
revisit for 5.15. (Tetsuo)
* tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
block: delay freeing the gendisk
blk-iocost: fix operation ordering in iocg_wake_fn()
blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
loop: reintroduce global lock for safe loop_validate_file() traversal
Pull libata fixlets from Jens Axboe:
- A fix for PIO highmem (Christoph)
- Kill HAVE_IDE as it's now unused (Lukas)
* tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
arch: Kconfig: clean up obsolete use of HAVE_IDE
libata: fix ata_pio_sector for CONFIG_HIGHMEM
Pull btrfs fixes from David Sterba:
- fix -Warray-bounds warning, to help external patchset to make it
default treewide
- fix writeable device accounting (syzbot report)
- fix fsync and log replay after a rename and inode eviction
- fix potentially lost error code when submitting multiple bios for
compressed range
* tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: calculate number of eb pages properly in csum_tree_block
btrfs: fix rw device counting in __btrfs_free_extra_devids
btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
btrfs: mark compressed range uptodate only if all bio succeed
Pull HID fixes from Jiri Kosina:
- resume timing fix for intel-ish driver (Ye Xiang)
- fix for using incorrect MMIO register in amd_sfh driver (Dylan
MacKenzie)
- Cintiq 24HDT / 27QHDT regression fix and touch processing fix for
Wacom driver (Jason Gerecke)
- device removal bugfix for ft260 driver (Michael Zaidman)
- other small assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: ft260: fix device removal due to USB disconnect
HID: wacom: Skip processing of touches with negative slot values
HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"
HID: apple: Add support for Keychron K1 wireless keyboard
HID: fix typo in Kconfig
HID: ft260: fix format type warning in ft260_word_show()
HID: amd_sfh: Use correct MMIO register for DMA address
HID: asus: Remove check for same LED brightness on set
HID: intel-ish-hid: use async resume function
Merge misc fixes from Andrew Morton:
"7 patches.
Subsystems affected by this patch series: lib, ocfs2, and mm (slub,
migration, and memcg)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
slub: fix unreclaimable slab stat for bulk free
mm/migrate: fix NR_ISOLATED corruption on 64-bit
mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
ocfs2: issue zeroout to EOF blocks
ocfs2: fix zero out valid data
lib/test_string.c: move string selftest in the Runtime Testing menu
Marc Kleine-Budde says:
====================
pull-request: can 2021-07-30
The first patch is by me and adds Yasushi SHOJI as a reviewer for the
Microchip CAN BUS Analyzer Tool driver.
Dan Carpenter's patch fixes a signedness bug in the hi311x driver.
Pavel Skripkin provides 4 patches, the first targets the mcba_usb
driver by adding the missing urb->transfer_dma initialization, which
was broken in a previous commit. The last 3 patches fix a memory leak
in the usb_8dev, ems_usb and esd_usb2 driver.
* tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
====================
Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
the following dump occurs.
BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Oops: 0000 [#1] SMP
[...]
Workqueue: events kfree_rcu_work
RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
[...]
Call Trace:
kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
kfree_bulk include/linux/slab.h:413 [inline]
kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
process_one_work+0x207/0x530 kernel/workqueue.c:2276
worker_thread+0x320/0x610 kernel/workqueue.c:2422
kthread+0x13d/0x160 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
When kmalloc_node() a large memory, page is allocated, not slab, so when
freeing memory via kfree_rcu(), this large memory should not be used by
memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
slab.
Using page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a7146 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>