Introduce the LANDLOCK_ACCESS_FS_TRUNCATE flag for file truncation.
This flag hooks into the path_truncate, file_truncate and
file_alloc_security LSM hooks and covers file truncation using
truncate(2), ftruncate(2), open(2) with O_TRUNC, as well as creat().
This change also increments the Landlock ABI version, updates
corresponding selftests, and updates code documentation to document
the flag.
In security/security.c, allocate security blobs at pointer-aligned
offsets. This fixes the problem where one LSM's security blob can
shift another LSM's security blob to an unaligned address (reported
by Nathan Chancellor).
The following operations are restricted:
open(2): requires the LANDLOCK_ACCESS_FS_TRUNCATE right if a file gets
implicitly truncated as part of the open() (e.g. using O_TRUNC).
Notable special cases:
* open(..., O_RDONLY|O_TRUNC) can truncate files as well in Linux
* open() with O_TRUNC does *not* need the TRUNCATE right when it
creates a new file.
truncate(2) (on a path): requires the LANDLOCK_ACCESS_FS_TRUNCATE
right.
ftruncate(2) (on a file): requires that the file had the TRUNCATE
right when it was previously opened. File descriptors acquired by
other means than open(2) (e.g. memfd_create(2)) continue to support
truncation with ftruncate(2).
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Link: https://lore.kernel.org/r/20221018182216.301684-5-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-10-18
We've added 33 non-merge commits during the last 14 day(s) which contain
a total of 31 files changed, 874 insertions(+), 538 deletions(-).
The main changes are:
1) Add RCU grace period chaining to BPF to wait for the completion
of access from both sleepable and non-sleepable BPF programs,
from Hou Tao & Paul E. McKenney.
2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values. In the wild we have seen OS vendors doing buggy backports
where helper call numbers mismatched. This is an attempt to make
backports more foolproof, from Andrii Nakryiko.
3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions,
from Roberto Sassu.
4) Fix libbpf's BTF dumper for structs with padding-only fields,
from Eduard Zingerman.
5) Fix various libbpf bugs which have been found from fuzzing with
malformed BPF object files, from Shung-Hsi Yu.
6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT,
from Jie Meng.
7) Fix various ASAN bugs in both libbpf and selftests when running
the BPF selftest suite on arm64, from Xu Kuohai.
8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest
and use in-skeleton link pointer to remove an explicit bpf_link__destroy(),
from Jiri Olsa.
9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying
on symlinked iptables which got upgraded to iptables-nft,
from Martin KaFai Lau.
10) Minor BPF selftest improvements all over the place, from various others.
* tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits)
bpf/docs: Update README for most recent vmtest.sh
bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
bpf: Use rcu_trace_implies_rcu_gp() in local storage map
bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
selftests/bpf: Use sys_pidfd_open() helper when possible
libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
libbpf: Deal with section with no data gracefully
libbpf: Use elf_getshdrnum() instead of e_shnum
selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c
selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow
selftest/bpf: Fix memory leak in kprobe_multi_test
selftests/bpf: Fix memory leak caused by not destroying skeleton
libbpf: Fix memory leak in parse_usdt_arg()
libbpf: Fix use-after-free in btf_dump_name_dups
selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test
selftests/bpf: Alphabetize DENYLISTs
selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()
libbpf: Introduce bpf_link_get_fd_by_id_opts()
libbpf: Introduce bpf_btf_get_fd_by_id_opts()
...
====================
Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit causes torture.sh to use the new --bootargs and --datestamp
parameters to kvm-again.sh in order to avoid redundant kernel builds
during rcuscale and refscale testing. This trims the better part of an
hour off of torture.sh runs that use --do-kasan.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a --datestamp parameter to kvm-again.sh, which, in
contrast to the existing --rundir argument, specifies only the last
segments of the pathname. This addition enables torture.sh to use
kvm-again.sh in order to avoid redundant kernel builds.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
As it should, the kvm-recheck.sh script sets the TORTURE_SUITE bash
variable based on the type of rcutorture test being run. However,
it does not export it. Which is OK, at least until you try running
kvm-again.sh on either a rcuscale or a refscale test, at which point you
get false-positive "no success message, N successful version messages"
errors. This commit therefore causes the kvm-recheck.sh script to export
TORTURE_SUITE, suppressing these false positives.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The kvm-again.sh script, when running locally, can place the QEMU output
into kvm-test-1-run-qemu.sh.out instead of kvm-test-1-run.sh.out. This
commit therefore makes kvm-test-1-run-qemu.sh check both locations.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit drags the rcutorture scripting kicking and screaming into the
twenty-first century by making use of the BSD-derived mktemp command to
create temporary files and directories. In happy contrast to many of its
ill-behaved predecessors, mktemp seems to actually work reasonably reliably!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The kvm-again.sh script can be used to repeat short boot-time tests,
but the kernel boot arguments cannot be changed. This means that every
change in kernel boot arguments currently necessitates a kernel build,
which greatly increases the duration of kernel-boot testing.
This commit therefore adds a --bootargs parameter to kvm-again.sh,
which allows a given kernel to be repeatedly booted, but overriding
old and adding new kernel boot parameters. This allows an old kernel
to be booted with new kernel boot parameters, avoiding the overhead of
rebuilding the kernel under test.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
commit 95c104c378 ("tracing: Auto generate event name when creating a
group of events") changed the syntax in the ftrace README file which is
used by the selftests to check what features are support. Adjust the
string to make test_duplicates.tc and trigger-synthetic-eprobe.tc work
again.
Fixes: 95c104c378 ("tracing: Auto generate event name when creating a group of events")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Remove the redundant warning information of online_all_offline_memory()
since there is a warning in online_memory_expect_success().
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Don't use the test-specific header files as source files to force a
target dependency, as clang will complain if more than one source file
is used for a compile command with a single '-o' flag.
Use the proper Makefile variables instead as defined in
tools/testing/selftests/lib.mk.
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull more MM updates from Andrew Morton:
- fix a race which causes page refcounting errors in ZONE_DEVICE pages
(Alistair Popple)
- fix userfaultfd test harness instability (Peter Xu)
- various other patches in MM, mainly fixes
* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
highmem: fix kmap_to_page() for kmap_local_page() addresses
mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
mm/selftest: uffd: explain the write missing fault check
mm/hugetlb: use hugetlb_pte_stable in migration race check
mm/hugetlb: fix race condition of uffd missing/minor handling
zram: always expose rw_page
LoongArch: update local TLB if PTE entry exists
mm: use update_mmu_tlb() on the second thread
kasan: fix array-bounds warnings in tests
hmm-tests: add test for migrate_device_range()
nouveau/dmem: evict device private memory during release
nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
mm/migrate_device.c: add migrate_device_range()
mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
mm/memremap.c: take a pgmap reference on page allocation
mm: free device private pages have zero refcount
mm/memory.c: fix race when faulting a device private page
mm/damon: use damon_sz_region() in appropriate place
mm/damon: move sz_damon_region to damon_sz_region
lib/test_meminit: add checks for the allocation functions
...
SYS_pidfd_open may be undefined for old glibc, so using sys_pidfd_open()
helper defined in task_local_storage_helpers.h instead to fix potential
build failure.
And according to commit 7615d9e178 ("arch: wire-up pidfd_open()"), the
syscall number of pidfd_open is always 434 except for alpha architure,
so update the definition of __NR_pidfd_open accordingly.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221011071249.3471760-1-houtao@huaweicloud.com
test_xdp_adjust_tail_grow failed with ipv6:
test_xdp_adjust_tail_grow:FAIL:ipv6 unexpected error: -28 (errno 28)
The reason is that this test case tests ipv4 before ipv6, and when ipv4
test finished, topts.data_size_out was set to 54, which is smaller than the
ipv6 output data size 114, so ipv6 test fails with NOSPC error.
Fix it by reset topts.data_size_out to sizeof(buf) before testing ipv6.
Fixes: 04fcb5f9a1 ("selftests/bpf: Migrate from bpf_prog_test_run")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-6-xukuohai@huaweicloud.com
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and wifi.
Current release - regressions:
- Revert "net/sched: taprio: make qdisc_leaf() see the
per-netdev-queue pfifo child qdiscs", it may cause crashes when the
qdisc is reconfigured
- inet: ping: fix splat due to packet allocation refactoring in inet
- tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF
due to races when per-netns hash table is used
Current release - new code bugs:
- eth: adin1110: check in netdev_event that netdev belongs to driver
- fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.
Previous releases - regressions:
- ipv4: handle attempt to delete multipath route when fib_info
contains an nh reference, avoid oob access
- wifi: fix handful of bugs in the new Multi-BSSID code
- wifi: mt76: fix rate reporting / throughput regression on mt7915
and newer, fix checksum offload
- wifi: iwlwifi: mvm: fix double list_add at
iwl_mvm_mac_wake_tx_queue (other cases)
- wifi: mac80211: do not drop packets smaller than the LLC-SNAP
header on fast-rx
Previous releases - always broken:
- ieee802154: don't warn zero-sized raw_sendmsg()
- ipv6: ping: fix wrong checksum for large frames
- mctp: prevent double key removal and unref
- tcp/udp: fix memory leaks and races around IPV6_ADDRFORM
- hv_netvsc: fix race between VF offering and VF association message
Misc:
- remove -Warray-bounds silencing in the drivers, compilers fixed"
* tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
sunhme: fix an IS_ERR() vs NULL check in probe
net: marvell: prestera: fix a couple NULL vs IS_ERR() checks
kcm: avoid potential race in kcm_tx_work
tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
net: phy: micrel: Fixes FIELD_GET assertion
openvswitch: add nf_ct_is_confirmed check before assigning the helper
tcp: Fix data races around icsk->icsk_af_ops.
ipv6: Fix data races around sk->sk_prot.
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
tcp/udp: Fix memory leak in ipv6_renew_options().
mctp: prevent double key removal and unref
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
netfilter: rpfilter/fib: Populate flowic_l3mdev field
selftests: netfilter: Test reverse path filtering
net/mlx5: Make ASO poll CQ usable in atomic context
tcp: cdg: allow tcp_cdg_release() to be called multiple times
inet: ping: fix recent breakage
ipv6: ping: fix wrong checksum for large frames
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
...
In commit 1bfe26fb08 ("bpf: Add verifier support for custom callback
return range"), the verifier was updated to require callbacks to BPF
helpers to explicitly specify the range of values that can be returned.
bpf_user_ringbuf_drain() was merged after this in commit 2057156738
("bpf: Add bpf_user_ringbuf_drain() helper"), and this change in default
behavior was missed. This patch updates the BPF_MAP_TYPE_USER_RINGBUF
selftests to also return 1 from a bpf_user_ringbuf_drain() callback so
as to properly test this going forward.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012232015.1510043-3-void@manifault.com
The recent vm image in CI has reported error in selftests that use
the iptables command. Manu Bretelle has pointed out the difference
in the recent vm image that the iptables is sym-linked to the iptables-nft.
With this knowledge, I can also reproduce the CI error by manually running
with the 'iptables-nft'.
This patch is to replace the iptables command with iptables-legacy
to unblock the CI tests.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221012221235.3529719-1-martin.lau@linux.dev
It's required by vm_userspace_mem_region_add() that memory size
should be aligned to host page size. However, one guest page is
provided by memslot_modification_stress_test. It triggers failure
in the scenario of 64KB-page-size-host and 4KB-page-size-guest,
as the following messages indicate.
# ./memslot_modification_stress_test
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory: [0xffbfff0000, 0xffffff0000)
Finished creating vCPUs
Started all vCPUs
==== Test Assertion Failure ====
lib/kvm_util.c:824: vm_adjust_num_guest_pages(vm->mode, npages) == npages
pid=5712 tid=5712 errno=0 - Success
1 0x0000000000404eeb: vm_userspace_mem_region_add at kvm_util.c:822
2 0x0000000000401a5b: add_remove_memslot at memslot_modification_stress_test.c:82
3 (inlined by) run_test at memslot_modification_stress_test.c:110
4 0x0000000000402417: for_each_guest_mode at guest_modes.c:100
5 0x00000000004016a7: main at memslot_modification_stress_test.c:187
6 0x0000ffffb8cd4383: ?? ??:0
7 0x0000000000401827: _start at :?
Number of guest pages is not compatible with the host. Try npages=16
Fix the issue by providing 16 guest pages to the memory slot for this
particular combination of 64KB-page-size-host and 4KB-page-size-guest
on aarch64.
Fixes: ef4c9f4f65 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221013063020.201856-1-gshan@redhat.com
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
Pull more Kselftest updates from Shuah Khan:
"This consists of fixes and improvements to memory-hotplug test and a
minor spelling fix to ftrace test"
* tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: notifier-error-inject: Correct test's name
selftests/memory-hotplug: Adjust log info for maintainability
selftests/memory-hotplug: Restore memory before exit
selftests/memory-hotplug: Add checking after online or offline
selftests/ftrace: func_event_triggers: fix typo in user message
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: bbe4c0896d ("selftests: netfilter: disable rp_filter on router")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Test reverse path (filter) matches in iptables, ip6tables and nftables.
Both with a regular interface and a VRF.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
The DENYLIST and DENYLIST.s390x files are used to specify testcases
which should not be run on CI. Currently, testcases are appended to the
end of these files as needed. This can make it a pain to resolve merge
conflicts. This patch alphabetizes the DENYLIST files to ease this
burden.
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20221011165255.774014-1-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Pull memblock updates from Mike Rapoport:
"Test suite improvements:
- Added verification that memblock allocations zero the allocated
memory
- Added more test cases for memblock_add(), memblock_remove(),
memblock_reserve() and memblock_free()
- Added tests for memblock_*_raw() family
- Added tests for NUMA-aware allocations in memblock_alloc_try_nid()
and memblock_alloc_try_nid_raw()"
* tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add generic NUMA tests for memblock_alloc_try_nid*
memblock tests: add bottom-up NUMA tests for memblock_alloc_try_nid*
memblock tests: add top-down NUMA tests for memblock_alloc_try_nid*
memblock tests: add simulation of physical memory with multiple NUMA nodes
memblock_tests: move variable declarations to single block
memblock tests: remove 'cleared' from comment blocks
memblock tests: add tests for memblock_trim_memory
memblock tests: add tests for memblock_*bottom_up functions
memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_raw
memblock tests: update alloc_api to test memblock_alloc_raw
memblock tests: add additional tests for basic api and memblock_alloc
memblock tests: add labels to verbose output for generic alloc tests
memblock tests: update zeroed memory check for memblock_alloc_* tests
memblock tests: update tests to check if memblock_alloc zeroed memory
memblock tests: update reference to obsolete build option in comments
memblock tests: add command line help option
Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
for x86 (PMU virtualization and selftests).
ARM:
- Fixes for single-stepping in the presence of an async exception as
well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures
with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
RISC-V:
- Improved instruction encoding infrastructure for instructions not
yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
x86:
- Misc PMU fixes and cleanups.
- selftests: fixes for Hyper-V hypercall
- selftests: fix nx_huge_pages_test on TDP-disabled hosts
- selftests: cleanups for fix_hypercall_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
RISC-V: KVM: Use generic guest entry infrastructure
RISC-V: KVM: Record number of signal exits as a vCPU stat
RISC-V: KVM: add __init annotation to riscv_kvm_init()
RISC-V: KVM: Expose Zicbom to the guest
RISC-V: KVM: Provide UAPI for Zicbom block size
RISC-V: KVM: Make ISA ext mappings explicit
RISC-V: KVM: Allow Guest use Zihintpause extension
RISC-V: KVM: Allow Guest use Svinval extension
RISC-V: KVM: Use Svinval for local TLB maintenance when available
RISC-V: Probe Svinval extension form ISA string
RISC-V: KVM: Change the SBI specification version to v1.0
riscv: KVM: Apply insn-def to hlv encodings
riscv: KVM: Apply insn-def to hfence encodings
riscv: Introduce support for defining instructions
riscv: Add X register names to gpr-nums
KVM: arm64: Advertise new kvmarm mailing list
kvm: vmx: keep constant definition format consistent
kvm: mmu: fix typos in struct kvm_arch
KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
...
Create process without mappings and check
/proc/*/maps
/proc/*/numa_maps
/proc/*/smaps
/proc/*/smaps_rollup
They must be empty (excluding vsyscall page) or full of zeroes.
Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:
[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G W 5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0
Note 1:
ProtectionKey field in /proc/*/smaps is optional,
so check most of its contents, not everything.
Note 2:
due to the nature of this test, child process hardly can signal
its readiness (after unmapping everything!) to parent.
I feel like "sleep(1)" is justified.
If you know how to do it without sleep please tell me.
Note 3:
/proc/*/statm is not tested but can be.
Link: https://lkml.kernel.org/r/Yz3liL6Dn+n2SD8Q@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
Introduce the data_input map, write-protected with a small eBPF program
implementing the lsm/bpf_map hook.
Then, ensure that bpf_map_get_fd_by_id() and bpf_map_get_fd_by_id_opts()
with NULL opts don't succeed due to requesting read-write access to the
write-protected map. Also, ensure that bpf_map_get_fd_by_id_opts() with
open_flags in opts set to BPF_F_RDONLY instead succeeds.
After obtaining a read-only fd, ensure that only map lookup succeeds and
not update. Ensure that update works only with the read-write fd obtained
at program loading time, when the write protection was not yet enabled.
Finally, ensure that the other _opts variants of bpf_*_get_fd_by_id() don't
work if the BPF_F_RDONLY flag is set in opts (due to the kernel not
handling the open_flags member of bpf_attr).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-7-roberto.sassu@huaweicloud.com
Pull tpm updates from Jarkko Sakkinen:
"Just a few bug fixes this time"
* tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
security/keys: Remove inconsistent __user annotation
char: move from strlcpy with unused retval to strscpy
Pull tracing updates from Steven Rostedt:
"Major changes:
- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
- Added Masami Hiramatsu as co-maintainer
- Updated MAINTAINERS file to separate out FTRACE as it is more than
just TRACING.
Minor changes:
- Added Mark Rutland as FTRACE reviewer
- Updated user_events to make it on its way to remove the BROKEN tag.
The changes should now be acceptable but will run it through a
cycle and hopefully we can remove the BROKEN tag next release.
- Added filtering to eprobes
- Added a delta time to the benchmark trace event
- Have the histogram and filter callbacks called via a switch
statement instead of indirect functions. This speeds it up to avoid
retpolines.
- Add a way to wake up ring buffer waiters waiting for the ring
buffer to fill up to its watermark.
- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.
- Wake up waiters when the ring buffer is disabled. A reader may
block when the ring buffer is disabled, but if it was blocked when
the ring buffer is disabled it should then wake up.
Fixes:
- Allow splice to read partially read ring buffer pages. This fixes
splice never moving forward.
- Fix inverted compare that made the "shortest" ring buffer wait
queue actually the longest.
- Fix a race in the ring buffer between resetting a page when a
writer goes to another page, and the reader.
- Fix ftrace accounting bug when function hooks are added at boot up
before the weak functions are set to "disabled".
- Fix bug that freed a user allocated snapshot buffer when enabling a
tracer.
- Fix possible recursive locks in osnoise tracer
- Fix recursive locking direct functions
- Other minor clean ups and fixes"
* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
ftrace: Create separate entry in MAINTAINERS for function hooks
tracing: Update MAINTAINERS to reflect new tracing git repo
tracing: Do not free snapshot if tracer is on cmdline
ftrace: Still disable enabled records marked as disabled
tracing/user_events: Move pages/locks into groups to prepare for namespaces
tracing: Add Masami Hiramatsu as co-maintainer
tracing: Remove unused variable 'dups'
MAINTAINERS: add myself as a tracing reviewer
ring-buffer: Fix race between reset page and reading page
tracing/user_events: Update ABI documentation to align to bits vs bytes
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Use NULL for strstr checks
tracing: Fix spelling mistake "preapre" -> "prepare"
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Wake up ring buffer waiters on closing of the file
ring-buffer: Add ring_buffer_wake_waiters()
...
Pull livepatching updates from Petr Mladek:
- Fix race between fork and livepatch transition revert
- Add sysfs entry that shows "patched" state for each object (module)
that can be livepatched by the given livepatch
- Some clean up
* tag 'livepatching-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: add sysfs test
livepatch: add sysfs entry "patched" for each klp_object
selftests/livepatch: normalize sysctl error message
livepatch: Add a missing newline character in klp_module_coming()
livepatch: fix race between fork and KLP transition
Pull cgroup updates from Tejun Heo:
- cpuset now support isolated cpus.partition type, which will enable
dynamic CPU isolation
- pids.peak added to remember the max number of pids used
- holes in cgroup namespace plugged
- internal cleanups
* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
cgroup: use strscpy() is more robust and safer
iocost_monitor: reorder BlkgIterator
cgroup: simplify code in cgroup_apply_control
cgroup: Make cgroup_get_from_id() prettier
cgroup/cpuset: remove unreachable code
cgroup: Remove CFTYPE_PRESSURE
cgroup: Improve cftype add/rm error handling
kselftest/cgroup: Add cpuset v2 partition root state test
cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
cgroup/cpuset: Relocate a code block in validate_change()
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
cgroup: add pids.peak interface for pids controller
cgroup: Remove data-race around cgrp_dfl_visible
cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
...
Pull locking updates from Ingo Molnar:
- Disable preemption in rwsem_write_trylock()'s attempt to take the
rwsem, to avoid RT tasks hogging the CPU, which managed to preempt
this function after the owner has been cleared but before a new owner
is set. Also add debug checks to enforce this.
- Add __lockfunc to more slow path functions and add __sched to
semaphore functions.
- Mark spinlock APIs noinline when the respective CONFIG_INLINE_SPIN_*
toggles are disabled, to reduce LTO text size.
- Print more debug information when lockdep gets confused in
look_up_lock_class().
- Improve header file abuse checks.
- Misc cleanups
* tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused
locking: Add __sched to semaphore functions
locking/rwsem: Disable preemption while trying for rwsem lock
locking: Detect includes rwlock.h outside of spinlock.h
locking: Add __lockfunc to slow path functions
locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled
selftests: futex: Fix 'the the' typo in comment
Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only
sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...
Pull kvm updates from Paolo Bonzini:
"The first batch of KVM patches, mostly covering x86.
ARM:
- Account stage2 page table allocations in memory stats
x86:
- Account EPT/NPT arm64 page table allocations in memory stats
- Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
accesses
- Drop eVMCS controls filtering for KVM on Hyper-V, all known
versions of Hyper-V now support eVMCS fields associated with
features that are enumerated to the guest
- Use KVM's sanitized VMCS config as the basis for the values of
nested VMX capabilities MSRs
- A myriad event/exception fixes and cleanups. Most notably, pending
exceptions morph into VM-Exits earlier, as soon as the exception is
queued, instead of waiting until the next vmentry. This fixed a
longstanding issue where the exceptions would incorrecly become
double-faults instead of triggering a vmexit; the common case of
page-fault vmexits had a special workaround, but now it's fixed for
good
- A handful of fixes for memory leaks in error paths
- Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow
- Never write to memory from non-sleepable kvm_vcpu_check_block()
- Selftests refinements and cleanups
- Misc typo cleanups
Generic:
- remove KVM_REQ_UNHALT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
KVM: remove KVM_REQ_UNHALT
KVM: mips, x86: do not rely on KVM_REQ_UNHALT
KVM: x86: never write to memory from kvm_vcpu_check_block()
KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
KVM: x86: lapic does not have to process INIT if it is blocked
KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
KVM: x86: make vendor code check for all nested events
mailmap: Update Oliver's email address
KVM: x86: Allow force_emulation_prefix to be written without a reload
KVM: selftests: Add an x86-only test to verify nested exception queueing
KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
...
Some momory will be left in offline state when calling
offline_memory_expect_fail() failed. Restore it before exit.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add checking for online_memory_expect_success()/
offline_memory_expect_success()/offline_memory_expect_fail(), or
the test would exit 0 although the functions return 1.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>