linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-15 10:42:40 -04:00

Author	SHA1	Message	Date
Sean Christopherson	9b56532b8a	KVM: selftests: Adjust number of files rlimit for all "standard" VMs Move the max vCPUs test's RLIMIT_NOFILE adjustments to common code, and use the new helper to adjust the resource limit for non-barebones VMs by default. x86's recalc_apic_map_test creates 512 vCPUs, and a future change will open the binary stats fd for all vCPUs, which will put the recalc APIC test above some distros' default limit of 1024. Link: https://lore.kernel.org/r/20250111005049.1247555-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-14 07:02:12 -08:00
Sean Christopherson	ea7179f995	KVM: selftests: Get VM's binary stats FD when opening VM Get and cache a VM's binary stats FD when the VM is opened, as opposed to waiting until the stats are first used. Opening the stats FD outside of __vm_get_stat() will allow converting it to a scope-agnostic helper. Note, this doesn't interfere with kvm_binary_stats_test's testcase that verifies a stats FD can be used after its own VM's FD is closed, as the cached FD is also closed during kvm_vm_free(). Link: https://lore.kernel.org/r/20250111005049.1247555-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-14 07:02:11 -08:00
Sean Christopherson	e65faf71bd	KVM: selftests: Add struct and helpers to wrap binary stats cache Add a struct and helpers to manage the binary stats cache, which is currently used only for VM-scoped stats. This will allow expanding the selftests infrastructure to provide support for vCPU-scoped binary stats, which, except for the ioctl to get the stats FD are identical to VM-scoped stats. Defer converting __vm_get_stat() to a scope-agnostic helper to a future patch, as getting the stats FD from KVM needs to be moved elsewhere before it can be made completely scope-agnostic. Link: https://lore.kernel.org/r/20250111005049.1247555-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-14 07:02:09 -08:00
Sean Christopherson	b0c3f5df92	KVM: selftests: Macrofy vm_get_stat() to auto-generate stat name string Turn vm_get_stat() into a macro that generates a string for the stat name, as opposed to taking a string. This will allow hardening stat usage in the future to generate errors on unknown stats at compile time. No functional change intended. Link: https://lore.kernel.org/r/20250111005049.1247555-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-14 07:01:55 -08:00
Sean Christopherson	eead13d493	KVM: selftests: Assert that __vm_get_stat() actually finds a stat Fail the test if it attempts to read a stat that doesn't exist, e.g. due to a typo (hooray, strings), or because the test tried to get a stat for the wrong scope. As is, there's no indiciation of failure and @data is left untouched, e.g. holds '0' or random stack data in most cases. Fixes: `8448ec5993` ("KVM: selftests: Add NX huge pages test") Link: https://lore.kernel.org/r/20250111005049.1247555-4-seanjc@google.com [sean: fixup spelling mistake, courtesy of Colin Ian King] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-14 07:01:27 -08:00
Sean Christopherson	f7f232a01f	KVM: selftests: Close VM's binary stats FD when releasing VM Close/free a VM's binary stats cache when the VM is released, not when the VM is fully freed. When a VM is re-created, e.g. for state save/restore tests, the stats FD and descriptor points at the old, defunct VM. The FD is still valid, in that the underlying stats file won't be freed until the FD is closed, but reading stats will always pull information from the old VM. Note, this is a benign bug in the current code base as none of the tests that recreate VMs use binary stats. Fixes: `83f6e109f5` ("KVM: selftests: Cache binary stats metadata for duration of test") Link: https://lore.kernel.org/r/20250111005049.1247555-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:02:42 -08:00
Sean Christopherson	fd546aba19	KVM: selftests: Fix mostly theoretical leak of VM's binary stats FD When allocating and freeing a VM's cached binary stats info, check for a NULL descriptor, not a '0' file descriptor, as '0' is a legal FD. E.g. in the unlikely scenario the kernel installs the stats FD at entry '0', selftests would reallocate on the next __vm_get_stat() and/or fail to free the stats in kvm_vm_free(). Fixes: `83f6e109f5` ("KVM: selftests: Cache binary stats metadata for duration of test") Link: https://lore.kernel.org/r/20250111005049.1247555-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:02:42 -08:00
Sean Christopherson	dae7d81e8d	KVM: selftests: Allow running a single iteration of dirty_log_test Now that dirty_log_test doesn't require running multiple iterations to verify dirty pages, and actually runs the requested number of iterations, drop the requirement that the test run at least "3" (which was really "2" at the time the test was written) iterations. Link: https://lore.kernel.org/r/20250111003004.1235645-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	7f225650e0	KVM: selftests: Fix an off-by-one in the number of dirty_log_test iterations Actually run all requested iterations, instead of iterations-1 (the count starts at '1' due to the need to avoid '0' as an in-memory value for a dirty page). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	2680dcfb34	KVM: selftests: Set per-iteration variables at the start of each iteration Set the per-iteration variables at the start of each iteration instead of setting them before the loop, and at the end of each iteration. To ensure the vCPU doesn't race ahead before the first iteration, simply have the vCPU worker want for sem_vcpu_cont, which conveniently avoids the need to special case posting sem_vcpu_cont from the loop. Link: https://lore.kernel.org/r/20250111003004.1235645-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	2020d3b77a	KVM: selftests: Tighten checks around prev iter's last dirty page in ring Now that each iteration collects all dirty entries and ensures the guest completes at least one write, tighten the exemptions for the last dirty page of the previous iteration. Specifically, the only legal value (other than the current iteration) is N-1. Unlike the last page for the current iteration, the in-progress write from the previous iteration is guaranteed to have completed, otherwise the test would have hung. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	73eaa2aa14	KVM: selftests: Ensure guest writes min number of pages in dirty_log_test Ensure the vCPU fully completes at least one write in each dirty_log_test iteration, as failure to dirty any pages complicates verification and forces the test to be overly conservative about possible values. E.g. verification needs to allow the last dirty page from a previous iteration to have any value, because the vCPU could get stuck for multiple iterations, which is unlikely but can happen in heavily overloaded and/or nested virtualization setups. Somewhat arbitrarily set the minimum to 0x100/256; high enough to be interesting, but not so high as to lead to pointlessly long runtimes. Opportunistically report the number of writes per iteration for debug purposes, and so that a human can sanity check the test. Due to each write targeting a random page, the number of dirty pages will likely be lower than the number of total writes, but it shouldn't be absurdly lower (which would suggest the pRNG is broken) Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	485e27ed20	KVM: sefltests: Verify value of dirty_log_test last page isn't bogus Add a sanity check that a completely garbage value wasn't written to the last dirty page in the ring, e.g. that it doesn't contain the next iteration's value. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	d0bd72cb91	KVM: selftests: Collect all dirty entries in each dirty_log_test iteration Collect all dirty entries during each iteration of dirty_log_test by doing a final collection after the vCPU has been stopped. To deal with KVM's destructive approach to getting the dirty bitmaps, use a second bitmap for the post-stop collection. Collecting all entries that were dirtied during an iteration simplifies the verification logic and improves test coverage. - If a page is written during iteration X, but not seen as dirty until X+1, the test can get a false pass if the page is also written during X+1. - If a dirty page used a stale value from a previous iteration, the test would grant a false pass. - If a missed dirty log occurs in the last iteration, the test would fail to detect the issue. E.g. modifying mark_page_dirty_in_slot() to dirty an unwritten gfn: if (memslot && kvm_slot_dirty_track_enabled(memslot)) { unsigned long rel_gfn = gfn - memslot->base_gfn; u32 slot = (memslot->as_id << 16) \| memslot->id; if (!vcpu->extra_dirty && gfn_to_memslot(kvm, gfn + 1) == memslot) { vcpu->extra_dirty = true; mark_page_dirty_in_slot(kvm, memslot, gfn + 1); } if (kvm->dirty_ring_size && vcpu) kvm_dirty_ring_push(vcpu, slot, rel_gfn); else if (memslot->dirty_bitmap) set_bit_le(rel_gfn, memslot->dirty_bitmap); } isn't detected with the current approach, even with an interval of 1ms (when running nested in a VM; bare metal would be even less likely to detect the bug due to the vCPU being able to dirty more memory). Whereas collecting all dirty entries consistently detects failures with an interval of 700ms or more (the longer interval means a higher probability of an actual write to the prematurely-dirtied page). Link: https://lore.kernel.org/r/20250111003004.1235645-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	24b9a2a613	KVM: selftests: Print (previous) last_page on dirty page value mismatch Print out the last dirty pages from the current and previous iteration on verification failures. In many cases, bugs (especially test bugs) occur on the edges, i.e. on or near the last pages, and being able to correlate failures with the last pages can aid in debug. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:56 -08:00
Sean Christopherson	c616f36a10	KVM: selftests: Use continue to handle all "pass" scenarios in dirty_log_test When verifying pages in dirty_log_test, immediately continue on all "pass" scenarios to make the logic consistent in how it handles pass vs. fail. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	9a91f65424	KVM: selftests: Post to sem_vcpu_stop if and only if vcpu_stop is true When running dirty_log_test using the dirty ring, post to sem_vcpu_stop only when the main thread has explicitly requested that the vCPU stop. Synchronizing the vCPU and main thread whenever the dirty ring happens to be full is unnecessary, as KVM's ABI is to actively prevent the vCPU from running until the ring is no longer full. I.e. attempting to run the vCPU will simply result in KVM_EXIT_DIRTY_RING_FULL without ever entering the guest. And if KVM doesn't exit, e.g. let's the vCPU dirty more pages, then that's a KVM bug worth finding. Posting to sem_vcpu_stop on ring full also makes it difficult to get the test logic right, e.g. it's easy to let the vCPU keep running when it shouldn't, as a ring full can essentially happen at any given time. Opportunistically rework the handling of dirty_ring_vcpu_ring_full to leave it set for the remainder of the iteration in order to simplify the surrounding logic. Link: https://lore.kernel.org/r/20250111003004.1235645-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	0a818b3541	KVM: selftests: Keep dirty_log_test vCPU in guest until it needs to stop In the dirty_log_test guest code, exit to userspace only when the vCPU is explicitly told to stop. Periodically exiting just to check if a flag has been set is unnecessary, weirdly complex, and wastes time handling exits that could be used to dirty memory. Opportunistically convert 'i' to a uint64_t to guard against the unlikely scenario that guest_num_pages exceeds the storage of an int. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	f3629c0ef1	KVM: selftests: Honor "stop" request in dirty ring test Now that the vCPU doesn't dirty every page on the first iteration for architectures that support the dirty ring, honor vcpu_stop in the dirty ring's vCPU worker, i.e. stop when the main thread says "stop". This will allow plumbing vcpu_stop into the guest so that the vCPU doesn't need to periodically exit to userspace just to see if it should stop. Add a comment explaining that marking all pages as dirty is problematic for the dirty ring, as it results in the guest getting stuck on "ring full". This could be addressed by adding a GUEST_SYNC() in that initial loop, but it's not clear how that would interact with s390's behavior. Link: https://lore.kernel.org/r/20250111003004.1235645-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Maxim Levitsky	deb8b8400e	KVM: selftests: Limit dirty_log_test's s390x workaround to s390x s390 specific workaround causes the dirty-log mode of the test to dirty all guest memory on the first iteration, which is very slow when the test is run in a nested VM. Limit this workaround to s390x. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	9b1feec83e	KVM: selftests: Continuously reap dirty ring while vCPU is running Continue collecting entries from the dirty ring for the entire time the vCPU is running. Collecting exactly once all but guarantees the vCPU will encounter a "ring full" event and stop. While testing ring full is interesting, stopping and doing nothing is not, especially for larger intervals as the test effectively does nothing for a much longer time. To balance continuous collection with letting the guest make forward progress, chunk the interval waiting into 1ms loops (which also makes the math dead simple). To maintain coverage for "ring full", collect entries on subsequent iterations if and only if the ring has been filled at least once. I.e. let the ring fill up (if the interval allows), but after that contiuously empty it so that the vCPU can keep running. Opportunistically drop unnecessary zero-initialization of "count". Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	f2228aa083	KVM: selftests: Read per-page value into local var when verifying dirty_log_test Cache the page's value during verification in a local variable, re-reading from the pointer is ugly and error prone, e.g. allows for bugs like checking the pointer itself instead of the value. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	af2d85d34d	KVM: selftests: Precisely track number of dirty/clear pages for each iteration Track and print the number of dirty and clear pages for each iteration. This provides parity between all log modes, and will allow collecting the dirty ring multiple times per iteration without spamming the console. Opportunistically drop the "Dirtied N pages" print, which is redundant and wrong. For the dirty ring testcase, the vCPU isn't guaranteed to complete a loop. And when the vCPU does complete a loot, there are no guarantees that it has dirtied that many pages; because the writes are to random address, the vCPU may have written the same page over and over, i.e. only dirtied one page. While the number of writes performed by the vCPU is also interesting, e.g. the pr_info() could be tweaked to use different verbiage, pages_count doesn't correctly track the number of writes either (because loops aren't guaranteed to a complete). Delete the print for now, as a future patch will precisely track the number of writes, at which point the verification phase can report the number of writes performed by each iteration. Link: https://lore.kernel.org/r/20250111003004.1235645-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	1230907864	KVM: selftests: Drop stale srandom() initialization from dirty_log_test Drop an srandom() initialization that was leftover from the conversion to use selftests' guest_random_xxx() APIs. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	ff0efc77bc	KVM: selftests: Drop signal/kick from dirty ring testcase Drop the signal/kick from dirty_log_test's dirty ring handling, as kicking the vCPU adds marginal value, at the cost of adding significant complexity to the test. Asynchronously interrupting the vCPU isn't novel; unless the kernel is fully tickless, the vCPU will be interrupted by IRQs for any decently large interval. And exiting to userspace mode in the middle of a sequence isn't novel either, as the vCPU will do so every time the ring becomes full. Link: https://lore.kernel.org/r/20250111003004.1235645-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:55 -08:00
Sean Christopherson	67428ee7b7	KVM: selftests: Sync dirty_log_test iteration to guest before resuming Sync the new iteration to the guest prior to restarting the vCPU, otherwise it's possible for the vCPU to dirty memory for the next iteration using the current iteration's value. Note, because the guest can be interrupted between the vCPU's load of the iteration and its write to memory, it's still possible for the guest to store the previous iteration to memory as the previous iteration may be cached in a CPU register (which the test accounts for). Note #2, the test's current approach of collecting dirty entries before stopping the vCPU also results dirty memory having the previous iteration. E.g. if page is dirtied in the previous iteration, but not the current iteration, the verification phase will observe the previous iteration's value in memory. That wart will be remedied in the near future, at which point synchronizing the iteration before restarting the vCPU will guarantee the only way for verification to observe stale iterations is due to the CPU register caching case, or due to a dirty entry being collected before the store retires. Link: https://lore.kernel.org/r/20250111003004.1235645-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:54 -08:00
Maxim Levitsky	fe49f80052	KVM: selftests: Support multiple write retires in dirty_log_test If dirty_log_test is run nested, it is possible for entries in the emulated PML log to appear before the actual memory write is committed to the RAM, due to the way KVM retries memory writes as a response to a MMU fault. In addition to that in some very rare cases retry can happen more than once, which will lead to the test failure because once the write is finally committed it may have a very outdated iteration value. Detect and avoid this case. Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20250111003004.1235645-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:54 -08:00
Sean Christopherson	89ea56a404	KVM: selftests: Actually emit forced emulation prefix for kvm_asm_safe_fep() Use KVM_ASM_SAFE_FEP, not simply KVM_ASM_SAFE, for kvm_asm_safe_fep(), as the non-FEP version doesn't force emulation (stating the obvious). Note, there are currently no users of kvm_asm_safe_fep(). Fixes: `ab3b6a7de8` ("KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()") Link: https://lore.kernel.org/r/20250130163135.270770-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-02-12 09:00:39 -08:00
Linus Torvalds	954a209f43	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0 s390: - move some of the guest page table (gmap) logic into KVM itself, inching towards the final goal of completely removing gmap from the non-kvm memory management code. As an initial set of cleanups, move some code from mm/gmap into kvm and start using __kvm_faultin_pfn() to fault-in pages as needed; but especially stop abusing page->index and page->lru to aid in the pgdesc conversion. x86: - Add missing check in the fix to defer starting the huge page recovery vhost_task - SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits) KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking KVM: remove kvm_arch_post_init_vm KVM: selftests: Fix spelling mistake "initally" -> "initially" kvm: x86: SRSO_USER_KERNEL_NO is not synthesized KVM: arm64: timer: Don't adjust the EL2 virtual timer offset KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV KVM: arm64: timer: Always evaluate the need for a soft timer KVM: arm64: Fix nested S2 MMU structures reallocation KVM: arm64: Fail protected mode init if no vgic hardware is present KVM: arm64: Flush/sync debug state in protected mode KVM: s390: selftests: Streamline uc_skey test to issue iske after sske KVM: s390: remove the last user of page->index KVM: s390: move PGSTE softbits KVM: s390: remove useless page->index usage KVM: s390: move gmap_shadow_pgt_lookup() into kvm KVM: s390: stop using lists to keep track of used dat tables KVM: s390: stop using page->index for non-shadow gmaps KVM: s390: move some gmap shadowing functions away from mm/gmap.c KVM: s390: get rid of gmap_translate() KVM: s390: get rid of gmap_fault() ...	2025-02-09 09:41:38 -08:00
Linus Torvalds	f4a45f14cf	Merge tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fix from Kees Cook: "This is really a work-around for x86_64 having grown a syscall to implement uretprobe, which has caused problems since v6.11. This may change in the future, but for now, this fixes the unintended seccomp filtering when uretprobe switched away from traps, and does so with something that should be easy to backport. - Allow uretprobe on x86_64 to avoid behavioral complications (Eyal Birger)" * tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: validate uretprobe syscall passes through seccomp seccomp: passthrough uretprobe systemcall without filtering	2025-02-08 14:04:21 -08:00
Linus Torvalds	8c67da5bc1	Merge tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix fsnotify FMODE_NONOTIFY* handling. This also disables fsnotify on all pseudo files by default apart from very select exceptions. This carries a regression risk so we need to watch out and adapt accordingly. However, it is overall a significant improvement over the current status quo where every rando file can get fsnotify enabled. - Cleanup and simplify lockref_init() after recent lockref changes. - Fix vboxfs build with gcc-15. - Add an assert into inode_set_cached_link() to catch corrupt links. - Allow users to also use an empty string check to detect whether a given mount option string was empty or not. - Fix how security options were appended to statmount()'s ->mnt_opt field. - Fix statmount() selftests to always check the returned mask. - Fix uninitialized value in vfs_statx_path(). - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading and preserve extensibility. * tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: sanity check the length passed to inode_set_cached_link() pidfs: improve ioctl handling fsnotify: disable pre-content and permission events by default selftests: always check mask returned by statmount(2) fsnotify: disable notification by default for all pseudo files fs: fix adding security options to statmount.mnt_opt fsnotify: use accessor to set FMODE_NONOTIFY_* lockref: remove count argument of lockref_init gfs2: switch to lockref_init(..., 1) gfs2: use lockref_init for gl_lockref statmount: let unset strings be empty vboxsf: fix building with GCC 15 fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()	2025-02-07 09:22:31 -08:00
Miklos Szeredi	2cc02059fb	selftests: always check mask returned by statmount(2) STATMOUNT_MNT_OPTS can actually be missing if there are no options. This is a change of behavior since `75ead69a71` ("fs: don't let statmount return empty strings"). The other checks shouldn't actually trigger, but add them for correctness and for easier debugging if the test fails. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-07 10:27:26 +01:00
Eyal Birger	c2debdb854	selftests/seccomp: validate uretprobe syscall passes through seccomp The uretprobe syscall is implemented as a performance enhancement on x86_64 by having the kernel inject a call to it on function exit; User programs cannot call this system call explicitly. As such, this syscall is considered a kernel implementation detail and should not be filtered by seccomp. Enhance the seccomp bpf test suite to check that uretprobes can be attached to processes without the killing the process regardless of seccomp policy. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com [kees: Skip archs without __NR_uretprobe] Signed-off-by: Kees Cook <kees@kernel.org>	2025-02-06 13:19:14 -08:00
Linus Torvalds	3cf0a98fea	Merge tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Interestingly the recent kmemleak improvements allowed our CI to catch a couple of percpu leaks addressed here. We (mostly Jakub, to be accurate) are working to increase review coverage over the net code-base tweaking the MAINTAINER entries. Current release - regressions: - core: harmonize tstats and dstats - ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels - eth: tun: revert fix group permission check - eth: stmmac: revert "specify hardware capability value when FIFO size isn't specified" Previous releases - regressions: - udp: gso: do not drop small packets when PMTU reduces - rxrpc: fix race in call state changing vs recvmsg() - eth: ice: fix Rx data path for heavy 9k MTU traffic - eth: vmxnet3: fix tx queue race condition with XDP Previous releases - always broken: - sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0 - ethtool: ntuple: fix rss + ring_cookie check - rxrpc: fix the rxrpc_connection attend queue handling Misc: - recognize Kuniyuki Iwashima as a maintainer" * tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified" MAINTAINERS: add a sample ethtool section entry MAINTAINERS: add entry for ethtool rxrpc: Fix race in call state changing vs recvmsg() rxrpc: Fix call state set to not include the SERVER_SECURING state net: sched: Fix truncation of offloaded action statistics tun: revert fix group permission check selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog() netem: Update sch->q.qlen before qdisc_tree_reduce_backlog() selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0 pfifo_tail_enqueue: Drop new packet when sch->limit == 0 selftests: mptcp: connect: -f: no reconnect net: rose: lock the socket in rose_bind() net: atlantic: fix warning during hot unplug rxrpc: Fix the rxrpc_connection attend queue handling net: harmonize tstats and dstats selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure ethtool: ntuple: fix rss + ring_cookie check ethtool: rss: fix hiding unsupported fields in dumps ...	2025-02-06 09:14:54 -08:00
Cong Wang	91aadc16ee	selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog() Integrate the test case provided by Mingi Cho into TDC. All test results: 1..4 ok 1 ca5e - Check class delete notification for ffff: ok 2 e4b7 - Check class delete notification for root ffff: ok 3 33a9 - Check ingress is not searchable on backlog update ok 4 a4b9 - Test class qlen notification Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-05 18:15:00 -08:00
Quang Le	3fe5648d1d	selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0 When limit == 0, pfifo_tail_enqueue() must drop new packet and increase dropped packets count of the qdisc. All test results: 1..16 ok 1 a519 - Add bfifo qdisc with system default parameters on egress ok 2 585c - Add pfifo qdisc with system default parameters on egress ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument ok 8 7787 - Add pfifo qdisc on egress with unsupported argument ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size ok 10 3df6 - Replace pfifo qdisc on egress with new queue size ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format ok 12 1298 - Add duplicate bfifo qdisc on egress ok 13 45a0 - Delete nonexistent bfifo qdisc ok 14 972b - Add prio qdisc on egress with invalid format for handles ok 15 4d39 - Delete bfifo qdisc twice ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0 Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-05 18:14:46 -08:00
Matthieu Baerts (NGI0)	5368a67307	selftests: mptcp: connect: -f: no reconnect The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit `218cc16632` ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: `6bf41020b7` ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-05 17:54:32 -08:00
Linus Torvalds	d009de7d54	Merge tag 'livepatching-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching fix from Petr Mladek: - Fix livepatching selftests for util-linux-2.40.x * tag 'livepatching-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests: livepatch: handle PRINTK_CALLER in check_result()	2025-02-04 09:52:01 -08:00
Colin Ian King	203a53029a	KVM: selftests: Fix spelling mistake "initally" -> "initially" There is a spelling mistake in a literal string and in the function test_get_inital_dirty. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-ID: <20250204105647.367743-1-colin.i.king@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-02-04 11:15:24 -05:00
Paolo Bonzini	35441cdd50	Merge tag 'kvm-s390-next-6.14-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - some selftest fixes - move some kvm-related functions from mm into kvm - remove all usage of page->index and page->lru from kvm - fixes and cleanups for vsie	2025-02-04 11:14:21 -05:00
Jakub Kicinski	c3da585509	selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported Vast majority of drivers does not support queue offset. Simply return if the rss context + queue ntuple fails. Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250201013040.725123-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-03 18:39:23 -08:00
Jakub Kicinski	de379dfd9a	selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure Commit under Fixes adds ntuple rules but never deletes them. Fixes: `29a4bc1fe9` ("selftest: extend test_rss_context_queue_reconfigure for action addition") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250201013040.725123-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-03 18:39:23 -08:00
Yan Zhai	235174b2be	udp: gso: do not drop small packets when PMTU reduces Commit `4094871db1` ("udp: only do GSO if # of segs > 1") avoided GSO for small packets. But the kernel currently dismisses GSO requests only after checking MTU/PMTU on gso_size. This means any packets, regardless of their payload sizes, could be dropped when PMTU becomes smaller than requested gso_size. We encountered this issue in production and it caused a reliability problem that new QUIC connection cannot be established before PMTU cache expired, while non GSO sockets still worked fine at the same time. Ideally, do not check any GSO related constraints when payload size is smaller than requested gso_size, and return EMSGSIZE instead of EINVAL on MTU/PMTU check failure to be more specific on the error cause. Fixes: `4094871db1` ("udp: only do GSO if # of segs > 1") Signed-off-by: Yan Zhai <yan@cloudflare.com> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-03 10:13:27 +00:00
Linus Torvalds	bdd4f86c97	Merge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull AT_EXECVE_CHECK selftest fix from Kees Cook: "Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions of glibc" * tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests: Handle old glibc without execveat(2)	2025-01-31 17:12:31 -08:00
Linus Torvalds	1b5f3c51fb	Merge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The PH1520 pinctrl and dwmac drivers are enabeled in defconfig - A redundant AQRL barrier has been removed from the futex cmpxchg implementation - Support for the T-Head vector extensions, which includes exposing these extensions to userspace on systems that implement them - Some more page table information is now printed on die() and systems that cause PA overflows * tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: add a warning when physical memory address overflows riscv/mm/fault: add show_pte() before die() riscv: Add ghostwrite vulnerability selftests: riscv: Support xtheadvector in vector tests selftests: riscv: Fix vector tests riscv: hwprobe: Document thead vendor extensions and xtheadvector extension riscv: hwprobe: Add thead vendor extension probing riscv: vector: Support xtheadvector save/restore riscv: Add xtheadvector instruction definitions riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT RISC-V: define the elements of the VCSR vector CSR riscv: vector: Use vlenb from DT for thead riscv: Add thead and xtheadvector as a vendor extension riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree dt-bindings: cpus: add a thead vlen register length property dt-bindings: riscv: Add xtheadvector ISA extension description RISC-V: Mark riscv_v_init() as __init riscv: defconfig: drop RT_GROUP_SCHED=y riscv/futex: Optimize atomic cmpxchg riscv: defconfig: enable pinctrl and dwmac support for TH1520	2025-01-31 15:13:25 -08:00
Linus Torvalds	c545cd3276	Merge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: - The biggest changes are the TLB flushing scalability optimizations, to update the mm_cpumask lazily and related changes. This feature has both a track record and a continued risk of performance regressions, so it was already delayed by a cycle - but it's all 100% perfect now™ (Rik van Riel) - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill Shutemov, Sebastian Andrzej Siewior) * tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove unnecessary include of <linux/extable.h> x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state() x86/mm/selftests: Fix typo in lam.c x86/mm/tlb: Only trim the mm_cpumask once a second x86/mm/tlb: Also remove local CPU from mm_cpumask if stale x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU x86/mm/tlb: Update mm_cpumask lazily	2025-01-31 10:39:07 -08:00
Christoph Schlameuss	3223906677	KVM: s390: selftests: Streamline uc_skey test to issue iske after sske In some rare situations a non default storage key is already set on the memory used by the test. Within normal VMs the key is reset / zapped when the memory is added to the VM. This is not the case for ucontrol VMs. With the initial iske check removed this test case can work in all situations. The function of the iske instruction is still validated by the remaining code. Fixes: `0185fbc6a2` ("KVM: s390: selftests: Add uc_skey VM test case") Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250128131803.1047388-1-schlameuss@linux.ibm.com Message-ID: <20250128131803.1047388-1-schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2025-01-31 12:03:53 +01:00
Claudio Imbrenda	63e7151989	KVM: s390: selftests: fix ucontrol memory region test With the latest patch, attempting to create a memslot from userspace will result in an EEXIST error for UCONTROL VMs, instead of EINVAL, since the new memslot will collide with the internal memslot. There is no simple way to bring back the previous behaviour. This is not a problem, but the test needs to be fixed accordingly. Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-5-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-5-imbrenda@linux.ibm.com>	2025-01-31 12:03:52 +01:00
Linus Torvalds	c2933b2bef	Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from IPSec, netfilter and Bluetooth. Nothing really stands out, but as usual there's a slight concentration of fixes for issues added in the last two weeks before the merge window, and driver bugs from 6.13 which tend to get discovered upon wider distribution. Current release - regressions: - net: revert RTNL changes in unregister_netdevice_many_notify() - Bluetooth: fix possible infinite recursion of btusb_reset - eth: adjust locking in some old drivers which protect their state with spinlocks to avoid sleeping in atomic; core protects netdev state with a mutex now Previous releases - regressions: - eth: - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node() - bgmac: reduce max frame size to support just 1500 bytes; the jumbo frame support would previously cause OOB writes, but now fails outright - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid false detection of MPTCP blackholing Previous releases - always broken: - mptcp: handle fastopen disconnect correctly - xfrm: - make sure skb->sk is a full sock before accessing its fields - fix taking a lock with preempt disabled for RT kernels - usb: ipheth: improve safety of packet metadata parsing; prevent potential OOB accesses - eth: renesas: fix missing rtnl lock in suspend/resume path" * tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) MAINTAINERS: add Neal to TCP maintainers net: revert RTNL changes in unregister_netdevice_many_notify() net: hsr: fix fill_frame_info() regression vs VLAN packets doc: mptcp: sysctl: blackhole_timeout is per-netns mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted netfilter: nf_tables: reject mismatching sum of field_len with set key length net: sh_eth: Fix missing rtnl lock in suspend/resume path net: ravb: Fix missing rtnl lock in suspend/resume path selftests/net: Add test for loading devbound XDP program in generic mode net: xdp: Disallow attaching device-bound programs in generic mode tcp: correct handling of extreme memory squeeze bgmac: reduce max frame size to support just MTU 1500 vsock/test: Add test for connect() retries vsock/test: Add test for UAF due to socket unbinding vsock/test: Introduce vsock_connect_fd() vsock/test: Introduce vsock_bind() vsock: Allow retrying on connect() failure vsock: Keep the binding until socket destruction Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming ...	2025-01-30 12:24:20 -08:00
Linus Torvalds	90cb220062	Merge tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - update gpio-sim selftests to not fail now that we no longer allow rmdir() on configfs entries of active devices - remove leftover code from gpio-mxc * tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: selftests: gpio: gpio-sim: Fix missing chip disablements gpio: mxc: remove dead code after switch to DT-only	2025-01-30 10:19:30 -08:00

1 2 3 4 5 ...

18003 Commits