linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-05 14:40:12 -04:00

Author	SHA1	Message	Date
Linus Torvalds	ec25bd8d98	Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs Pull bcachefs repair code from Kent Overstreet: "A couple more small fixes, and new repair code. We can now automatically recover from arbitrary corrupted interior btree nodes by scanning, and we can reconstruct metadata as needed to bring a filesystem back into a working, consistent, read-write state and preserve access to whatevver wasn't corrupted. Meaning - you can blow away all metadata except for extents and dirents leaf nodes, and repair will reconstruct everything else and give you your data, and under the correct paths. If inodes are missing i_size will be slightly off and permissions/ownership/timestamps will be gone, and we do still need the snapshots btree if snapshots were in use - in the future we'll be able to guess the snapshot tree structure in some situations. IOW - aside from shaking out remaining bugs (fuzz testing is still coming), repair code should be complete and if repair ever doesn't work that's the highest priority bug that I want to know about immediately. This patchset was kindly tested by a user from India who accidentally wiped one drive out of a three drive filesystem with no replication on the family computer - it took a couple weeks but we got everything important back" * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs: bcachefs: reconstruct_inode() bcachefs: Subvolume reconstruction bcachefs: Check for extents that point to same space bcachefs: Reconstruct missing snapshot nodes bcachefs: Flag btrees with missing data bcachefs: Topology repair now uses nodes found by scanning to fill holes bcachefs: Repair pass for scanning for btree nodes bcachefs: Don't skip fake btree roots in fsck bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() bcachefs: Etyzinger cleanups bcachefs: bch2_shoot_down_journal_keys() bcachefs: Clear recovery_passes_required as they complete without errors bcachefs: ratelimit informational fsck errors bcachefs: Check for bad needs_discard before doing discard bcachefs: Improve bch2_btree_update_to_text() mean_and_variance: Drop always failing tests bcachefs: fix nocow lock deadlock bcachefs: BCH_WATERMARK_interior_updates bcachefs: Fix btree node reserve	2024-04-04 14:36:32 -07:00
Kent Overstreet	09d4c2acbf	bcachefs: reconstruct_inode() If an inode is missing, but corresponding extents and dirent still exist, it's well worth recreating it - this does so. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:46:51 -04:00
Kent Overstreet	cc0532900b	bcachefs: Subvolume reconstruction We can now recreate missing subvolumes from dirents and/or inodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:46:51 -04:00
Kent Overstreet	4c02e63dad	bcachefs: Check for extents that point to same space In backpointer repair, if we get a missing backpointer - but there's already a backpointer that points to an existing extent - we've got multiple extents that point to the same space and need to decide which to keep. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:46:51 -04:00
Kent Overstreet	a292be3b68	bcachefs: Reconstruct missing snapshot nodes When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:46:51 -04:00
Kent Overstreet	55936afe11	bcachefs: Flag btrees with missing data We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:46:51 -04:00
Kent Overstreet	43f5ea4646	bcachefs: Topology repair now uses nodes found by scanning to fill holes With the new btree node scan code, we can now recover from corrupt btree roots - simply create a new fake root at depth 1, and then insert all the leaves we found. If the root wasn't corrupt but there's corruption elsewhere in the btree, we can fill in holes as needed with the newest version of a given node(s) from the scan; we also check if a given btree node is older than what we found from the scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:45:30 -04:00
Kent Overstreet	4409b8081d	bcachefs: Repair pass for scanning for btree nodes If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	b268aa4e7f	bcachefs: Don't skip fake btree roots in fsck When a btree root is unreadable, we might still have keys fro the journal to walk and mark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	f2f61f4192	bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	ca1e02f7e9	bcachefs: Etyzinger cleanups Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t and cmp_r_func_t callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	bdbf953b3c	bcachefs: bch2_shoot_down_journal_keys() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	27fcec6c27	bcachefs: Clear recovery_passes_required as they complete without errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Linus Torvalds	c85af715ca	Merge tag 'vboxsf-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux Pull vboxsf fixes from Hans de Goede: - Compiler warning fixes - Explicitly deny setlease attempts * tag 'vboxsf-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: explicitly deny setlease attempts vboxsf: Remove usage of the deprecated ida_simple_xx() API vboxsf: Avoid an spurious warning if load_nls_xxx() fails vboxsf: remove redundant variable out_len	2024-04-03 10:30:52 -07:00
Linus Torvalds	0f099dc9d1	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "ARM: - Ensure perf events programmed to count during guest execution are actually enabled before entering the guest in the nVHE configuration - Restore out-of-range handler for stage-2 translation faults - Several fixes to stage-2 TLB invalidations to avoid stale translations, possibly including partial walk caches - Fix early handling of architectural VHE-only systems to ensure E2H is appropriately set - Correct a format specifier warning in the arch_timer selftest - Make the KVM banner message correctly handle all of the possible configurations RISC-V: - Remove redundant semicolon in num_isa_ext_regs() - Fix APLIC setipnum_le/be write emulation - Fix APLIC in_clrip[x] read emulation x86: - Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID entries (old vs. new) and ultimately neglects to clear PV_UNHALT from vCPUs with HLT-exiting disabled - Documentation fixes for SEV - Fix compat ABI for KVM_MEMORY_ENCRYPT_OP - Fix a 14-year-old goof in a declaration shared by host and guest; the enabled field used by Linux when running as a guest pushes the size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is really unconsequential because KVM never consumes anything beyond the first 64 bytes, but the resulting struct does not match the documentation Selftests: - Fix spelling mistake in arch_timer selftest" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: arm64: Rationalise KVM banner output arm64: Fix early handling of FEAT_E2H0 not being implemented KVM: arm64: Ensure target address is granule-aligned for range TLBI KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range() KVM: arm64: Don't pass a TLBI level hint when zapping table entries KVM: arm64: Don't defer TLB invalidation when zapping table entries KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test KVM: arm64: Fix out-of-IPA space translation fault handling KVM: arm64: Fix host-programmed guest events in nVHE RISC-V: KVM: Fix APLIC in_clrip[x] read emulation RISC-V: KVM: Fix APLIC setipnum_le/be write emulation RISC-V: KVM: Remove second semicolon KVM: selftests: Fix spelling mistake "trigged" -> "triggered" Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP Documentation: kvm/sev: separate description of firmware KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES ...	2024-04-03 10:26:37 -07:00
Roberto Sassu	701b38995e	security: Place security_path_post_mknod() where the original IMA call was Commit `08abce60d6` ("security: Introduce path_post_mknod hook") introduced security_path_post_mknod(), to replace the IMA-specific call to ima_post_path_mknod(). For symmetry with security_path_mknod(), security_path_post_mknod() was called after a successful mknod operation, for any file type, rather than only for regular files at the time there was the IMA call. However, as reported by VFS maintainers, successful mknod operation does not mean that the dentry always has an inode attached to it (for example, not for FIFOs on a SAMBA mount). If that condition happens, the kernel crashes when security_path_post_mknod() attempts to verify if the inode associated to the dentry is private. Move security_path_post_mknod() where the ima_post_path_mknod() call was, which is obviously correct from IMA/EVM perspective. IMA/EVM are the only in-kernel users, and only need to inspect regular files. Reported-by: Steve French <smfrench@gmail.com> Closes: https://lore.kernel.org/linux-kernel/CAH2r5msAVzxCUHHG8VKrMPUKQHmBpE6K9_vjhgDa1uAvwx4ppw@mail.gmail.com/ Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `08abce60d6` ("security: Introduce path_post_mknod hook") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-04-03 10:21:32 -07:00
Borislav Petkov (AMD)	0e11073247	x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO case is there only for the altenative in CALL_UNTRAIN_RET to have a symbol to resolve. However, testing with kernels which don't have CONFIG_MITIGATION_SRSO enabled, leads to the warning in patch_return() to fire: missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826 Put in a plain "ret" there so that gcc doesn't put a return thunk in in its place which special and gets checked. In addition: ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined! make[2]: * [scripts/Makefile.modpost:145: Module.symvers] Chyba 1 make[1]: * [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2 make: *** [Makefile:240: __sub-make] Chyba 2 since !SRSO builds would use the dummy return thunk as reported by petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-04-03 10:12:38 -07:00
Jeff Layton	1ece2c43b8	vboxsf: explicitly deny setlease attempts vboxsf does not break leases on its own, so it can't properly handle the case where the hypervisor changes the data. Don't allow file leases on vboxsf. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240319-setlease-v1-1-5997d67e04b3@kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2024-04-03 16:06:39 +02:00
Christophe JAILLET	0141d68f86	vboxsf: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/b3c057c86b73f0309a6362031d21f4d7ebb60587.1698835730.git.christophe.jaillet@wanadoo.fr Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2024-04-03 16:06:11 +02:00
Christophe JAILLET	de3f64b738	vboxsf: Avoid an spurious warning if load_nls_xxx() fails If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is still 0. So, in the error handling path, we will call ida_simple_remove(..., 0) which is not allocated yet. In order to prevent a spurious "ida_free called for id=0 which is not allocated." message, tweak the error handling path and add a new label. Fixes: `0fd1695766` ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2024-04-03 16:05:51 +02:00
Colin Ian King	0200ceed30	vboxsf: remove redundant variable out_len The variable out_len is being used to accumulate the number of bytes but it is not being used for any other purpose. The variable is redundant and can be removed. Cleans up clang scan build warning: fs/vboxsf/utils.c:443:9: warning: variable 'out_len' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20240229225138.351909-1-colin.i.king@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2024-04-03 15:55:33 +02:00
Linus Torvalds	3e92c1e6cd	Merge tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "A single patch for SELinux to fix a problem where we could potentially dereference an error pointer if we failed to successfully mount selinuxfs" * tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: avoid dereference of garbage after mount failure	2024-04-02 20:13:09 -07:00
Kent Overstreet	fa14b50460	bcachefs: ratelimit informational fsck errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 20:24:00 -04:00
Kent Overstreet	7ee88737ab	bcachefs: Check for bad needs_discard before doing discard In the discard worker, we were failing to validate the bucket state - meaning a corrupt needs_discard btree could cause us to discard a bucket that we shouldn't. If check_alloc_info hasn't run yet we just want to bail out, otherwise it's a filesystem inconsistent error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 20:24:00 -04:00
Kent Overstreet	e0319af2b6	bcachefs: Improve bch2_btree_update_to_text() Print out the mode as a string, and also print out the btree and watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 17:13:46 -04:00
Linus Torvalds	b1e6ec0a0f	Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linux Pull documentation fixes from Jonathan Corbet: "Four small documentation fixes" * tag 'docs-6.9-fixes' of git://git.lwn.net/linux: docs: zswap: fix shell command format tracing: Fix documentation on tp_printk cmdline option docs: Fix bitfield handling in kernel-doc Documentation: dev-tools: Add link to RV docs	2024-04-02 12:44:09 -07:00
Linus Torvalds	67199a47dd	Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Lots of fixes for situations with extreme filesystem damage. One fix ("Fix journal pins in btree write buffer") applicable to normal usage; also a dio performance fix. New repair/construction code is in the final stages, should be ready in about a week. Anyone that lost btree interior nodes (or a variety of other damage) as a result of the splitbrain bug will be able to repair then" * tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits) bcachefs: On emergency shutdown, print out current journal sequence number bcachefs: Fix overlapping extent repair bcachefs: Fix remove_dirent() bcachefs: Logged op errors should be ignored bcachefs: Improve -o norecovery; opts.recovery_pass_limit bcachefs: bch2_run_explicit_recovery_pass_persistent() bcachefs: Ensure bch_sb_field_ext always exists bcachefs: Flush journal immediately after replay if we did early repair bcachefs: Resume logged ops after fsck bcachefs: Add error messages to logged ops fns bcachefs: Split out recovery_passes.c bcachefs: fix backpointer for missing alloc key msg bcachefs: Fix bch2_btree_increase_depth() bcachefs: Kill bch2_bkey_ptr_data_type() bcachefs: Fix use after free in check_root_trans() bcachefs: Fix repair path for missing indirect extents bcachefs: Fix use after free in bch2_check_fix_ptrs() bcachefs: Fix btree node keys accounting in topology repair path bcachefs: Check btree ptr min_key in .invalid bcachefs: add REQ_SYNC and REQ_IDLE in write dio ...	2024-04-02 11:51:42 -07:00
Guenter Roeck	97ca7c1f93	mean_and_variance: Drop always failing tests mean_and_variance_test_2 and mean_and_variance_test_4 always fail. The input parameters to those tests are identical to the input parameters to tests 1 and 3, yet the expected result for tests 2 and 4 is different for the mean and stddev tests. That will always fail. Expected mean_and_variance_get_mean(mv) == mean[i], but mean_and_variance_get_mean(mv) == 22 (0x16) mean[i] == 10 (0xa) Drop the bad tests. Fixes: `65bc410907` ("mean and variance: More tests") Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/ Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 14:45:08 -04:00
Paolo Bonzini	9bc60f7338	Merge tag 'kvm-riscv-fixes-6.9-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv fixes for 6.9, take #1 - Fix spelling mistake in arch_timer selftest - Remove redundant semicolon in num_isa_ext_regs() - Fix APLIC setipnum_le/be write emulation - Fix APLIC in_clrip[x] read emulation	2024-04-02 12:29:51 -04:00
Paolo Bonzini	52b761b48f	Merge tag 'kvmarm-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #1 - Ensure perf events programmed to count during guest execution are actually enabled before entering the guest in the nVHE configuration. - Restore out-of-range handler for stage-2 translation faults. - Several fixes to stage-2 TLB invalidations to avoid stale translations, possibly including partial walk caches. - Fix early handling of architectural VHE-only systems to ensure E2H is appropriately set. - Correct a format specifier warning in the arch_timer selftest. - Make the KVM banner message correctly handle all of the possible configurations.	2024-04-02 12:26:15 -04:00
Kent Overstreet	c42cd606e4	bcachefs: fix nocow lock deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 01:04:10 -04:00
Christian Göttsche	37801a36b4	selinux: avoid dereference of garbage after mount failure In case kern_mount() fails and returns an error pointer return in the error branch instead of continuing and dereferencing the error pointer. While on it drop the never read static variable selinuxfs_mount. Cc: stable@vger.kernel.org Fixes: `0619f0f5e3` ("selinux: wrap selinuxfs state") Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2024-04-01 23:32:35 -04:00
Kent Overstreet	e2a316b3cc	bcachefs: BCH_WATERMARK_interior_updates This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 21:14:02 -04:00
Kent Overstreet	ba947ecd39	bcachefs: Fix btree node reserve Sign error when checking the watermark - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 21:14:02 -04:00
Linus Torvalds	026e680b0a	Merge tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux Pull pwm fix from Uwe Kleine-König: "This fixes a regression intoduced by an off-by-one in v6.9-rc1 making the pwm-pxa and the pwm driver in ti-sn65dsi86 unusable for most consumer drivers because the default period wasn't set" * tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()	2024-04-01 14:38:55 -07:00
Marc Zyngier	d96c66ab9f	KVM: arm64: Rationalise KVM banner output We are not very consistent when it comes to displaying which mode we're in (VHE, {n,h}VHE, protected or not). For example, booting in protected mode with hVHE results in: [ 0.969545] kvm [1]: Protected nVHE mode initialized successfully which is mildly amusing considering that the machine is VHE only. We already cleaned this up a bit with commit `1f3ca7023f` ("KVM: arm64: print Hyp mode"), but that's still unsatisfactory. Unify the three strings into one and use a mess of conditional statements to sort it out (yes, it's a slow day). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240321173706.3280796-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:33:52 -07:00
Marc Zyngier	b3320142f3	arm64: Fix early handling of FEAT_E2H0 not being implemented Commit `3944382fa6` introduced checks for the FEAT_E2H0 not being implemented. However, the check is absolutely wrong and makes a point it testing a bit that is guaranteed to be zero. On top of that, the detection happens way too late, after the init_el2_state has done its job. This went undetected because the HW this was tested on has E2H being RAO/WI, and not RES1. However, the bug shows up when run as a nested guest, where HCR_EL2.E2H is not necessarily set to 1. As a result, booting the kernel in hVHE mode fails with timer accesses being cought in a trap loop (which was fun to debug). Fix the check for ID_AA64MMFR4_EL1.E2H0, and set the HCR_EL2.E2H bit early so that it can be checked by the rest of the init sequence. With this, hVHE works again in a NV environment that doesn't have FEAT_E2H0. Fixes: `3944382fa6` ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240321115414.3169115-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:33:29 -07:00
Will Deacon	4c36a15673	KVM: arm64: Ensure target address is granule-aligned for range TLBI When zapping a table entry in stage2_try_break_pte(), we issue range TLB invalidation for the region that was mapped by the table. However, we neglect to align the base address down to the granule size and so if we ended up reaching the table entry via a misaligned address then we will accidentally skip invalidation for some prefix of the affected address range. Align 'ctx->addr' down to the granule size when performing TLB invalidation for an unmapped table in stage2_try_break_pte(). Cc: Raghavendra Rao Ananta <rananta@google.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Shaoqin Huang <shahuang@redhat.com> Cc: Quentin Perret <qperret@google.com> Fixes: `defc8cc7ab` ("KVM: arm64: Invalidate the table entries upon a range") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-5-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:30:45 -07:00
Will Deacon	0f0ff097bf	KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range() Commit `c910f2b655` ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") updated the __tlbi_level() macro to take the target level as an argument, with TLBI_TTL_UNKNOWN (rather than 0) indicating that the caller cannot provide level information. Unfortunately, the two implementations of __kvm_tlb_flush_vmid_range() were not updated and so now ask for an level 0 invalidation if FEAT_LPA2 is implemented. Fix the problem by passing TLBI_TTL_UNKNOWN instead of 0 as the level argument to __flush_s2_tlb_range_op() in __kvm_tlb_flush_vmid_range(). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Fixes: `c910f2b655` ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-4-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:30:45 -07:00
Will Deacon	36e0083239	KVM: arm64: Don't pass a TLBI level hint when zapping table entries The TLBI level hints are for leaf entries only, so take care not to pass them incorrectly after clearing a table entry. Cc: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Fixes: `82bb02445d` ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2") Fixes: `6d9d2115c4` ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-3-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:30:45 -07:00
Will Deacon	f62d4c3eb6	KVM: arm64: Don't defer TLB invalidation when zapping table entries Commit `7657ea920c` ("KVM: arm64: Use TLBI range-based instructions for unmap") introduced deferred TLB invalidation for the stage-2 page-table so that range-based invalidation can be used for the accumulated addresses. This works fine if the structure of the page-tables remains unchanged, but if entire tables are zapped and subsequently freed then we transiently leave the hardware page-table walker with a reference to freed memory thanks to the translation walk caches. For example, stage2_unmap_walker() will free page-table pages: if (childp) mm_ops->put_page(childp); and issue the TLB invalidation later in kvm_pgtable_stage2_unmap(): if (stage2_unmap_defer_tlb_flush(pgt)) /* Perform the deferred TLB invalidations */ kvm_tlb_flush_vmid_range(pgt->mmu, addr, size); For now, take the conservative approach and invalidate the TLB eagerly when we clear a table entry. Note, however, that the existing level hint passed to __kvm_tlb_flush_vmid_ipa() is incorrect and will be fixed in a subsequent patch. Cc: Raghavendra Rao Ananta <rananta@google.com> Cc: Shaoqin Huang <shahuang@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-2-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-01 01:30:45 -07:00
Kent Overstreet	b3c7fd35c0	bcachefs: On emergency shutdown, print out current journal sequence number Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 01:07:24 -04:00
Kent Overstreet	eab3a3ce2d	bcachefs: Fix overlapping extent repair overlapping extent repair was colliding with extent past end of inode checks - don't update "extent ends at" until we know we have an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 01:05:50 -04:00
Kent Overstreet	8ce1db8091	bcachefs: Fix remove_dirent() We were missing an iter_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 00:52:32 -04:00
Kent Overstreet	cecfed9b44	bcachefs: Logged op errors should be ignored If something is wrong with a logged op, we just want to delete it - there's nothing to repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 00:04:10 -04:00
Kent Overstreet	13c1e583f9	bcachefs: Improve -o norecovery; opts.recovery_pass_limit This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	060ff30a85	bcachefs: bch2_run_explicit_recovery_pass_persistent() Flag that we need to run a recovery pass and run it - persistenly, so if we crash it'll still get run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	0a34c058fc	bcachefs: Ensure bch_sb_field_ext always exists This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	4fe0eeeae4	bcachefs: Flush journal immediately after replay if we did early repair Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	af855a5f5e	bcachefs: Resume logged ops after fsck Finishing logged ops requires the filesystem to be in a reasonably consistent state - and other fsck passes don't require it to have completed, so just run it last. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00

1 2 3 4 5 ...

1265020 Commits