Make buffer heads point to mapping_metadata_bhs instead of struct
address_space. This makes the code more self contained. For the (only)
case of IO error handling where we really need to reach struct
address_space add a pointer to the mapping from mapping_metadata_bhs.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-73-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
Instead of tracking metadata bhs for a mapping using i_private_list and
i_private_lock create a dedicated mapping_metadata_bhs struct for it.
So far this struct is embedded in address_space but that will be
switched for per-fs private inode parts later in the series. This also
changes the locking from bdev mapping's i_private_lock to a new lock
embedded in mapping_metadata_bhs to untangle the i_private_lock locking
for maintaining lists of metadata bhs and the locking for looking up /
reclaiming bdev's buffer heads. The locking in remove_assoc_map() gets
more complex due to this but overall this looks like a reasonable
tradeoff.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-72-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
There's only single caller of fsync_buffers_list() so untangle the code
a bit by folding fsync_buffers_list() into sync_mapping_buffers(). Also
merge the comments and update them to reflect current state of code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-71-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
The function only waits for already locked buffers in the list of
metadata bhs. fsync_buffers_list() has just waited for all outstanding
IO on buffers so this isn't adding anything useful. Comment in front of
fsync_buffers_list() mentions concerns about buffers being moved out
from tmp list back to mappings i_private_list but these days
mark_buffer_dirty_inode() doesn't touch inodes with b_assoc_map set so
that cannot happen. Just delete the stale code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-70-jack@suse.cz
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
There are only a few filesystems that use generic tracking of inode
metadata buffer heads. As such the logic to reclaim tracked metadata
buffer heads in inode_lru_isolate() doesn't bring a benefit big enough
to justify intertwining of inode reclaim and metadata buffer head
tracking. Just treat tracked metadata buffer heads as any other metadata
filesystem has to properly clean up on inode eviction and stop handling
it in inode_lru_isolate(). As a result filesystems using generic
tracking of metadata buffer heads may now see dirty metadata buffers in
their .evict methods more often which can slow down inode reclaim but
given these filesystems aren't used in performance demanding setups we
should be fine.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-64-jack@suse.cz
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
No filesystem calling __generic_file_fsync() uses metadata bh tracking.
Drop sync_mapping_buffers() call from __generic_file_fsync() as it's
pointless now which untangles buffer head handling from fs/libfs.c.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-55-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
FAT uses a list of metadata bhs attached to an inode. Switch it to use
generic_buffers_fsync_noflush() instead of __generic_file_fsync() as
we'll be removing metadata bh handling from __generic_file_fsync().
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-54-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
Inode lock in __generic_file_fsync() protects sync_mapping_buffers() and
sync_inode_metadata() calls. Neither sync_mapping_buffers() nor
sync_inode_metadata() themselves need the protection by inode_lock and
both metadata buffer head writeback and inode writeback can happen
without inode lock (either in case of background writeback or sync(2)
calls). The only protection inode_lock can possibly provide is that
write(2) or other inode modifying calls cannot happen in the middle of
bh+inode writeout and thus result in writeout of inconsistent metadata.
However if writes and fsyncs race, background writeback can submit
inconsistent metadata just after fsync completed even with inode_lock
protecting fsync so this seems moot as well. So let's remove the
apparently pointless inode_lock protection.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-50-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
Nobody is calling mark_buffer_dirty_inode() with internal bdev inode and
it doesn't make sense for internal bdev inode to have any metadata
buffer heads. Just drop the pointless invalidate_inode_buffers() call
and consequently the whole bdev_evict_inode() because generic code takes
care of the rest.
CC: linux-block@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-47-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pull EFI fix from Ard Biesheuvel:
"Fix for the x86 EFI workaround keeping boot services code and data
regions reserved until after SetVirtualAddressMap() completes:
deferred struct page initialization may result in some of this memory
being lost permanently"
* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efi: defer freeing of boot services memory
Pull i2c fix from Wolfram Sang:
"A revert for the i801 driver restoring old locking behaviour"
* tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
Pull x86 fixes from Ingo Molnar:
- Fix SEV guest boot failures in certain circumstances, due to
very early code relying on a BSS-zeroed variable that isn't
actually zeroed yet an may contain non-zero bootup values
Move the variable into the .data section go gain even earlier
zeroing
- Expose & allow the IBPB-on-Entry feature on SNP guests, which
was not properly exposed to guests due to initial implementational
caution
- Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
file paths
- Fix the various SNC (Sub-NUMA Clustering) topology enumeration
bugs/artifacts (sched-domain build errors mostly).
SNC enumeration data got more complicated with Granite Rapids X
(GNR) and Clearwater Forest X (CWF), which exposed these bugs
and made their effects more serious
- Also use the now sane(r) SNC code to fix resctrl SNC detection bugs
- Work around a historic libgcc unwinder bug in the vdso32 sigreturn
code (again), which regressed during an overly aggressive recent
cleanup of DWARF annotations
* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso32: Work around libgcc unwinder bug
x86/resctrl: Fix SNC detection
x86/topo: Fix SNC topology mess
x86/topo: Replace x86_has_numa_in_package
x86/topo: Add topology_num_nodes_per_package()
x86/numa: Store extra copy of numa_nodes_parsed
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
x86/sev: Allow IBPB-on-Entry feature for SNP guests
x86/boot/sev: Move SEV decompressor variables into the .data section
Pull timer fix from Ingo Molnar:
"Make clock_adjtime() syscall timex validation slightly more permissive
for auxiliary clocks, to not reject syscalls based on the status field
that do not try to modify the status field.
This makes the ABI behavior in clock_adjtime() consistent with
CLOCK_REALTIME"
* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix timex status validation for auxiliary clocks
Pull scheduler fix from Ingo Molnar:
"Fix a DL scheduler bug that may corrupt internal metrics during PI and
setscheduler() syscalls, resulting in kernel warnings and misbehavior.
Found during stress-testing"
* tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Pull SCSI fixes from James Bottomley:
"Two core changes and the rest in drivers, one core change to quirk the
behaviour of the Iomega Zip drive and one to fix a hang caused by tag
reallocation problems, which has mostly been seen by the iscsi client.
Note the latter fixes the problem but still has a slight sysfs memory
leak, so will be amended in the next pull request (once we've run the
fix for the fix through our testing)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix recursive locking in __configfs_open_file()
scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP
scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
scsi: core: Fix refcount leak for tagset_refcnt
Pull fbdev fix from Helge Deller:
"Silence build error in au1100fb driver found by kernel test robot"
* tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: au1100fb: Fix build on MIPS64
Pull parisc fixes from Helge Deller:
"While testing Sasha Levin's 'kallsyms: embed source file:line info in
kernel stack traces' patch series, which increases the typical kernel
image size, I found some issues with the parisc initial kernel mapping
which may prevent the kernel to boot.
The three small patches here fix this"
* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix initial page table creation for boot
parisc: Check kernel mapping earlier at bootup
parisc: Increase initial mapping to 64 MB with KALLSYMS